keitherskine closed this issue 7 years ago
Not all databases correctly return `rowcount` information, sadly. A few of the results you mentioned seem dubious and may be caused by turbodbc. Thanks for the detailed report!
I'm sorry Michael, you're completely right about `rowcount` on SELECT statements: MS SQL doesn't provide one (see SQLRowCount), so those last "errors" are not errors at all.
No worries, Keith. Still, `rowcount` not being reset after exceptions could use some fixing.
Btw: As I mentioned elsewhere, I don't have access to MS SQL server. Leaving the issues you have reported aside, can you give some short feedback on turbodbc's performance with this database, for example compared to pyodbc?
Hi Michael, I ran some performance tests. I'm not sure that comparing `executemany` on turbodbc against `executemany` on pyodbc is much of a comparison, because `executemany` on pyodbc is literally just a "for" loop around the `execute` function. Instead, I thought it would be more interesting to compare bulk loading a table using `executemany` on turbodbc against bulk loading the same table using the MS SQL Server bulk load utility bcp.
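To make the "for" loop point concrete, here is a minimal sketch contrasting one `execute` call per row with handing rows to the driver in batches. It uses the standard library's `sqlite3` purely as a stand-in for an ODBC connection; the helper names `row_by_row` and `in_batches` and the `demo` table are made up for illustration, and this is not the actual code of either pyodbc or turbodbc.

```python
import sqlite3
from itertools import islice


def row_by_row(cursor, sql, rows):
    """One execute() call per parameter set: the 'for loop' approach."""
    for row in rows:
        cursor.execute(sql, row)


def in_batches(cursor, sql, rows, batch_size):
    """Hand parameter sets to the driver in chunks of batch_size."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        cursor.executemany(sql, batch)


# sqlite3 is only a stand-in here; the table and data are hypothetical.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE demo (a TEXT, b TEXT)")
rows = [("x%d" % i, "y%d" % i) for i in range(10)]

row_by_row(cursor, "INSERT INTO demo VALUES (?, ?)", rows)
in_batches(cursor, "INSERT INTO demo VALUES (?, ?)", rows, batch_size=4)
total = cursor.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

Against a remote server, the row-by-row variant typically pays one round trip per record, which is why a driver that really sends parameter sets in bulk can be expected to behave quite differently.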
I set up a test table with 20 string columns. For each test, I loaded a million records into the (empty) table, with a 36-character ASCII string in each column. Here are the results:
- bcp: 12 seconds to write a file with the data (using Python), 31 seconds to bulk load the data to the table, a total of 43 seconds
- turbodbc: 109 seconds to INSERT a parameter set of the same million data records (default `parameter_sets_to_buffer` value, i.e. 5,000)
- turbodbc: 90 seconds to INSERT a parameter set of the same million data records (`parameter_sets_to_buffer` = 50,000)
- turbodbc: 86 seconds to INSERT a parameter set of the same million data records (`parameter_sets_to_buffer` = 250,000)
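For reference, the ratios behind these figures can be worked out in a few lines. The numbers are taken directly from the list above; only the variable names are mine.

```python
# Timings in seconds, copied from the results above (one million records each).
bcp_total = 12 + 31  # write the data file, then bulk load it
turbodbc_seconds = {5000: 109, 50000: 90, 250000: 86}  # keyed by parameter_sets_to_buffer

# How many times longer turbodbc took than bcp, per buffer size.
ratios = {size: round(seconds / bcp_total, 2)
          for size, seconds in turbodbc_seconds.items()}
# ratios == {5000: 2.53, 50000: 2.09, 250000: 2.0}

# Throughput in records per second, rounded to whole records.
bcp_rate = round(1000000 / bcp_total)  # 23256
turbodbc_rates = {size: round(1000000 / seconds)
                  for size, seconds in turbodbc_seconds.items()}
# turbodbc_rates == {5000: 9174, 50000: 11111, 250000: 11628}
```

So going from 5,000 to 50,000 buffered parameter sets saves about 17% of the run time, while the further step to 250,000 saves only about 4% more.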
As you can see, the bulk loading tool bcp is at least twice as fast as turbodbc for inserting records into an empty table. However, as far as I'm concerned, that's better than expected; I'm kinda surprised it's even that close. After all, bcp is designed to do just one thing and do it as fast as possible. Nevertheless, the turbodbc `executemany` performance is roughly in the same ballpark as bcp, and turbodbc has the major convenience of being much easier to code in Python (bcp is a finicky beast that has to be run in a subprocess). Also, `executemany` can be used for UPDATE statements, which bcp cannot. All in all, `executemany` is a very handy function indeed.
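For anyone wanting to repeat a test like this, a rough sketch of the turbodbc side might look as follows. It assumes a recent turbodbc release in which buffer sizes are passed via `make_options` (older releases took them differently), and the DSN, the table name, and the use of UUID strings as 36-character test data are all my own assumptions rather than what was actually run here.

```python
import uuid


def make_rows(n_rows, n_columns=20):
    """Hypothetical test data: str(uuid4()) is a 36-character ASCII string."""
    return [[str(uuid.uuid4()) for _ in range(n_columns)] for _ in range(n_rows)]


def build_insert_sql(table, n_columns):
    """Build 'INSERT INTO <table> VALUES (?, ?, ...)' with one marker per column."""
    markers = ", ".join("?" for _ in range(n_columns))
    return "INSERT INTO {} VALUES ({})".format(table, markers)


def bulk_insert(dsn, table, rows, buffer_size=50000):
    """Insert a non-empty list of rows with a single executemany() call."""
    # Imported lazily so the helpers above work without turbodbc installed.
    from turbodbc import connect, make_options

    # Assumption: buffer size is configured through make_options.
    options = make_options(parameter_sets_to_buffer=buffer_size)
    connection = connect(dsn=dsn, turbodbc_options=options)
    try:
        cursor = connection.cursor()
        cursor.executemany(build_insert_sql(table, len(rows[0])), rows)
        connection.commit()
    finally:
        connection.close()
```

A call would then look like `bulk_insert("my_dsn", "test_table", make_rows(1000000))`, where both names are placeholders for your own DSN and table.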
That's brilliant feedback, actually. String parameters are the easiest to handle for bulk imports because no additional parsing is required. Things could look even better when using integers and floating point numbers. Thanks for your efforts!
This issue contained two bugs:
1) `rowcount` was not reset after exceptions
2) `rowcount` was not set correctly in case parameters were baked into the SQL string
Both are fixed now.
Many thanks @MathMagique !
I'm getting strange values for the cursor's `rowcount` when running certain queries. Here's my setup:
- Python 3.4.4, running on CentOS 6.6
- MS SQL Server 2008 R2
- Using the Microsoft ODBC Driver 11 for SQL Server on Linux with unixODBC 2.3.2
Here is an interactive session:
I appear to be getting the expected `rowcount` value when using parameters, but not otherwise. Also, the `rowcount` value does not appear to be reset to some default value (i.e. -1) before the `execute...()` calls are made. This causes the old `rowcount` value to be retained, I believe incorrectly.
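To illustrate the behaviour I would expect, here is a small sketch of a cursor wrapper that resets `rowcount` to -1 before every statement, so a failed statement cannot leave the previous count behind. The `ResettingCursor` class is hypothetical and uses `sqlite3` only as a stand-in; it is not how turbodbc is implemented. The -1 default follows the Python DB-API (PEP 249), which specifies -1 when the count cannot be determined.

```python
import sqlite3


class ResettingCursor:
    """Hypothetical wrapper: rowcount falls back to -1 before every execute."""

    def __init__(self, cursor):
        self._cursor = cursor
        self.rowcount = -1

    def execute(self, sql, parameters=()):
        self.rowcount = -1  # forget the previous statement's count first
        self._cursor.execute(sql, parameters)
        # Only reached if the statement succeeded.
        self.rowcount = self._cursor.rowcount
        return self


connection = sqlite3.connect(":memory:")
cursor = ResettingCursor(connection.cursor())
cursor.execute("CREATE TABLE demo (a INTEGER)")
cursor.execute("INSERT INTO demo VALUES (?)", (1,))
count_after_insert = cursor.rowcount  # 1

try:
    cursor.execute("INSERT INTO no_such_table VALUES (1)")
except sqlite3.OperationalError:
    pass
count_after_error = cursor.rowcount  # -1, not the stale 1
```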