blue-yonder / turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
http://turbodbc.readthedocs.io/en/latest
MIT License
623 stars 85 forks source link

Odd cursor rowcount values being returned #40

Closed keitherskine closed 7 years ago

keitherskine commented 7 years ago

I'm getting strange values for the cursor rowcount value when running certain queries.

Here's my setup: Python 3.4.4, running on CentOS 6.6 MS SQL Server 2008 R2 Using the Microsoft ODBC Driver 11 for SQL Server on Linux with unixODBC 2.3.2

Here is an interactive session:

>>> crsr.execute("insert into dbo.turbodbc (col_int, col_str_a, col_str_u) values (1, 'a', 'b'), (2, 'c', 'd')")  # this succeeds
>>> crsr.rowcount  # expecting the answer 2
0                  # !!!!!!
>>> conn.commit()
>>> crsr.rowcount  # still expecting the answer 2
0                  # !!!!!!
>>>
>>> crsr.execute("insert into dbo.turbodbc (col_int, col_str_a, col_str_u) values (?, ?, ?)", (3, 'a', 'b'))  # this succeeds
>>> crsr.rowcount  # expecting the answer 1
1                  # yes!
>>> conn.commit()
>>> crsr.rowcount  # expecting the answer 1
1                  # yes!
>>>
>>> crsr.executemany("insert into dbo.turbodbc (col_int, col_str_a, col_str_u) values (?, ?, ?)", ((4, 'a', 'b'), (5, 'c', 'd')))  # this succeeds
>>> crsr.rowcount  # expecting the answer 2
2                  # yes!
>>> conn.commit()
>>> crsr.rowcount  # expecting the answer 2
2                  # yes!
>>>
>>> crsr.executemany("insert into dbo.does_not_exist (col_int, col_str_a, col_str_u) values (?, ?, ?)", ((4, 'a', 'b'), (5, 'c', 'd')))  # this fails
Traceback (most recent call last):
  File "/home/erskinek/venvs/turbodbc/lib/python3.4/site-packages/turbodbc/exceptions.py", line 31, in wrapper
    return f(*args, **kwds)
  File "/home/erskinek/venvs/turbodbc/lib/python3.4/site-packages/turbodbc/cursor.py", line 84, in executemany
    self.impl.prepare(sql)
turbodbc_intern.Error: ODBC error
state: 42S02
native error code: 208
message: [Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Invalid object name 'dbo.does_not_exist'.

...

>>> crsr.rowcount  # expecting -1 because the previous executemany() call failed
2                  # !!!!!!
>>>
>>> crsr.execute("select * from dbo.turbodbc")  # this succeeds
>>> crsr.rowcount  # hoping for a positive integer (but the total number of rows might not yet be available)
-1                 # !!!!!!
>>> rows = crsr.fetchall()  # this succeeds
>>> crsr.rowcount  # definitely expecting a positive integer now
-1                 # !!!!!!
>>> conn.commit()
>>> crsr.rowcount  # still expecting a positive integer
-1                 # !!!!!!
>>>

I appear to be getting the expected rowcount value when using parameters, but not otherwise. Also, the rowcount value does not appear to be reset to some default value (i.e. -1) before the execute...() calls are made. This causes the old rowcount value to be retained, I believe incorrectly.

MathMagique commented 7 years ago

Not all databases correctly return rowcount information, sadly. A few of the results you mentioned seem dubious and may be caused by turbodbc. Thanks for the detailed report!

keitherskine commented 7 years ago

I'm sorry Michael, you're completely right about rowcount on SELECT statements, MS SQL doesn't provide one (see SQLRowCount) so those last "errors" are not errors at all.

MathMagique commented 7 years ago

No worries, Keith. Still, rowcount not being reset after exceptions could use some fixing.

Btw: As I mentioned elsewhere, I don't have access to MS SQL server. Leaving the issues you have reported aside, can you give some short feedback on turbodbc's performance with this database, for example compared to pyodbc?

keitherskine commented 7 years ago

Hi Michael, I ran some performance tests. I'm not sure that comparing executemany on turbodbc against executemany on pyodbc is much of a comparison because executemany on pyodbc is literally just a "for" loop around the execute function. Instead, I thought it would be more interesting to compare bulk loading a table using executemany on turbodbc against bulk loading the same table using the MS SQL Server bulk load utility bcp.

I set up a test table with 20 string columns. For each test, I loaded a million records into the (empty) table with a 36 character ascii string into each column. Here are the results:

bcp: 12 seconds to write a file with the data (using Python), 31 seconds to bulk load the data to the table, a total of 43 seconds turbodbc: 109 seconds to INSERT a parameter set of the same million data records (default parameter_sets_to_buffer value, i.e. 5,000) turbodbc: 90 seconds to INSERT a parameter set of the same million data records (parameter_sets_to_buffer = 50,000) turbodbc: 86 seconds to INSERT a parameter set of the same million data records (parameter_sets_to_buffer = 250,000)

As you can see, the bulk loading tool bcp is at least twice as fast as turbodbc for inserting records into an empty table. However, as far as I'm concerned, that's better than expected, I'm kinda surprised it's even that close. After all, bcp is designed to do just one thing and do it as fast as possible. Nevertheless, the turbodbc executemany performance is roughly in the same ball-park as bcp, and turbodbc has the major convenience of being much easier to code in Python (bcp is a finicky beast that has to be run in a subprocess), and executemany can also be used for UPDATE statements, which bcp cannot. All in all, executemany is a very handy function indeed.

MathMagique commented 7 years ago

That's brilliant feedback, actually. String parameters are the easiest to handle for bulk imports because no additional parsing is required. Things could look even better when using integers and floating point numbers. Thanks for your efforts!

MathMagique commented 7 years ago

This issue contained two bugs: 1) rowcount was not reset after exceptions 2) rowcount was not set correctly in case parameters were baked into the SQL string

Both are fixed now.

keitherskine commented 7 years ago

Many thanks @MathMagique !