Bobspadger / pyodbc

Automatically exported from code.google.com/p/pyodbc
MIT No Attribution

Regain performance lost since 2.0.36 #17

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
As reported by Joe Salmeri (via email), the 2.1.x branch is slower than the 2.0.x branch.

After examining some ODBC trace logs, we determined that the slowdown came from extra SQLDescribeParam calls that were added to work around binding None/NULL.

When passing parameters, pyodbc has to tell the driver the data type of each parameter.  When None is passed, pyodbc obviously doesn't know the type -- the value is NULL and there is no way to tell whether it is going into a character column, an integer column, etc.

In the 2.0 branch, pyodbc always set the data type to char, but SQL Server
will not let you insert a character value into a binary column even if you
are passing NULL.  To fix this, I use SQLDescribeParam, if supported by the
driver, to determine the correct data type.  Unfortunately, this adds 1
driver call per parameter for every query, which is way too much overhead.
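
A minimal sketch of the ambiguity, assuming a SQL Server DSN (the connection string and table are hypothetical):

```python
import pyodbc

# Connection details are hypothetical; adjust for your environment.
cnxn = pyodbc.connect("DSN=mssql;UID=user;PWD=secret")
cursor = cnxn.cursor()
cursor.execute("CREATE TABLE #t (blob VARBINARY(100))")

# pyodbc cannot infer a SQL type from None.  Binding it as character
# data (the 2.0 behavior) fails here, because SQL Server will not
# convert character data to varbinary, even for a NULL value.  Calling
# SQLDescribeParam first tells pyodbc the target column is binary.
cursor.execute("INSERT INTO #t (blob) VALUES (?)", None)
```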

Original issue reported on code.google.com by mkleehammer on 29 Dec 2008 at 4:37

GoogleCodeExporter commented 9 years ago
The fix will be in 2 parts.  The first, which has been tested by Joe, is to:

* Only call SQLDescribeParam for a parameter when None is passed.  All other types are determined from the pyodbc data type.

* Cache this information with the prepared statement.  If the same SQL statement is used repeatedly, we will never call SQLDescribeParam more than once per parameter.

The second part of the fix is to cache this information for more than just the last statement.  Instead, I'll keep an internal dictionary of the last 10, 20, etc. statements and anything we know about them.  I'll have to add a method to change the dictionary size and to flush it.
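
The actual cache would live in the C extension, but a rough Python sketch of the idea (the class and method names are illustrative, not pyodbc's API):

```python
from collections import OrderedDict

class ParamInfoCache:
    """LRU cache mapping SQL text to the per-parameter type info
    gathered from SQLDescribeParam.  Illustrative sketch only."""

    def __init__(self, maxsize=20):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def get(self, sql):
        info = self._cache.get(sql)
        if info is not None:
            self._cache.move_to_end(sql)    # mark as most recently used
        return info

    def put(self, sql, param_info):
        self._cache[sql] = param_info
        self._cache.move_to_end(sql)
        while len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)  # evict least recently used

    def resize(self, maxsize):
        """Change the dictionary size, evicting if it shrank."""
        self.maxsize = maxsize
        while len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)

    def flush(self):
        self._cache.clear()
```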

Original comment by mkleehammer on 29 Dec 2008 at 4:40

GoogleCodeExporter commented 9 years ago
Are you actively working on this? I'm looking for a project to learn Python at the C level, and I've been playing with pyodbc for a couple of days. I was thinking:

- create a hash table keyed by a hash of each statement, both for fast lookup and to keep SQL text from hanging around in memory

- instead of caching N statements, cache all statements, add a 'lastused' timestamp, and flush statements that were not reused for x seconds, with x optionally settable by the user (where x=0 means no cache); see the sketch below

I wouldn't be fast; I'm thinking in terms of months to get this right.
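
A rough sketch of that proposal, hashing the SQL text and evicting by idle time (all names are illustrative):

```python
import hashlib
import time

class TTLStatementCache:
    """Caches per-statement info keyed by a hash of the SQL text,
    flushing entries not reused for `ttl` seconds; ttl=0 disables
    caching entirely.  Illustrative sketch only."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._cache = {}   # sha1(sql) -> (last_used, info)

    @staticmethod
    def _key(sql):
        # Store a hash rather than the statement itself, so SQL text
        # does not hang around in memory.
        return hashlib.sha1(sql.encode("utf-8")).hexdigest()

    def get(self, sql):
        if self.ttl == 0:
            return None
        key = self._key(sql)
        entry = self._cache.get(key)
        if entry is None:
            return None
        self._cache[key] = (time.monotonic(), entry[1])  # refresh 'lastused'
        return entry[1]

    def put(self, sql, info):
        if self.ttl == 0:
            return
        self._cache[self._key(sql)] = (time.monotonic(), info)
        self._evict()

    def _evict(self):
        now = time.monotonic()
        stale = [k for k, (used, _) in self._cache.items()
                 if now - used > self.ttl]
        for k in stale:
            del self._cache[k]
```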

Original comment by rbrt8...@gmail.com on 28 Nov 2010 at 11:42

GoogleCodeExporter commented 9 years ago
How about adding a way to specify the types manually, for example, by exposing 
them as classes? Something like this: cursor.execute('select Helloworld = ?', 
pyodbc.types.Binary(None))
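
pyodbc does not ship a pyodbc.types module; a minimal sketch of what such wrapper classes might look like (all names are hypothetical):

```python
class TypedParam:
    """Pairs a value with an explicit SQL type so the binding code can
    skip SQLDescribeParam.  Hypothetical, not pyodbc's actual API."""

    sql_type = None   # would hold an ODBC type constant, e.g. SQL_VARBINARY

    def __init__(self, value):
        self.value = value

class Binary(TypedParam):
    sql_type = "SQL_VARBINARY"   # placeholder for the real ODBC constant

class Integer(TypedParam):
    sql_type = "SQL_INTEGER"

# Proposed usage: the wrapper tells pyodbc how to bind a NULL.
# cursor.execute("select Helloworld = ?", Binary(None))
```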

Original comment by sad.n...@gmail.com on 6 Feb 2011 at 4:17

GoogleCodeExporter commented 9 years ago
Another benefit of specifying types manually is that pyodbc could then work better with ODBC drivers that do not implement SQLDescribeParam, e.g. FreeTDS.

Original comment by lukedell...@gmail.com on 7 Feb 2011 at 4:01

GoogleCodeExporter commented 9 years ago
Yes, and in addition there are databases that do not implement SQLDescribeParam at all, for example Sybase.

Original comment by sad.n...@gmail.com on 13 Feb 2011 at 9:34