Open GoogleCodeExporter opened 9 years ago
[deleted comment]
Tried both ceODBC and mxODBC and both were also painfully slow. Ended up going
with an adodb connection with help from http://www.ecp.cc/pyado.html
comConn = win32com.client.Dispatch(r'ADODB.Connection')
DSN = 'PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=%s%s' %(dbDIR,dbOut)
comConn.Open(DSN)
rs = win32com.client.Dispatch(r'ADODB.Recordset')
rs.Open('[' + tblName +']', comConn, 1, 3)
for f in values:
rs.AddNew(LVI_LST, f)
Original comment by jaycgree...@gmail.com
on 5 Apr 2012 at 9:02
The behavior I encountered last time was as described by ghoerz in the
aforementioned stackoverflow link, i.e.
for each set of parameters
prepare and execute the insert statement
unprepare the statement
commit
However, now that I try again, I couldn't reproduce that behavior anymore with
any version of pyodbc from 2.1.8 to 3.0.6. Instead, it is now:
for each set of parameters
prepare and execute the insert statement
unprepare the statement
commit
This gives good enough performance.
Does anyone have the environment that can reproduce the slow behavior described
by ghoerz? If not, I think we can close the issue. I am sorry that I wasn't
diligent enough to note it down when reporting the issue.
Original comment by sok...@gmail.com
on 31 Dec 2012 at 6:07
Verified resolved with pyodbc-2.1.9.win32-py2.7
Original comment by bryan.ea...@gmail.com
on 2 Jan 2013 at 8:44
Attachments:
I'm still finding a very large difference in the performance of the
'executemany' command between the pyodbc module and the ceODBC module. In a
nutshell, ceODBC is 35 times faster doing bulk inserts over pyodbc.
In my case, I'm inserting 250,000 records into a SQL Server 2008 R2 database
(on its own server), using Python 2.7 on a Windows 7 (64-bit) PC, using the
'{SQL Server Native Client 10.0}' driver. I'm batching the records up into
sets of 10,000 records before running 'executemany'. This takes 12.5 minutes
with pyodbc, but just 21 seconds with ceODBC. A huge difference.
For reference, with this test, pyodbc is about 30% slower than the pymssql
module.
Can you think of a reason why the pyodbc module should be so slow? Is there
something I should be doing in my Python code?
Original comment by erski...@dnc.org
on 5 Aug 2013 at 1:36
I don't think this issue is resolved.
I'm using the most recent pyodbc-3.0.7-py2.7 and have found the same
phenomenon. The same python code for pyodbc takes minutes, but for ceODBC takes
only seconds. Version is ceODBC-2.0.1-py2.7. Critical code is
conn = ceODBC.connect(conninfo, autocommit = True)
cur = conn.cursor()
cur.executemany(UPSERT_SQL, DATA)
conn.close()
The executemany implementation of ceODBC must be many factors more efficient.
If someone tells me how to do the profiling, I'm happy to help. The database I
use is SAP HANA and for convenience, I use a 32-bit ODBC driver.
--Richard
Original comment by rlindne...@gmail.com
on 29 Oct 2013 at 12:33
[deleted comment]
I'm also having trouble with pyODBC performance.
I tested 1000 small rows into Netezza.
ceODBC executemany: 0.3 seconds
pyodbc executemany: 180-195 seconds
pyodbc execute: 78-130 seconds
Python 2.7 32-bit, pyodbc 3.0.7, ceodbc 2.0.1, Windows 7 64-bit, Netezza
Release 7.0 (P-1) [Build 26407]
Original comment by ahz...@gmail.com
on 18 Sep 2014 at 4:19
If need to update 300.000 columns it never finishes. This bug really needs to
be fixed.
Original comment by Atoc....@gmail.com
on 30 Nov 2014 at 5:07
AFAICS, Cursor_executemany is binding a single parameter and calling execute
(which calls SQLExecute).
In ceODBC, it binds all parameters and then calls SQLExecute.
Am I reading this correctly?
Original comment by javier.r...@gmail.com
on 15 Jan 2015 at 1:46
I tested by running an insert statement for multiple rows of data using
executemany. In the SQL Server Profiler of the run, it showed every row being
inserted using the sp_prepexec stored procedure. This is causing implicit
unprepares and explicit prepares of the query over and over again. What I
believe it should do is call sp_prepexec only for the first row (which prepares
the query and executes the first row), and then call sp_execute for all
subsequent rows in the batch (which will use the existing prepared query).
At the end of the batch, executemany does call an explicit sp_unprepare, which
is fine, but I believe these implicit unprepares and explicit prepares are what
is killing performance. (You can see these in Profiler if you turn on the
"Prepare SQL" and "Unprepare SQL" events in the "TSQL" events heading.)
Original comment by rich...@gmail.com
on 10 Feb 2015 at 6:40
Attachments:
It looks to me like one major problem is the function PrepareAndBind() (and how
it gets used), which should be two separate functions in order to be able to
prepare once and bind many times. This might also address the comment that says
"REVIEW: Why don't we always prepare?" which is in a code section that would
seem to have a fair amount of room for consolidation and simplification. so to
fix:
- separate PrepareAndBind into 2 func
- consolidate the code that calls prepare() and bind() functions, and don't
have separate cases for withparams or without params until you get to the bind
call
- only call prepare once
- maybe consolidate the code for executemany() vs execute once etc
or maybe just ditch pyodbc and get ceodbc instead...
Original comment by mewa...@gmail.com
on 27 Mar 2015 at 8:06
Looks like there is also a malloc and free call for every Bind, where it may be
better (and faster and cleaner) to have auto mem in the func that calling the
initial prep, re-using that mem for each bind (no explicit free required at
end).
Original comment by mewa...@gmail.com
on 27 Mar 2015 at 8:37
The problem is that pyodbc does not use "parameterized arrays" (whereas ceodbc
does). For a description of parameterized arrays, see the section "Using
Arrays of Parameters" on this webpage:
https://www.progress.com/products/datadirect-connect/odbc-drivers/odbc-developer
-center/odbc-tutorials/designing-performance-optimized-odbc-applications/select-
functions-that-optimize-performance
So, for example, if you call pyodbc's executemany() function with an INSERT
statement and a list of 10,000 records (each record consisting of another list
of data fields), pyodbc will make 10,000 calls to the database where each call
includes just one record. This is very slow. On the other hand, ceodbc will
make just one call to the database with all 10,000 records in the payload.
This is much, much quicker.
Having said that, not all databases accept parameterized arrays. Microsoft SQL
Server does:
https://msdn.microsoft.com/en-us/library/ms713841(v=vs.85).aspx
I don't know about other databases.
I suspect that the demand for parameterized arrays will increase now that
sqlalchemy has added bulk INSERT operations to version 1.0:
http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operat
ions
I'm sure there will soon be a lot of people out there using sqlalchemy
wondering why the bulk operations are no quicker than non-bulk operations
(especially with SQL Server).
Hopefully, Michael Kleehammer (or somebody else) will have some time to make
this improvement. It's really needed.
Original comment by toastie...@gmail.com
on 28 Mar 2015 at 3:45
Original issue reported on code.google.com by
sok...@gmail.com
on 29 Mar 2012 at 11:32