Pretty slow data transfer, tinygrams likely the root problem

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Run a small test program that does just a cursor.fetchall() on a database 
that's a few hundred megabytes in size
2. Time it
3. It's slow

What is the expected output? What do you see instead?
The result is correct, it's just too slow for me to use for my current project. 
 I suspect there's a minor misdesign somewhere, because if I strace the process 
on Ubuntu 10.04, I can see a main loop consisting of many calls to poll() and 
recv().  The problem is probably that the recv()'s are too plentiful - some 
code at some layer of the communication should be transmitting a larger amount 
of data across the network and breaking it up later, to avoid all the network 
tinygrams I'm seeing.

What version of the product are you using? On what operating system?
I'm using pyodbc 2.1.7 on (Windows XP SP3 with ActivePython 2.6.5 and the "{SQL 
Server}" driver) and (Ubuntu 10.04 with Python 2.6.5 (the shipped version), 
unixodbc 2.2.11-21 (the shipped version) and FreeTDS 0.82-6build1 (the shipped 
version).  Both scenarios are underperforming, I believe.

Please provide any additional information below.
The packets transferred tend to be about 500 some bytes long - I suspect this 
is approximately the width of a row, plus TCP/IP overhead.

I don't know for a fact that this is a problem specific to pyodbc, but I can't 
help but wonder, since it seems to be slow on both OS's with quite different 
drivers.

Checking on XP with wireshark, it does look like I'm getting some full (1500 
byte) frames...  I'll probably try jumbo frames there to see if that helps, but 
the fact that the frames are tiny on Ubuntu remains a likely issue somewhere in 
the database communication I'm seeing.  And yes, it is a bit slower on Ubuntu 
than XP.

Thanks for the great software!  If this can't be sorted out, I'll have to go 
back to pywin32's "odbc" module - not pleasant ^_^

Original issue reported on code.google.com by strom...@gmail.com on 10 Sep 2010 at 11:54

GoogleCodeExporter commented 9 years ago

I'm building a test for this and I'd like to make sure I'm seeing something 
similar to your setup.  How large is each row (sounds like ~500 bytes?), how 
many rows did you fetch, and how long did it take?

Thanks.

Original comment by mkleehammer on 12 Sep 2010 at 10:21

GoogleCodeExporter commented 9 years ago

I forgot to mention, please test with 2.1.8 if possible.  There are some 
changes, one of which is very important if you are selecting more than 400 
characters/bytes from a single column.  Previously, anything over 400 or 500 
bytes, I forgot which, was considered a long value.  2.1.8 determines the 
maximum size when the first connection is made.

Original comment by mkleehammer on 12 Sep 2010 at 10:47

Changed state: Investigating

GoogleCodeExporter commented 9 years ago

This may be something we want to look at: http://support.microsoft.com/kb/157802

On a local server, I'm selecting 1M rows in 8.4 seconds, each about 500 bytes.

Original comment by mkleehammer on 12 Sep 2010 at 11:05

GoogleCodeExporter commented 9 years ago

I'll make sure I use 2.1.8 in my tests.

As far as the test code: I'm attempting to get permission to release (that part 
of) the code.

Original comment by strom...@gmail.com on 13 Sep 2010 at 2:49

GoogleCodeExporter commented 9 years ago


In a sense happily, I was already using 2.1.8 on Linux.  However, on Windows, I 
somehow had 2.1.7.

My rows are, on average, 1638.784 bytes wide (they use lots of varchar and 
nvarchar - nvarchar being unicode, and that figure is not including indexes).

tsql (the FreeTDS minimalist database client that only uses the TDS protocol) 
is also using just over 500 bytes per block for most of a similar data 
transfer.  I'm not sure that really means that this is a FreeTDS problem 
though, but I don't mean to rule that out either.

I've attached some test code to facilitate replication - with 
company-identifying details replaced with descriptions of what they should be 
set to in your environment.

Despite the tinygrams, the performance isn't terrible on Linux - on a 100BaseT 
link, I'm getting about 59.147 megabits/second from the attached test code, 
while using my pnetcat program 
(http://stromberg.dnsalias.org/~strombrg/pnetcat.html - it's in CPython, but 
doesn't use any database) I'm getting about 93.907 megabits/second, and iperf 
(which is in C++) is getting about 93.8 megabits/second.  But both pnetcat and 
iperf are relatively ivory-tower performance.

However, on Windows, the packets look larger (conceivably due to the Nagel 
algorithm, I don't know about that), but are moving slower (and usually the 
reverse will be true, as you likely know).  On Windows XP, I appear to be 
getting 8.227 megabits/second - and I'm afraid on this project, it's Windows 
that we need most.  Again, this is with 2.1.7.

Installing 2.1.8 on that same XP box, and rerunning my test program...  It now 
gets 8.212 megabits/second - so it may have gotten slightly slower with the 
pyodbc upgrade, but that's probably not a statistically significant difference.

BTW, I upgraded the device driver for the NIC in the XP box, but that seemed to 
help only marginally - probably not by a statistically significant amount.

Thanks for the great software!

PS: I'm using (k_in_table_according_to_sp_spaceused)*1024*8 / (10^6 * 
seconds_for_transfer) to compute megabits/second.

Original comment by strom...@gmail.com on 14 Sep 2010 at 7:35

Attachments:

replication.tar.bz2

GoogleCodeExporter commented 9 years ago

BTW, I'm not sure enabling jumbo frames is going to work out - the driver for 
the NIC in the XP box doesn't appear to allow them, despite the driver upgrade. 
 Also, a relevant IT guy here was pretty opposed to using them.

Original comment by strom...@gmail.com on 14 Sep 2010 at 9:46

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

An iperf from the ODBC client to the ODBC server gets 247 megabits/second

A pnetcat of 256M (being careful to avoid the cygwin python for speed) from 
same to same gets 356.310 megabits/second.

pnetcat's time is still only about a 3rd of the theoretical performance; jumbo 
frames would likely help these programs, but not _necessarily_ ODBC.  It is at 
least clearly gigabit speed though.

On the Windows XP client I invoked pnetcat with (that's cygwin dd, BTW):
dd if=/dev/zero bs=1024k count=256 | \
        c:/Python26/python.exe pnetcat.py -O host.net 5555 -i -w $((2**20)) -N 1 -v

On the Windows 7 server I invoked pnetcat with:
c:/Python26/python.exe -I 5555 -n -w 1048576 -N 1

Original comment by strom...@gmail.com on 15 Sep 2010 at 12:28

GoogleCodeExporter commented 9 years ago

Just to be totally clear: the tinygrams on Linux may not be that related to the 
underperformance on windows, because windows appears to be giving full length, 
standard ethernet frames.

However, if the tinygrams can be eliminated on Linux, that might make it even 
faster.

Original comment by strom...@gmail.com on 15 Sep 2010 at 12:37

GoogleCodeExporter commented 9 years ago


The "odbc" module that comes with "Python for Windows Extensions" 
(http://python.net/crew/mhammond/win32) is getting 112.026 megabits/second on 
the same server, database and table - that's a little over a factor of 13 
faster.

Original comment by strom...@gmail.com on 15 Sep 2010 at 1:05

GoogleCodeExporter commented 9 years ago

Another correction: time_with("import odbc" on XP) < time_with("import pyodbc" 
on Ubuntu) < time_with("import pyodbc" on XP)

That is, the first listed is fastest, the last listed is slowest.

Original comment by strom...@gmail.com on 15 Sep 2010 at 3:55

GoogleCodeExporter commented 9 years ago

The pyodbc vs odbc comparison is probably the best place to start.

Is it possible to produce a small trace of each for your test, perhaps fetching 
only two rows instead of all of them.  What I'm interested in are (1) what ODBC 
calls is pyodbc making that might be unnecessary and (2) what data types are 
there.

For the 2nd point, I might need to look at what the two modules are allocating 
and how odbc handles Unicode.  If it is Unicode related, I might have to add 
some #ifdefs specific to 32-bit and 64-bit Windows and optimize those cases to 
memcpy's.  (One weakness of C++ compilers is the inability to use sizeof() in 
compile time decisions without resorting to template-specialization, which I 
really don't think will be portable enough.)

Thanks.

Original comment by mkleehammer on 15 Sep 2010 at 2:09

GoogleCodeExporter commented 9 years ago


Attached to this message is a trace from two rows on Ubuntu 10.04.

Original comment by strom...@gmail.com on 15 Sep 2010 at 3:42

Attachments:

pyodbc.trace.bz2

GoogleCodeExporter commented 9 years ago


And here are two traces from Windows XP: One from pyodbc, one from odbc.

Original comment by strom...@gmail.com on 15 Sep 2010 at 4:01

Attachments:

GoogleCodeExporter commented 9 years ago


Please let me know if there's anything further you need to look into this.

Original comment by strom...@gmail.com on 15 Sep 2010 at 4:02

GoogleCodeExporter commented 9 years ago

OK, I've found a few things:

1) There were two loops calling SQLDescribeCol after a select, which is 
obviously unnecessary work.  I've eliminated that in a test version here.  I 
would expect this to be negligible, though you are selecting a lot of columns.

2) Related: If you don't need all of the columns, reducing them is your biggest 
performance win, even with the odbc module.

3) The odbc module is pre-allocating memory and binding into it, which is 
faster.  I just didn't expect it to e that much faster.  I'm going to create a 
performance branch and see what it would take to do something similar.  It 
might allocate a lot more memory, though.

Original comment by mkleehammer on 15 Sep 2010 at 10:02

GoogleCodeExporter commented 9 years ago


Great news!

Thanks.

Original comment by strom...@gmail.com on 15 Sep 2010 at 10:04

GoogleCodeExporter commented 9 years ago

BTW, we do need lots of columns - about 120 of them.

Original comment by strom...@gmail.com on 15 Sep 2010 at 11:33

GoogleCodeExporter commented 9 years ago

There are some interesting articles on the web about speeding up ODBC data 
transfers if you google for "ODBC bulk fetch" or "odbc firehose"....

HTH ^_^

Original comment by strom...@gmail.com on 16 Sep 2010 at 1:45

Bobspadger / pyodbc

Pretty slow data transfer, tinygrams likely the root problem #120