Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.

Slow insert performance #4

Closed: ShaunCurrier closed this issue 9 years ago

ShaunCurrier commented 9 years ago

Using the following code I am getting very poor insert performance:

    cursor.executemany('INSERT INTO table VALUES (?,?,?,?)', [<bigint>, <byteint>, <byteint>, <float>])

I am only able to insert ~10-60 rows/sec. Using the alternative Python database module pyodbc I get similar results, but with ceODBC I can reach ~650 rows/sec. This happens even when the table is basically empty. When my inserting server is fully bottlenecked by this, completely waiting for multiple 1k-row inserts to finish, Teradata Viewpoint shows barely any activity on the system, suggesting the bottleneck is not the database itself but the interface to it. The table is a multiset table.
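
For reference, a rough timing harness along these lines reproduces the rows/sec measurement; the connection details, table name, and generated rows below are hypothetical placeholders:

    import time
    import teradata

    # Hypothetical connection setup; substitute your own system and credentials.
    udaExec = teradata.UdaExec(appName="InsertTiming", version="1.0", logConsole=False)
    with udaExec.connect(method="odbc", system="mydbhost", username="user", password="pass") as session:
        rows = [(i, i % 128, i % 128, float(i)) for i in range(1000)]
        start = time.time()
        # Without batch=True, each parameter row is sent as a separate round trip.
        session.executemany("INSERT INTO mytable VALUES (?,?,?,?)", rows)
        print("%.0f rows/sec" % (len(rows) / (time.time() - start)))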

ShaunCurrier commented 9 years ago

Realized I should have clarified: my second argument is a list of lists of query parameters, like this:

    cursor.executemany('INSERT INTO table VALUES (?,?,?,?)', [[<bigint>, <byteint>, <byteint>, <float>], [<bigint>, <byteint>, <byteint>, <float>], ...])

ShaunCurrier commented 9 years ago

Darn, close issue button got me again...

escheie commented 9 years ago

Are you using "batch=True" as a parameter to your executemany call?
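
That flag is passed directly to executemany; a minimal sketch, with a hypothetical table name and rows list:

    # batch=True sends all parameter rows in a single batch rather than row by row.
    cursor.executemany('INSERT INTO mytable VALUES (?,?,?,?)', rows, batch=True)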

ShaunCurrier commented 9 years ago

Oh man... you are right, I didn't. I even remember reading about that during my first read-through of the module announcement. Trying again, I am now getting about 200 rows/sec by my rough calculation (my transaction actually includes locking the tables, doing some selects/small inserts, then doing the bigger 1k-row insert). Not sure how much performance I should expect, but I knew the earlier performance wasn't it.

escheie commented 9 years ago

I ran a simple test using your example and was able to insert 10,000 rows in less than a second:

    # udaExec and the self.* connection attributes are initialized elsewhere in the test harness.
    with udaExec.connect(self.dsn, username=self.username, password=self.password) as session:
        rowCount = 10000
        session.execute("CREATE TABLE testExecuteManyBatch (a BIGINT, b BYTEINT, c BYTEINT, d FLOAT)")
        # batch=True submits all 10,000 rows as a single batch instead of one round trip per row.
        session.executemany("INSERT INTO testExecuteManyBatch VALUES (?, ?, ?, ?)", [(x, x % 128, x % 128, x) for x in range(0, rowCount)], batch=True)

Duration: 0.981 seconds
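
That works out to roughly 10000 / 0.981 ≈ 10,200 rows/sec, versus the ~10-60 rows/sec reported without batching.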

I don't know that there are any significant performance gains to be had from additional tweaks to the Python module. If this is acceptable, please close the bug.

Thanks, -Eric

ShaunCurrier commented 9 years ago

Agreed, thanks for your time.

ShaunCurrier commented 9 years ago

Yep... inserts are definitely doing well now; I just hit 8k rows/sec. Thanks again.