Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.
MIT License

logger.debug() calls hampering executemany() performance #37

Closed · ShaunCurrier closed this issue 8 years ago

ShaunCurrier commented 8 years ago

I have an application that loads a few thousand rows into Teradata at a time with executemany() calls. In profiling my application, I found that the logger.debug() calls in OdbcCursor._executeManyBatch() were taking an unreasonably large amount of time. This is with configureLogging set to False and with a filter installed on both my console and file log handlers that prevents teradata udaexec messages from passing, so I am not persisting these messages anywhere. But apparently they still invoke enough of the logging machinery to affect performance: given the number of parameter sequences passed to executemany(), logger.debug() was being called about 732 times per second on average.
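For reference, a minimal sketch of the kind of handler-level filter I mean (the class name is illustrative, not my actual code):

```python
import logging

class BlockTeradataFilter(logging.Filter):
    """Drop records emitted by the teradata package loggers."""
    def filter(self, record):
        # Returning False discards the record at the handler.
        return not record.name.startswith("teradata")

console = logging.StreamHandler()
console.addFilter(BlockTeradataFilter())
logging.getLogger().addHandler(console)
```

Note that handler filters only run after logger.debug() has already passed its level check and built a LogRecord, which would explain why the overhead remains even though nothing is written anywhere.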

By commenting out the three logger.debug() calls in OdbcCursor._executeManyBatch(), I was able to reduce my transaction times by more than 50%.

I am not sure whether this is something that should be fixed in PyTd or whether there are other ways to avoid the performance hit. Possible solutions:

  1. Should the higher level application somehow instruct the logging module to drop these messages immediately, so that less of the logging machinery is invoked? Some quick research did not turn up anything here, but I could have missed something. (One possible approach is sketched after this list.)
  2. Should the logger.debug() calls be gated by either a check of __debug__ or configureLogging? Or perhaps a new option similar to configureLogging?
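As an illustration of option 1, raising the level on the teradata loggers themselves would make logger.isEnabledFor(logging.DEBUG) fail before any LogRecord is built (the logger name here is an assumption based on the package name):

```python
import logging

# Raise the threshold on the package logger so debug() calls return
# immediately after a cheap level check, before any LogRecord is
# created or any handler filter runs.
logging.getLogger("teradata").setLevel(logging.WARNING)
```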
escheie commented 8 years ago

I added a single check (debugEnabled = logger.isEnabledFor(logging.DEBUG)) at the top of the method, then checked this local variable within the loop before calling logger.debug().
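A sketch of that pattern (the function body here is illustrative, not the actual PyTd code):

```python
import logging

logger = logging.getLogger(__name__)

def executeManyBatch(paramSets):
    # One isEnabledFor() check per batch instead of one per logger.debug() call.
    debugEnabled = logger.isEnabledFor(logging.DEBUG)
    for paramSet in paramSets:
        if debugEnabled:
            logger.debug("Executing parameter set: %s", paramSet)
        # ... bind and execute the parameter set against the cursor ...
```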