Is your feature request related to a problem? Please describe.
When comparing hundreds of tables in parallel, a connection is opened and closed for every table, which adds connection overhead per table.
Describe the solution you'd like
It would be great to have some way to use connection pooling for comparing many tables.
One option is to let the connect() method accept an existing pooled connection, such as the object returned by mysql_pool.get_connection() (taking MySQL as an example), instead of credentials.
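To illustrate the overhead being avoided, here is a minimal, driver-agnostic sketch of connection pooling. ConnectionPool and FakeConnection are hypothetical names made up for this example; they are not part of the library's API, and a real setup would presumably use the driver's own pool (e.g. the mysql_pool mentioned above).

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Tiny fixed-size pool (hypothetical helper, not a library class).

    `factory` is any zero-argument callable that opens a connection.
    """

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)     # hand it back to the pool instead of closing it


class FakeConnection:
    """Stand-in for a real driver connection so the sketch runs without a database."""

    opened = 0

    def __init__(self):
        FakeConnection.opened += 1


pool = ConnectionPool(FakeConnection, size=2)
for _ in range(100):                 # e.g. one iteration per table
    with pool.connection() as conn:
        pass                         # the per-table comparison would use `conn` here
```

With this pattern, 100 tables share 2 connections instead of opening and closing 100 of them.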
We can take this one step further and think about a method similar to diff_tables() that can accept a list of TableSegments and manage the threads / subprocesses and the connection pooling internally.
In this case, the goal would be to keep the database constantly saturated with a fixed number (x) of concurrent connections, whether through many small tables (one thread each) or a few big tables (multiple threads each).
The max_threadpool_size could be set dynamically per table, calculated from the table's data length.
This will minimize the total time it takes to compare the tables.
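A rough sketch of how such a scheduler could work, under stated assumptions: MAX_CONNECTIONS, BYTES_PER_THREAD, threads_for_table, ConnectionBudget and compare_all are all hypothetical names invented for this example, and the `compare` callback stands in for whatever would actually diff one table with a given number of threads.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 8            # the "x" above; hypothetical value
BYTES_PER_THREAD = 1_000_000   # hypothetical tuning constant


def threads_for_table(data_length):
    """One thread per BYTES_PER_THREAD of data, clamped to [1, MAX_CONNECTIONS]."""
    return max(1, min(MAX_CONNECTIONS, math.ceil(data_length / BYTES_PER_THREAD)))


class ConnectionBudget:
    """Counts connections in use so the total never exceeds `total`."""

    def __init__(self, total):
        self._total = total
        self._available = total
        self._cond = threading.Condition()
        self.peak = 0          # highest number of connections in use at once

    def acquire(self, n):
        with self._cond:
            self._cond.wait_for(lambda: self._available >= n)
            self._available -= n
            self.peak = max(self.peak, self._total - self._available)

    def release(self, n):
        with self._cond:
            self._available += n
            self._cond.notify_all()


def compare_all(table_sizes, compare):
    """Run compare(name, n_threads) for every table within the connection budget."""
    budget = ConnectionBudget(MAX_CONNECTIONS)

    def run(name, size):
        n = threads_for_table(size)
        budget.acquire(n)      # wait until n connections are free
        try:
            return name, compare(name, n)
        finally:
            budget.release(n)

    with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as executor:
        futures = [executor.submit(run, name, size) for name, size in table_sizes.items()]
        results = dict(f.result() for f in futures)
    return results, budget.peak


sizes = {"small_a": 10_000, "small_b": 20_000, "big": 5_000_000}
results, peak = compare_all(sizes, compare=lambda name, n: n)
```

Small tables get one thread each and the big table gets several, while the budget keeps the total number of concurrent connections at or below MAX_CONNECTIONS.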
Describe alternatives you've considered
If the user were responsible for creating the database connection instead of going through the connect() method, they could implement connection pooling themselves to improve performance and reduce potential connection errors.
But currently the connection must be made using the connect() method.