I created 15 dataframes, each with 10,000 records: 30 columns of floats and 3 columns of text. Exporting all of these dataframes to SQL takes:
Why is turbodbc slightly slower than pyodbc? I would have hoped for performance closer to R's. Is there anything I can do to speed up the process? Thanks in advance!
I am using:
My code is:
This is a great question, and thanks for the elaborate details. Unfortunately, this issue is not related to sqlalchemy-turbodbc, which only helps SQLAlchemy create a connection through Turbodbc.
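For context, that connection step is all the package does; here is a minimal sketch of creating the engine (credentials, host, and driver name are placeholders, not taken from the original report):

from sqlalchemy import create_engine

# sqlalchemy-turbodbc registers the 'mssql+turbodbc' dialect;
# everything past engine creation is plain SQLAlchemy / pandas
engine = create_engine(
    'mssql+turbodbc://user:password@server:1433/database'
    '?driver=ODBC+Driver+17+for+SQL+Server'  # placeholder driver name
)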
That said, I'm happy to help. Your problem seems to be here:
def data_to_turbodbc(df):
    for d in df:
        df[d].to_sql('Test ' + str(d), conn2, if_exists='replace')
You are looping over your dataframe and executing one SQL statement per row. This is really inefficient. Try to execute it in a single statement if possible, e.g.
def data_to_turbodbc(df):
    df.to_sql('Test', conn2, if_exists='replace')
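One hedged variant of that single-statement idea, assuming the input is a dict of dataframes that share columns and can land in one table (pd.concat, the frames parameter, and the engine here are illustrative, not from the original code):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')  # stand-in for the real SQL Server engine

def data_to_turbodbc(frames):
    # concatenate all frames, then write them with a single to_sql call
    combined = pd.concat(frames.values(), ignore_index=True)
    combined.to_sql('Test', engine, if_exists='replace')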
If this does not help, this would be a good question for Stack Overflow. For performance issues with Turbodbc itself, please consider opening an issue at https://github.com/blue-yonder/turbodbc/
I am closing this issue because it is not related to the sqlalchemy-turbodbc package.
Thank you for looking into this! However, I believe there is some confusion. In my code, df is a dictionary containing dataframes. df[d] is a dataframe, so df[d].to_sql() writes an entire dataframe to SQL, not a single row. In fact, as far as I remember, in pandas to_sql is a method of the DataFrame class only. So I believe my question is still relevant.
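For illustration, a minimal sketch of that structure (the data and the engine are made up):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')  # placeholder for the actual SQL Server engine

# df is a dict of whole dataframes, keyed by an index
df = {
    0: pd.DataFrame({'a': [1.0, 2.0], 'b': ['x', 'y']}),
    1: pd.DataFrame({'a': [3.0, 4.0], 'b': ['z', 'w']}),
}

# each iteration writes one entire table, not one row
for d in df:
    df[d].to_sql('Test ' + str(d), engine, if_exists='replace')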
Do you happen to have any benchmarks of pyodbc vs turbodbc and maybe vs R or an ETL tool?
The fact that R is much, much faster at writing the same data to the same SQL Server suggests there is nothing inherently wrong with the SQL Server itself, which, by the way, is running on the very same PC I run these scripts on.
Thanks!
PS: I have rerun the exact same script. pyodbc and R take the same time as before; turbodbc now takes 12 seconds. That is an improvement over pyodbc, but still well behind R. I wonder if anything can be done to speed it up even more?
This may be related to how the pandas to_sql method is implemented: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
Try method='multi' as a parameter to to_sql.
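For reference, a hedged sketch of that suggestion; the connection URL, chunksize, and test frame are assumptions. Note that with SQL Server each multi-row INSERT is capped at 2100 bound parameters, so with the 33 columns described above the chunk size would have to stay below roughly 63 rows:

import pandas as pd
from sqlalchemy import create_engine

# hypothetical connection via the sqlalchemy-turbodbc dialect
engine = create_engine('mssql+turbodbc://user:password@server/database')

df = pd.DataFrame({'a': [1.0, 2.0], 'b': ['x', 'y']})  # stand-in for one real frame

# method='multi' packs many rows into each INSERT statement;
# chunksize keeps every statement under the 2100-parameter limit
df.to_sql('Test', engine, if_exists='replace', index=False,
          method='multi', chunksize=50)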