Open · beefupinetree opened this issue 3 weeks ago
I followed the instructions on this page to create a SQLAlchemy engine and used it with the pandas `to_sql()` method. It's taking around 2 seconds to append a single data point to a Delta table in Azure Databricks, and the time seems to scale linearly: a DataFrame with 1 column and 10 rows takes ~20 seconds to push to Azure. Is there a way to make writing back to Databricks from a local machine faster?
Example code:
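A minimal sketch of the setup described above (not the poster's exact code; the host, HTTP path, token, catalog, schema, and table name are placeholders):

```python
# Minimal sketch: pandas to_sql through a SQLAlchemy engine for Databricks.
# All connection values below are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "databricks://token:<access_token>@<server_hostname>"
    "?http_path=<http_path>&catalog=main&schema=default"
)

df = pd.DataFrame({"value": range(10)})

# Each row effectively becomes its own round trip to the SQL warehouse,
# which is consistent with the roughly linear slowdown described above.
df.to_sql("my_table", engine, if_exists="append", index=False)
```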
Local specs: databricks-sql-connector==3.4.0, pandas==2.2.2, Python 3.10.14
Azure specs: Databricks SQL warehouse cluster, Runtime 13.3 LTS

Just giving you some validation: yes, this is very slow. Some discussion of the reasons behind this can be found here. It won't be possible to improve the speed of these imports automatically without changes to the connector, but if you're willing to write your own INSERT queries, you can import with much greater efficiency.
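As a rough illustration of the manual-INSERT approach (not code from this thread), batching many rows into a single `INSERT ... VALUES` statement avoids the per-row round trips; the connection details and table name below are placeholders:

```python
# Sketch of a hand-written batched INSERT using databricks-sql-connector.
# server_hostname, http_path, access_token, and the table name are placeholders.
from databricks import sql
import pandas as pd

df = pd.DataFrame({"value": range(10)})

# Build one multi-row VALUES clause instead of issuing one INSERT per row.
# (For untrusted data, prefer parameterized queries over string formatting.)
values = ", ".join(f"({int(v)})" for v in df["value"])

with sql.connect(
    server_hostname="<server_hostname>",
    http_path="<http_path>",
    access_token="<access_token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(f"INSERT INTO main.default.my_table (value) VALUES {values}")
```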