Closed kinghuang closed 1 year ago
Hey, @kinghuang.
Just released https://pypi.org/project/dbt-fal/1.4.2/, can you test this and confirm it looks good?
Looks good! I upgraded with pip to dbt-fal==1.4.2
and re-ran the same model. It completed without errors in 411 seconds:
18:13:36 1 of 1 START python table model dbt_king_stg_dibi_env.foundations_wells_names_cleaned [RUN]
18:20:27 1 of 1 OK created python table model dbt_king_stg_dibi_env.foundations_wells_names_cleaned [OK in 410.56s]
Might be worth mentioning the change in the release notes.
Thank you! I will probably add this to our not-so-regular changelog blog.
Description
dbt-fal currently uses a generic `pandas.DataFrame.to_sql` call when using adapters other than `snowflake`, `bigquery`, and `duckdb`. This results in an `INSERT` statement per row when writing dataframes to PostgreSQL, which is excruciatingly slow.

`pandas.DataFrame.to_sql` can take an optional `method` argument that controls the SQL insertion clause. In particular, the pandas user guide provides an example for PostgreSQL that uses `COPY FROM` to efficiently insert rows.

Here are some numbers from a Python model that I'm working on. The model reads and writes 4.1 million rows with two text columns. With dbt-fal as is, the model takes 1038 seconds total, of which about 660.9 seconds are spent writing data to PostgreSQL. With the custom method, insertion time drops to just 35.2 seconds, over 18× faster.
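For reference, a custom `method` callable along the lines of the pandas user guide's PostgreSQL example looks roughly like this. This is a sketch, not the exact code in dbt-fal; it assumes a psycopg2-backed SQLAlchemy connection, since it relies on psycopg2's `copy_expert`:

```python
import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    """Insert rows via PostgreSQL COPY instead of per-row INSERTs.

    table     : pandas.io.sql.SQLTable
    conn      : SQLAlchemy connection (wrapping a psycopg2 connection)
    keys      : list of column names
    data_iter : iterable of row tuples
    """
    # Unwrap the DBAPI (psycopg2) connection from SQLAlchemy.
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        # Serialize the rows to an in-memory CSV buffer.
        buf = StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)

        columns = ", ".join(f'"{k}"' for k in keys)
        table_name = f"{table.schema}.{table.name}" if table.schema else table.name

        # Stream the buffer to the server in a single COPY statement.
        sql = f"COPY {table_name} ({columns}) FROM STDIN WITH CSV"
        cur.copy_expert(sql=sql, file=buf)
```

It would then be passed to the write call as `df.to_sql("my_table", engine, method=psql_insert_copy)`, replacing the default one-`INSERT`-per-batch behavior.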
The `dur` numbers come from temporary wall time calculations in the code.

Integration tests
Adapter to test:
Python version to test: