fal-ai / dbt-fal

do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
https://fal.ai/dbt-fal
Apache License 2.0
851 stars 71 forks source link

fix(dbt-fal): faster inspect stack by using manual solution without loading source code #723

Closed chamini2 closed 1 year ago

chamini2 commented 1 year ago

Description

We were getting reports of slow dbt seed times and even saw that it applied to some other dbt tasks too.

Thanks to @daveworth pointing out we added an inspect.stack() call in #691 that may be slow!

This analysis of the stack is necessary to decide if we should return fal or the DB adapter type in the adapter.type() call. This is to make dbt packages that make decisions based on adapter.type() work.

This approach reduces time greatly because inspect.stack() by default loads the file of each function in the stack to take some context to show (some lines).

The time difference I am seeing with the seed data from https://github.com/fal-ai/jaffle_shop_with_fal is:

Base comparison WITHOUT dbt-fal:

❯ dbt seed # dbt-postgres
23:11:41  Running with dbt=1.3.2
23:11:41  Unable to do partial parsing because config vars, config profile, or config target have changed
23:11:41  Found 11 models, 20 tests, 0 snapshots, 0 analyses, 289 macros, 0 operations, 4 seed files, 0 sources, 0 exposures, 0 metrics
23:11:41
23:11:41  Concurrency: 1 threads (target='pg_dev')
23:11:41
23:11:41  1 of 4 START seed file dbt_fal.raw_customers ................................... [RUN]
23:11:42  1 of 4 OK loaded seed file dbt_fal.raw_customers ............................... [INSERT 100 in 0.09s]
23:11:42  2 of 4 START seed file dbt_fal.raw_order_attributes ............................ [RUN]
23:11:42  2 of 4 OK loaded seed file dbt_fal.raw_order_attributes ........................ [INSERT 99 in 0.10s]
23:11:42  3 of 4 START seed file dbt_fal.raw_orders ...................................... [RUN]
23:11:42  3 of 4 OK loaded seed file dbt_fal.raw_orders .................................. [INSERT 99 in 0.07s]
23:11:42  4 of 4 START seed file dbt_fal.raw_payments .................................... [RUN]
23:11:42  4 of 4 OK loaded seed file dbt_fal.raw_payments ................................ [INSERT 113 in 0.08s]
23:11:42
23:11:42  Finished running 4 seeds in 0 hours 0 minutes and 0.45 seconds (0.45s).
23:11:42
23:11:42  Completed successfully
23:11:42
23:11:42  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

Before:

❯ dbt seed # dbt-fal + dbt-postgres
23:10:22  Running with dbt=1.3.2
23:10:22  Found 11 models, 20 tests, 0 snapshots, 0 analyses, 296 macros, 0 operations, 4 seed files, 0 sources, 0 exposures, 0 metrics
23:10:22
23:10:23  Concurrency: 1 threads (target='fal_dev')
23:10:23
23:10:23  1 of 4 START seed file dbt_fal.raw_customers ................................... [RUN]
23:10:26  1 of 4 OK loaded seed file dbt_fal.raw_customers ............................... [INSERT 100 in 3.00s]
23:10:26  2 of 4 START seed file dbt_fal.raw_order_attributes ............................ [RUN]
23:10:31  2 of 4 OK loaded seed file dbt_fal.raw_order_attributes ........................ [INSERT 99 in 5.63s]
23:10:31  3 of 4 START seed file dbt_fal.raw_orders ...................................... [RUN]
23:10:35  3 of 4 OK loaded seed file dbt_fal.raw_orders .................................. [INSERT 99 in 3.77s]
23:10:35  4 of 4 START seed file dbt_fal.raw_payments .................................... [RUN]
23:10:39  4 of 4 OK loaded seed file dbt_fal.raw_payments ................................ [INSERT 113 in 4.30s]
23:10:39
23:10:39  Finished running 4 seeds in 0 hours 0 minutes and 17.10 seconds (17.10s).
23:10:39
23:10:39  Completed successfully
23:10:39
23:10:39  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

After:

❯ dbt seed # dbt-fal + dbt-postgres
23:11:04  Running with dbt=1.3.2
23:11:04  Found 11 models, 20 tests, 0 snapshots, 0 analyses, 296 macros, 0 operations, 4 seed files, 0 sources, 0 exposures, 0 metrics
23:11:04
23:11:04  Concurrency: 1 threads (target='fal_dev')
23:11:04
23:11:04  1 of 4 START seed file dbt_fal.raw_customers ................................... [RUN]
23:11:04  1 of 4 OK loaded seed file dbt_fal.raw_customers ............................... [INSERT 100 in 0.10s]
23:11:04  2 of 4 START seed file dbt_fal.raw_order_attributes ............................ [RUN]
23:11:05  2 of 4 OK loaded seed file dbt_fal.raw_order_attributes ........................ [INSERT 99 in 0.14s]
23:11:05  3 of 4 START seed file dbt_fal.raw_orders ...................................... [RUN]
23:11:05  3 of 4 OK loaded seed file dbt_fal.raw_orders .................................. [INSERT 99 in 0.10s]
23:11:05  4 of 4 START seed file dbt_fal.raw_payments .................................... [RUN]
23:11:05  4 of 4 OK loaded seed file dbt_fal.raw_payments ................................ [INSERT 113 in 0.12s]
23:11:05
23:11:05  Finished running 4 seeds in 0 hours 0 minutes and 0.80 seconds (0.80s).
23:11:05
23:11:05  Completed successfully
23:11:05
23:11:05  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

We should try to make this better, but feels like an acceptable result for now.

linear[bot] commented 1 year ago

FEA-769 dbt-fal: decrease in speed when using dbt-fal wrapping Postgres #720