Closed lizdeika closed 2 years ago
Hi @lizdeika thanks so much for opening this issue. That is so strange that dbt build
takes significantly longer than typical model runs?
We have seen another performance related issue on our netsuite package within Issue #32; however, the customer was able to see the issue resolve itself and needed to increase their Snowflake warehouse size. Because of that, I don't think the two are related.
You do bring up a good point about the tests being the reason these are running longer than expected. It is especially interesting that other package builds run in an expected time frame. I will test this on our internal dataset and see if the same build issue persists.
One question I would ask is if you run dbt test --select tag:netsuite
does it run as long as it is within build
?
Hi,
dbt test --select tag:netsuite
finishes in under 1minute, several attempts
🤔
tag:netsuite
finishes in under 1minute, several attempts
This is truly interesting. I tested this on our internal dataset (using BigQuery) and did not see too much of a difference between build
and run
. I will continue to dig into this in the meantime to see if anything catches my eye, but I think it also may be relevant to create an issue on dbt-core to see if other users are experiencing build
issues similar to what you are noticing.
Especially since dbt test --select tag:netsuite
runs in under 1 minute.
Yesterday dbt run --select tag:netsuite
started to run for hours but randomly
Normaly it finishes in 2-3 minutes
But some runs started to run for 4 hours
19:03:55 | 36 of 37 OK created table model sbl_netsuite.netsuite__income_statement [SELECT 0 in 2796.03s]
22:21:18 | 35 of 37 OK created table model sbl_netsuite.netsuite__balance_sheet. [SELECT 0 in 14638.21s]
22:21:19 |
22:21:19 | Finished running 17 view models, 20 table models in 14671.09s.
Completed successfully
Done. PASS=37 WARN=0 ERROR=0 SKIP=0 TOTAL=37
And it is ONLY happening with fivetran/netsuite
package
We also use
fivetran/facebook_ads
fivetran/google_ads
fivetran/klaviyo
fivetran/microsoft_ads
fivetran/ad_reporting
Ads source tables have waaay more rows than netsuite source table But those packages never get stuck for hours
That is why I have a doubt about reporting to dbt-core
@lizdeika thanks for this additional context. I am going to loop up internally with our query optimization experts to investigate what may be happening with dbt_netsuite and why the build
times would be taking as long as you are seeing.
I will post back here once I have more details!
Hey @lizdeika I have been doing some digging into this and have a few other questions I would like to ask:
dbt build
is executed these will take a long time?build
?Hi,
version change could be the case, I will try to revert to 0.4.0(first version we started using) for a test. but not holding my fingers crossed here because at around the same time our netsuite went production and tables began to fillup, before that they were mostly empty
Did not help, reverted to 0.4.0 gets stuck on netsuite_balance_sheet
and/or netsuite_income_statement
just like with 0.4.1
Thanks for this additional information @lizdeika and sorry we have not narrowed this performance issue down yet...
I would like to try and pin point exactly where this issue may be occurring. Would you be able to materialize the intermediate models as tables and see if there is a specific model that is causing the performance issue for you? This would help me dig deeper into any query optimizations. You can use the below config to materialize as tables:
models:
netsuite_source:
+tags:
- netsuite
netsuite:
+tags:
- netsuite
intermediate:
+materialized: table
BINGO! 🚀
intermediate:
+materialized: table
did the trick
dbt build --select tag:netsuite
now takes ~3 minutes
dbt run --select tag:netsuite
~2 minutes
Woohoo that's great news!
That proves my theory that one of the intermediate (materialized as ephemeral
by default) has too complex of logic for higher data volume tables. Would you be comfortable sharing the row count of your netsuite transactions
and transaction_lines
tables? I believe the intermediate query logic on these tables may be causing the original long runtimes you were seeing.
If you have significantly higher row count than we do internally, I will open a Feature Request on this repo for our team to optimize the intermediate queries to improve the performance all around.
Table | Row count |
---|---|
transactions |
~30k |
transaction_lines |
0 |
Thanks @lizdeika, strangely enough our internal counts are comparable to yours. I think the best option forward for anyone else that runs into this same issue would be to materialize the intermediate models as table
to help improve the query performance.
Since we were able to achieve a more reasonable dbt build
time I feel it is appropriate to close this issue. Please feel free to comment back if you encounter this issue again. Thanks 😄
This could have a mention in README.md
Are you a current Fivetran customer? Deividas, smartbrandslaboratory
Describe the bug package models are taged like this in
dbt_project.yml
dbt Cloud has a job with a command like this:
dbt build --select tag:netsuite
Jobs mostly do not ever finish even after 13hours... We cancel them.I was able to see success in dbt Cloud dev environment and it took 28minutes.
netsuite__balance_sheet
took 8minutesnetsuite__income_statement
took 27minutesNetsuite tables are very small, maxing out at fivetran_audit 138k rows, system_notes 26k rows, transactions 5k, customers 2k
The interesting part: Changing command to
dbt run --select tag:netsuite
. run instead of build the job finishes in under 3minutesRelated We are using this approach for other fivetran dbt packages: fivetran/facebook_ads, fivetran/google_ads, fivetran/klaviyo, fivetran/microsoft_ads, fivetran/ad_reporting etc Jobs are setup with
dbt build
, tables have waaaaaay more rows and they build just fine.Expected behavior
Build takes just a bit longer(because of tests), not magnitudes longer
Project variables configuration
no other package specific configuration options
Package Version
Warehouse
- [ ] BigQuery - [ ] Redshift - [ ] Snowflake - [x] Postgres - [ ] Databricks - [ ] Other (provide details below) **Additional context** **Screenshots**Please indicate the level of urgency
Are you interested in contributing to this package?