fivetran / dbt_netsuite

Data models for Fivetran's Netsuite connector, built using dbt.
https://fivetran.github.io/dbt_netsuite/
Apache License 2.0
36 stars 35 forks source link

BUG - Extremely long "dbt build" times while "dbt run" is fast #37

Closed lizdeika closed 2 years ago

lizdeika commented 3 years ago

Are you a current Fivetran customer? Deividas, smartbrandslaboratory

Describe the bug package models are taged like this in dbt_project.yml

models:
  netsuite_source:
    +tags:
      - netsuite
  netsuite:
    +tags:
      - netsuite

dbt Cloud has a job with a command like this: dbt build --select tag:netsuite Jobs mostly do not ever finish even after 13hours... We cancel them.

I was able to see success in dbt Cloud dev environment and it took 28minutes. netsuite__balance_sheet took 8minutes netsuite__income_statement took 27minutes

Netsuite tables are very small, maxing out at fivetran_audit 138k rows, system_notes 26k rows, transactions 5k, customers 2k

The interesting part: Changing command to dbt run --select tag:netsuite. run instead of build the job finishes in under 3minutes

Related We are using this approach for other fivetran dbt packages: fivetran/facebook_ads, fivetran/google_ads, fivetran/klaviyo, fivetran/microsoft_ads, fivetran/ad_reporting etc Jobs are setup with dbt build, tables have waaaaaay more rows and they build just fine.

Expected behavior

Build takes just a bit longer(because of tests), not magnitudes longer

Project variables configuration

vars:
  netsuite_schema: our_custom_netsuite_schema

no other package specific configuration options

Package Version

  - package: fivetran/netsuite
    version: 0.4.1

Warehouse

- [ ] BigQuery - [ ] Redshift - [ ] Snowflake - [x] Postgres - [ ] Databricks - [ ] Other (provide details below) **Additional context** **Screenshots**

Please indicate the level of urgency

Are you interested in contributing to this package?

fivetran-joemarkiewicz commented 3 years ago

Hi @lizdeika thanks so much for opening this issue. That is so strange that dbt build takes significantly longer than typical model runs?

We have seen another performance related issue on our netsuite package within Issue #32; however, the customer was able to see the issue resolve itself and needed to increase their Snowflake warehouse size. Because of that, I don't think the two are related.

You do bring up a good point about the tests being the reason these are running longer than expected. It is especially interesting that other package builds run in an expected time frame. I will test this on our internal dataset and see if the same build issue persists.

One question I would ask is if you run dbt test --select tag:netsuite does it run as long as it is within build?

lizdeika commented 3 years ago

Hi, dbt test --select tag:netsuite finishes in under 1minute, several attempts

fivetran-joemarkiewicz commented 3 years ago

🤔

tag:netsuite finishes in under 1minute, several attempts

This is truly interesting. I tested this on our internal dataset (using BigQuery) and did not see too much of a difference between build and run. I will continue to dig into this in the meantime to see if anything catches my eye, but I think it also may be relevant to create an issue on dbt-core to see if other users are experiencing build issues similar to what you are noticing.

Especially since dbt test --select tag:netsuite runs in under 1 minute.

lizdeika commented 2 years ago

Yesterday dbt run --select tag:netsuite started to run for hours but randomly Normaly it finishes in 2-3 minutes But some runs started to run for 4 hours

19:03:55 | 36 of 37 OK created table model sbl_netsuite.netsuite__income_statement [SELECT 0 in 2796.03s]
22:21:18 | 35 of 37 OK created table model sbl_netsuite.netsuite__balance_sheet. [SELECT 0 in 14638.21s]
22:21:19 |
22:21:19 | Finished running 17 view models, 20 table models in 14671.09s.

Completed successfully

Done. PASS=37 WARN=0 ERROR=0 SKIP=0 TOTAL=37

And it is ONLY happening with fivetran/netsuite package

We also use fivetran/facebook_ads fivetran/google_ads fivetran/klaviyo fivetran/microsoft_ads fivetran/ad_reporting

Ads source tables have waaay more rows than netsuite source table But those packages never get stuck for hours

That is why I have a doubt about reporting to dbt-core

fivetran-joemarkiewicz commented 2 years ago

@lizdeika thanks for this additional context. I am going to loop up internally with our query optimization experts to investigate what may be happening with dbt_netsuite and why the build times would be taking as long as you are seeing.

I will post back here once I have more details!

fivetran-joemarkiewicz commented 2 years ago

Hey @lizdeika I have been doing some digging into this and have a few other questions I would like to ask:

lizdeika commented 2 years ago

Hi,

lizdeika commented 2 years ago

version change could be the case, I will try to revert to 0.4.0(first version we started using) for a test. but not holding my fingers crossed here because at around the same time our netsuite went production and tables began to fillup, before that they were mostly empty

Did not help, reverted to 0.4.0 gets stuck on netsuite_balance_sheet and/or netsuite_income_statement just like with 0.4.1

fivetran-joemarkiewicz commented 2 years ago

Thanks for this additional information @lizdeika and sorry we have not narrowed this performance issue down yet...

I would like to try and pin point exactly where this issue may be occurring. Would you be able to materialize the intermediate models as tables and see if there is a specific model that is causing the performance issue for you? This would help me dig deeper into any query optimizations. You can use the below config to materialize as tables:

models:
  netsuite_source:
    +tags:
      - netsuite
  netsuite:
    +tags:
      - netsuite
    intermediate:
      +materialized: table
lizdeika commented 2 years ago

BINGO! 🚀

    intermediate:
      +materialized: table

did the trick

dbt build --select tag:netsuite now takes ~3 minutes dbt run --select tag:netsuite ~2 minutes

fivetran-joemarkiewicz commented 2 years ago

Woohoo that's great news!

That proves my theory that one of the intermediate (materialized as ephemeral by default) has too complex of logic for higher data volume tables. Would you be comfortable sharing the row count of your netsuite transactions and transaction_lines tables? I believe the intermediate query logic on these tables may be causing the original long runtimes you were seeing.

If you have significantly higher row count than we do internally, I will open a Feature Request on this repo for our team to optimize the intermediate queries to improve the performance all around.

lizdeika commented 2 years ago
Table Row count
transactions ~30k
transaction_lines 0
fivetran-joemarkiewicz commented 2 years ago

Thanks @lizdeika, strangely enough our internal counts are comparable to yours. I think the best option forward for anyone else that runs into this same issue would be to materialize the intermediate models as table to help improve the query performance.

Since we were able to achieve a more reasonable dbt build time I feel it is appropriate to close this issue. Please feel free to comment back if you encounter this issue again. Thanks 😄

lizdeika commented 2 years ago

This could have a mention in README.md