fivetran / dbt_netsuite

Data models for Fivetran's Netsuite connector, built using dbt.
https://fivetran.github.io/dbt_netsuite/
Apache License 2.0
36 stars 34 forks source link

Feature/performance improvements #116

Closed fivetran-catfritz closed 4 months ago

fivetran-catfritz commented 4 months ago

PR Overview

This PR will address the following Issue/Feature:

This PR will result in the following new package version:

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

🚨 Breaking Changes 🚨

⚠️ Since the following changes are breaking, a --full-refresh after upgrading will be required.

  • Performance improvements:
  • Snowflake, Postgres, and Redshift:
  • Added an incremental strategy for the following models:
  • int_netsuite2__tran_with_converted_amounts
  • netsuite2__balance_sheet
  • netsuite2__income_statement
  • netsuite2_transaction_details
  • Bigquery and Databricks
  • Due to the variation in pricing and runtime priorities for customer, by default we chose table instead of incremental materialization for Bigquery and Databricks. For more information on this decision, see the Incremental Strategy section of the DECISIONLOG.
  • To enable incremental materialization for these destinations, see the Incremental Materialization section of the README for instructions.

Features

  • Added a default 3-day look-back to incremental models to accommodate late arriving records, based on the _fivetran_synced_date of transaction records. The number of days can be changed by setting the var lookback_window in your dbt_project.yml. See the Lookback Window section of the README for more details.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

Before marking this PR as "ready for review" the following have been applied:

Detailed Validation

Please share any and all of your validation steps:

jmongerlyra commented 4 months ago

@fivetran-catfritz I'm getting the errors below when running the new models. We are using Snowflake. Any suggestions?

dbt run --select +netsuite2__balance_sheet +netsuite2__income_statement +netsuite2__transaction_details

14:42:34    Database Error in model netsuite2__balance_sheet (models/netsuite2/netsuite2__balance_sheet.sql)
  002036 (42601): SQL compilation error:
  Subquery containing correlated aggregate function [MAX(INT_NETSUITE2__TRAN_WITH_CONVERTED_AMOUNTS._FIVETRAN_SYNCED_DATE)] can only appear in having or select clause
14:42:34
14:42:34    Database Error in model netsuite2__income_statement (models/netsuite2/netsuite2__income_statement.sql)
  002036 (42601): SQL compilation error:
  Subquery containing correlated aggregate function [MAX(INT_NETSUITE2__TRAN_WITH_CONVERTED_AMOUNTS._FIVETRAN_SYNCED_DATE)] can only appear in having or select clause
14:42:34
14:42:34    Database Error in model netsuite2__transaction_details (models/netsuite2/netsuite2__transaction_details.sql)
  002036 (42601): SQL compilation error:
  Subquery containing correlated aggregate function [MAX(INT_NETSUITE2__TRAN_WITH_CONVERTED_AMOUNTS._FIVETRAN_SYNCED_DATE)] can only appear in having or select clause
fivetran-catfritz commented 4 months ago

@jmongerlyra Hi Jared thanks for trying out the update and sharing the error! I think I found a workaround for your error and pushed it to my test branch. Does installing the update resolve your error?

jmongerlyra commented 4 months ago

@fivetran-catfritz See new error below. Let me know if you need anything else or if a call would be helpful.

cc: @rwang-lyra

17:42:43    Database Error in model netsuite2__balance_sheet (models/netsuite2/netsuite2__balance_sheet.sql)
  000904 (42000): SQL compilation error: error line 1 at position 20
  invalid identifier '_FIVETRAN_SYNCED_DATE'
17:42:43
17:42:43    Database Error in model netsuite2__transaction_details (models/netsuite2/netsuite2__transaction_details.sql)
  000904 (42000): SQL compilation error: error line 1 at position 20
  invalid identifier '_FIVETRAN_SYNCED_DATE'
17:42:43
17:42:43    Database Error in model netsuite2__income_statement (models/netsuite2/netsuite2__income_statement.sql)
  000904 (42000): SQL compilation error: error line 1 at position 20
  invalid identifier '_FIVETRAN_SYNCED_DATE'
rwang-lyra commented 4 months ago

_FIVETRAN_SYNCED_DATE

I wonder if this a new field not yet released by fivetran, hence the error

fivetran-catfritz commented 4 months ago

@jmongerlyra I have one more suggestion, and then if that doesn't work then yes a call might be beneficial! _FIVETRAN_SYNCED_DATE is a field I added to the source package, so you might need to do a full refresh on the the source models before running the transforms.

jmongerlyra commented 4 months ago

@fivetran-catfritz I'm using the command below with + in front of the models. I think that should build all dependencies, but I am not a dbt expert.

dbt run --select +netsuite2__balance_sheet +netsuite2__income_statement +netsuite2__transaction_details

I am pulling from here in packages.yml. I think it's in turn pulling the correct branch from dbt_netsuite_source. If a meeting would be helpful, I think you have my email. If not, I can send you something.

  - git: https://github.com/fivetran/dbt_netsuite.git
    revision: feature/performance-improvements
    warn-unpinned: false
fivetran-catfritz commented 4 months ago

@jmongerlyra Ahh the first time you try these new changes, you'll need to throw a --full-refresh at the end to rebuild the upstreams completely. I'm not sure how Snowflake is handling all the materialization updates, but that usually is what is needed. It would look like: dbt run --select +netsuite2__balance_sheet +netsuite2__income_statement +netsuite2__transaction_details --full-refresh.

Then, for subsequent runs, you would want to run the models without the --full-refresh (how you were already running it). The subsequent runs will allow you to test the incremental strategy.

If that doesn't work, then yes let's have a meeting. You can schedule some time on my calendar via this link. Thanks!

jmongerlyra commented 4 months ago

@fivetran-catfritz I ran with --full-refresh and the data looks good. 👍 The balance sheet and income statement models still match our environment. We have the features below enabled.

    netsuite2__multibook_accounting_enabled: true
    netsuite2__using_to_subsidiary: true
fivetran-catfritz commented 4 months ago

@jmongerlyra Awesome. I have a few more questions for you:

  1. After your first successful run, were you able to do a 2nd run without the full refresh? I ask because the new incremental logic will only kick in when you don't use a full refresh.
  2. If so, would you say the models run faster on the 2nd/subsequent runs?
  3. From searching around, my understanding is that shorter runtimes means a lower Snowflake cost. Would you agree with this?

Thanks again for all you help testing this out!

jmongerlyra commented 4 months ago

Thanks for including us!

After your first successful run, were you able to do a 2nd run without the full refresh? I ask because the new incremental logic will only kick in when you don't use a full refresh.

Yes, I was able to perform a second run, removing the --full-refresh flag.

If so, would you say the models run faster on the 2nd/subsequent runs?

Much faster. The build normally takes 20 minutes. The incremental run completed in 3 minutes.

From searching around, my understanding is that shorter runtimes means a lower Snowflake cost. Would you agree with this?

Agree that this change should result in lower Snowflake costs depending on customer's individual agreements.

fivetran-catfritz commented 4 months ago

Thanks @jmongerlyra for your input, pleased to hear the changes were working as I hoped. We will be proceeding with releasing these updates, so I post back here when they are released!