fivetran / dbt_netsuite

Data models for Fivetran's Netsuite connector, built using dbt.
https://fivetran.github.io/dbt_netsuite/
Apache License 2.0
36 stars 34 forks source link

[Bug] Incremental rebuild missing master data updates #121

Closed jmongerlyra closed 2 months ago

jmongerlyra commented 3 months ago

Is there an existing issue for this?

Describe the issue

The new incremental build process for netsuite2__transaction_details appears to only include records where the transaction lines themselves have been updated. However, the rebuild misses updates where the transaction line itself is unchanged, but the related master data (account, subsidiary, etc.) was modified.

Relevant code:

    {% if is_incremental() %}
    where _fivetran_synced_date >= {{ max_fivetran_synced_date }}
    {% endif %}

Relevant error log or model output

No response

Expected behavior

Transaction lines incrementally update on changes to related master data. For example, if an account display name value changes, expected behavior is that all related transactional records would update with the new description on the next incremental build.

dbt Project configurations

netsuite2__multibook_accounting_enabled: true
netsuite2__using_to_subsidiary: true

Package versions

packages:
  - package: fivetran/netsuite_source
    version: [">=0.10.0", "<0.11.0"]

What database are you using dbt with?

snowflake

dbt Version

v1.7.3

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-avinash commented 3 months ago

Hi @jmongerlyra ! Thanks for bringing this to our attention. We will try to figure out the best approach here for figuring out how to capture these messing records early next week.

jmongerlyra commented 3 months ago

Thanks @fivetran-avinash. Just to clarify there are no missing records. The issue is values aren't being updated on the records.

fivetran-avinash commented 3 months ago

Hey @jmongerlyra , thanks for clarifying!

After some discussion with our team, I don't think we have a good solution in our data models for how to effectively bring in master data updates to the transaction details models. Our biggest concern is we're not certain as to why the master data on transaction details would get retroactively updated regularly and require additional incremental logic to build out.

The purpose of the new incremental logic was introduced to reduce compute time of the netsuite2__transaction_details model by bringing in only the latest transaction records that were subject to change. Adding incremental logic for additional master data updates would mean it'd have to scan the entire list of transaction records and update the transaction table, defeating the purpose of introducing the incremental logic.

I think your best bet is when you aren't seeing master data updates, is to periodically run dbt run --full-refresh when master data records are not getting updated properly. There are alternative options as well, but they are a bit more complex, but happy to provide them if the --full-refresh is not ideal for you. We can add documentation to our README to help provide more context as to what to do when master data is not matching.

We'd be curious to hear your thoughts here though! Particularly, we'd like to learn more about the cases master data would get updated on transaction details regularly.

jmongerlyra commented 3 months ago

We have custom fields on master data records that occasionally receive updates to the values. It's not often but it does happen. Can you provide the alternative options? We are running full refreshes now, but it takes around 30 minutes.

Could we take the max of any last modified value (transactional or master) associated with the record, and drive the incremental update from that?

fivetran-avinash commented 2 months ago

Hi @jmongerlyra , thanks for the thoughts here! Unfortunately, your suggestion won't solve the underlying issue.

The nature of incrementals is that the tradeoff to the improved performance is that they do drift from the source data over time. Long-term, dbt recommends running full refreshes periodically, for example every weekend, to reset the drift from the source data. This doc goes into more detail on long-term incremental considerations if you're curious!

Further, the proposed solution of the last modified date won't work in this case since the incremental strategy is triggered and driven entirely by updates to transactions. The logic of the incremental brings in new transactions, then left joins the master data. dbt isn't set up to handle the reverse, so that's why even if we find the master data updates, we cannot drive the incremental updates. We focused the incremental logic on updates to transaction records, since that is the main data being updated upstream. This is why a weekly or monthly --full-refresh would be the most effective way to capture updates to the master data.

The other recommendation we have besides a dbt run --full-refresh is to create a view of the incremental updated netsuite2__transaction_details in your warehouse, and then do realtime joins to account on the ids, which will not change, to capture any these updates. And then you should be able to capture a full record of the transaction/master updates when necessary.

Hope this makes sense. Happy to go into further detail if you still have questions (and we plan on documenting this out in our README as well)--incremental strategies are very tricky to handle!

jmongerlyra commented 2 months ago

Thanks @fivetran-avinash for the detailed response! Closing the issue.