fivetran / dbt_netsuite

Data models for Fivetran's Netsuite connector, built using dbt.
https://fivetran.github.io/dbt_netsuite/
Apache License 2.0
36 stars 34 forks source link

[Bug] Duplicate records in model netsuite2__transaction_details and netsuite2__balance_sheet #122

Closed Sunnyinho closed 2 months ago

Sunnyinho commented 3 months ago

Is there an existing issue for this?

Describe the issue

I am getting test failure in column transaction_details_id. There seems to have duplicate rows existing in the model as shown in figure below. The below example is for transaction details model

Screenshot 2024-06-06 at 19 23 49

Relevant error log or model output

dbt build -s netsuite2__transaction_details
13:46:11  Running with dbt=1.7.10
13:46:12  Registered adapter: snowflake=1.7.2
13:46:14  Found 643 models, 14 snapshots, 18 seeds, 2 operations, 1394 tests, 280 sources, 8 exposures, 0 metrics, 1526 macros, 0 groups, 0 semantic models
13:46:14  
13:46:21  
13:46:21  Running 1 on-run-start hook
13:46:24  1 of 1 START hook: elementary.on-run-start.0 ................................... [RUN]
13:46:24  1 of 1 OK hook: elementary.on-run-start.0 ...................................... [OK in 0.00s]
13:46:24  
13:46:24  Concurrency: 10 threads (target='dev')
13:46:24  
13:46:24  1 of 5 START sql incremental model raw_dev.scratch_sshah.netsuite2__transaction_details  [RUN]
13:46:48  1 of 5 OK created sql incremental model raw_dev.scratch_sshah.netsuite2__transaction_details  [SUCCESS 3724 in 24.12s]
13:46:48  2 of 5 START test not_null_netsuite2__transaction_details_transaction_details_id  [RUN]
13:46:48  3 of 5 START test not_null_netsuite2__transaction_details_transaction_id ....... [RUN]
13:46:48  4 of 5 START test not_null_netsuite2__transaction_details_transaction_line_id .. [RUN]
13:46:48  5 of 5 START test unique_netsuite2__transaction_details_transaction_details_id . [RUN]
13:46:52  2 of 5 PASS not_null_netsuite2__transaction_details_transaction_details_id ..... [PASS in 3.70s]
13:46:52  4 of 5 PASS not_null_netsuite2__transaction_details_transaction_line_id ........ [PASS in 3.69s]
13:46:52  3 of 5 PASS not_null_netsuite2__transaction_details_transaction_id ............. [PASS in 3.70s]
13:46:54  5 of 5 FAIL 1 unique_netsuite2__transaction_details_transaction_details_id ..... [FAIL 1 in 5.05s]
13:46:54  
13:46:54  Running 1 on-run-end hook
13:47:56  1 of 1 START hook: elementary.on-run-end.0 ..................................... [RUN]
13:47:56  1 of 1 OK hook: elementary.on-run-end.0 ........................................ [OK in 0.00s]
13:47:56  
13:47:57  
13:47:57  Finished running 1 incremental model, 4 tests, 2 hooks in 0 hours 1 minutes and 42.83 seconds (102.83s).
13:47:57  
13:47:57  Completed with 1 error and 0 warnings:
13:47:57  
13:47:57  Failure in test unique_netsuite2__transaction_details_transaction_details_id (models/netsuite2.yml)
13:47:57    Got 1 result, configured to fail if != 0
13:47:57  
13:47:57    compiled Code at target/compiled/netsuite/models/netsuite2.yml/unique_netsuite2__transaction_details_transaction_details_id.sql
13:47:57  
13:47:57  Done. PASS=4 WARN=0 ERROR=1 SKIP=0 TOTAL=5

Expected behavior

I expect the models to run without any unique errors.

dbt Project configurations

Screenshot 2024-06-06 at 19 27 25

Package versions

  - package: fivetran/netsuite
    version: 0.13.0

What database are you using dbt with?

snowflake

dbt Version

Running with dbt=1.7.10
Registered adapter: snowflake=1.7.2

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-avinash commented 3 months ago

Hi @Sunnyinho ! Thanks for reaching out.

A few followups that can help us hone in on the source of these duplicate records:

  1. What is the test error you're seeing in the balance sheet? Is it the balance_sheet_id?
  2. When you look at the actual records of the data that are duplicating transaction_details_id, are there any specific fields that are different between the two? That could help us understand the cause as to why the same record is being brought in more than once.
  3. For the test error on balance sheet, what are the fields that are differing on the duplicate records?

Thanks for all of the additional context!

Sunnyinho commented 3 months ago

Hi @Sunnyinho ! Thanks for reaching out.

A few followups that can help us hone in on the source of these duplicate records:

  1. What is the test error you're seeing in the balance sheet? Is it the balance_sheet_id?
  2. When you look at the actual records of the data that are duplicating transaction_details_id, are there any specific fields that are different between the two? That could help us understand the cause as to why the same record is being brought in more than once.
  3. For the test error on balance sheet, what are the fields that are differing on the duplicate records?

Thanks for all of the additional context!

Hi @fivetran-avinash ! Thansk for the concern.

  1. Yes, the unique error is for balance_sheet_id.
  2. No, all the data for transaction_details_id same nothing different.
  3. There are no fields that are differing in balance sheet either.
fivetran-joemarkiewicz commented 3 months ago

Hi @Sunnyinho, @fivetran-avinash is OOO today so I wanted to jump in and help move this forward for the time being. A few additional questions and actions I would like to take to understand this further:

  1. Have you been using this data model for a while and this just unexpectedly failed recently? Or is this the first time you are running the model and are immediately seeing this error? If this is the first time, then we may need to adjust the variables you have defined in your dbt_project.yml to reflect the state of your Netsuite environment.

  2. Can you confirm that these transactions are not duplicates at the source? When querying transaction_id = 588718 and transaction_line_id = 1 from the raw netsuite.transaction_lines table, do you only see one record? This will verify if this is a duplicate at the source as opposed to the transformation creating the duplicate.

  3. If you are able to confirm that no duplicates are at the source and you already provided that there are no different fields in the end model then I would want to confirm that these duplicates are not introduced as a result of the incremental logic. Would you be able to run a dbt run -s +netsuite2__transaction_details --full-refresh && dbt test. This will rebuild the transaction_details table completely without an incremental strategy and will show if this is caused by the incremental logic. If the test passes and there are no longer duplicates, then it is the incremental logic which we will need to update. If not, we will need to dive further.

  4. Do you possible have subsidiaries in your Netsuite environment? This could potentially cause duplicates with your variable setup. I see you have netsuite2__multi_book_accounting_enabled set to true. Would you be able to also define netsuite2__using_to_subsidiary in your dbt_project.yml to true as well and run another --full-refresh. If you are not using subsidiaries then this run will likely fail. If you are using subsidiaries then it might update the end model to account for records that look to be duplicates, but need to be mapped to different subsidiairies.

Let me know if any of the above prove successful. This will help us get to the bottom of where this duplicate record is coming from. Thanks!

Sunnyinho commented 2 months ago

Hi @Sunnyinho, @fivetran-avinash is OOO today so I wanted to jump in and help move this forward for the time being. A few additional questions and actions I would like to take to understand this further:

  1. Have you been using this data model for a while and this just unexpectedly failed recently? Or is this the first time you are running the model and are immediately seeing this error? If this is the first time, then we may need to adjust the variables you have defined in your dbt_project.yml to reflect the state of your Netsuite environment.
  2. Can you confirm that these transactions are not duplicates at the source? When querying transaction_id = 588718 and transaction_line_id = 1 from the raw netsuite.transaction_lines table, do you only see one record? This will verify if this is a duplicate at the source as opposed to the transformation creating the duplicate.
  3. If you are able to confirm that no duplicates are at the source and you already provided that there are no different fields in the end model then I would want to confirm that these duplicates are not introduced as a result of the incremental logic. Would you be able to run a dbt run -s +netsuite2__transaction_details --full-refresh && dbt test. This will rebuild the transaction_details table completely without an incremental strategy and will show if this is caused by the incremental logic. If the test passes and there are no longer duplicates, then it is the incremental logic which we will need to update. If not, we will need to dive further.
  4. Do you possible have subsidiaries in your Netsuite environment? This could potentially cause duplicates with your variable setup. I see you have netsuite2__multi_book_accounting_enabled set to true. Would you be able to also define netsuite2__using_to_subsidiary in your dbt_project.yml to true as well and run another --full-refresh. If you are not using subsidiaries then this run will likely fail. If you are using subsidiaries then it might update the end model to account for records that look to be duplicates, but need to be mapped to different subsidiairies.

Let me know if any of the above prove successful. This will help us get to the bottom of where this duplicate record is coming from. Thanks!

  1. I just started using the model.
  2. Yes I have confirmed the netsuite.transactionlines do not have duplicate values.
  3. I ran the models with --full-refresh and the errors were resolved.
  4. No need to do step 4.

Thank you @fivetran-joemarkiewicz and @fivetran-avinash for assisting me here.

fivetran-joemarkiewicz commented 2 months ago

Sounds great, thanks for letting us know! Please don't hesitate to reach back out if duplicates arise again. A full refresh in this scenario will fix any future duplicates, but it would be worthwhile for us to investigate if this is just a one off, or if it is something we need to investigate further within the incremental logic.

Thanks again!