fivetran / dbt_netsuite

Data models for Fivetran's Netsuite connector, built using dbt.
https://fivetran.github.io/dbt_netsuite/
Apache License 2.0
36 stars 35 forks source link

[Bug] Duplicate Balance Sheet and Transaction Sheet Records #128

Open christian-wendlandt-amerilux opened 3 months ago

christian-wendlandt-amerilux commented 3 months ago

Is there an existing issue for this?

Describe the issue

Multiple records with different md5 hashes in the int_netsuite2__tran_with_converted_amounts table are being joined to the same transactions.

Relevant error log or model output

Failure in test unique_netsuite2__balance_sheet_balance_sheet_id (models/netsuite2.yml)
Failure in test unique_netsuite2__transaction_details_transaction_details_id (models/netsuite2.yml)

Here are some example records from netsuite2__transaction_details. Sensitive columns have been replaces with ****.

TO_SUBSIDIARY_ID,TO_SUBSIDIARY_NAME,TO_SUBSIDIARY_CURRENCY_SYMBOL,TRANSACTION_LINE_ID,TRANSACTION_MEMO,IS_TRANSACTION_NON_POSTING,IS_MAIN_LINE,IS_TAX_LINE,IS_CLOSED,TRANSACTION_ID,TRANSACTION_STATUS,TRANSACTION_DATE,TRANSACTION_DUE_DATE,TRANSACTION_TYPE,_FIVETRAN_SYNCED_DATE,TRANSACTION_NUMBER,ENTITY_ID,IS_TRANSACTION_INTERCOMPANY_ADJUSTMENT,ACCOUNTING_PERIOD_ENDING,ACCOUNTING_PERIOD_NAME,ACCOUNTING_PERIOD_ID,IS_ACCOUNTING_PERIOD_ADJUSTMENT,IS_ACCOUNTING_PERIOD_CLOSED,ACCOUNT_NAME,ACCOUNT_TYPE_NAME,ACCOUNT_TYPE_ID,ACCOUNT_ID,ACCOUNT_NUMBER,IS_ACCOUNT_LEFTSIDE,IS_ACCOUNTS_PAYABLE,IS_ACCOUNTS_RECEIVABLE,IS_ACCOUNT_INTERCOMPANY,PARENT_ACCOUNT_NAME,IS_EXPENSE_ACCOUNT,IS_INCOME_ACCOUNT,COMPANY_NAME,CUSTOMER_CITY,CUSTOMER_STATE,CUSTOMER_ZIPCODE,CUSTOMER_COUNTRY,CUSTOMER_DATE_FIRST_ORDER,CUSTOMER_EXTERNAL_ID,CLASS_FULL_NAME,ITEM_ID,ITEM_NAME,ITEM_TYPE_NAME,SALES_DESCRIPTION,LOCATION_NAME,LOCATION_CITY,LOCATION_COUNTRY,VENDOR_NAME,VENDOR_CREATE_DATE,CURRENCY_NAME,CURRENCY_SYMBOL,DEPARTMENT_ID,DEPARTMENT_NAME,SUBSIDIARY_ID,SUBSIDIARY_NAME,CONVERTED_AMOUNT,TRANSACTION_AMOUNT,TRANSACTION_DETAILS_ID
1,****,USD,1,LOWES RMA REQUEST PO #266278322 - CORRECTED BOL,false,false,false,true,612140,A,2024-07-25 00:00:00.000 Z,2024-09-08 00:00:00.000 Z,VendBill,2024-07-30,VENDBILL9225,630,,2024-07-31 00:00:00.000 Z,Jul 2024,11,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT : LTL FREIGHT OUT,Cost of Goods Sold,COGS,921,47310,true,false,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT,false,false,,,,,,,,****,,,,,****,****,****,****,2024-02-29 11:26:01.000 Z,USD,USD,,,1,****,39.97,39.97,9deac63901cef072f2c5037ac122729b
1,****,USD,1,LOWES RMA REQUEST PO #266278322 - CORRECTED BOL,false,false,false,true,612140,A,2024-07-25 00:00:00.000 Z,2024-09-08 00:00:00.000 Z,VendBill,2024-07-30,VENDBILL9225,630,,2024-07-31 00:00:00.000 Z,Jul 2024,11,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT : LTL FREIGHT OUT,Cost of Goods Sold,COGS,921,47310,true,false,false,false,NON-MATERIAL COST OF GOODS SOLD : DIRECT OVERHEAD COST OF GOODS SOLD : FREIGHT OUT,false,false,,,,,,,,****,,,,,****,****,****,****,2024-02-29 11:26:01.000 Z,USD,USD,,,1,****,39.97,39.97,9deac63901cef072f2c5037ac122729b

Expected behavior

I expect to not have uniqueness errors.

dbt Project configurations

vars: netsuite_data_model: netsuite2 netsuite_database: 'PROD_FIVETRAN' netsuite_schema: netsuite_suiteanalytics netsuite2multibook_accounting_enabled: false # False by default. Disable accountingbooksubsidiary and accountingbook if you are not using the Multi-Book Accounting feature netsuite2using_to_subsidiary: true # False by default. netsuite2using_exchange_rate: true #True by default. Disable exchange_rate if you don't utilize exchange rates. If you set this variable to false, ensure it is scoped globally so that the netsuite_source package can access it as well. netsuite2using_vendor_categories: false # True by default. Disable vendorcategory if you don't categorize your vendors netsuite2__using_jobs: false # True by default. Disable job if you don't use jobs

Package versions

packages:

What database are you using dbt with?

snowflake

dbt Version

Additional Context

I found no duplicate transactions in the source models.

Doing a full refresh only fixes the problem temporarily.

I've narrowed the problem down to this join logic that I found in "target\compiled\netsuite\models\netsuite2\netsuite2__transaction_details.sql".

transaction_details as (
  select
  ...
  left join transactions_with_converted_amounts
    on transactions_with_converted_amounts.transaction_line_id = transaction_lines.transaction_line_id
      and transactions_with_converted_amounts.transaction_id = transaction_lines.transaction_id
      and transactions_with_converted_amounts.transaction_accounting_period_id = transactions_with_converted_amounts.reporting_accounting_period_id
  ....
)

There are two records in int_netsuite2tran_with_converted_amounts that are almost identical but have different md5 hashes. It appears that the ACCOUNT_ID of the record was changed, and now the package sees this as two different transactions for this model but not for the netsuite2balance_sheet and netsuite2__transaction_details models.

TRANSACTION_ID,TRANSACTION_LINE_ID,SUBSIDIARY_ID,ACCOUNT_ID,TRANSACTION_ACCOUNTING_PERIOD_ID,UNCONVERTED_AMOUNT,_FIVETRAN_SYNCED_DATE,REPORTING_ACCOUNTING_PERIOD_ID,EXCHANGE_RATE_REPORTING_PERIOD,EXCHANGE_RATE_TRANSACTION_PERIOD,TO_SUBSIDIARY_ID,TO_SUBSIDIARY_NAME,TO_SUBSIDIARY_CURRENCY_SYMBOL,CONVERTED_AMOUNT_USING_TRANSACTION_ACCOUNTING_PERIOD,CONVERTED_AMOUNT_USING_REPORTING_MONTH,IS_INCOME_STATEMENT,ACCOUNT_CATEGORY,TRAN_WITH_CONVERTED_AMOUNTS_ID 612140,1,1,921,11,39.97,2024-07-30,11,1,1,1,AmeriLux International,USD,39.97,39.97,true,Expense,497b9917e06ffc917c8c78e024dbdb94 612140,1,1,941,11,39.97,2024-07-27,11,1,1,1,AmeriLux International,USD,39.97,39.97,true,Expense,3994190e8f3b4bd87bf12733cdbf77d7

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 3 months ago

Hi @christian-wendlandt-amerilux thanks for opening this issue and providing a thorough investigation and explanation of what seems to be occurring.

One item in your initial issue description that immediately jumped out to me was the following:

It appears that the ACCOUNT_ID of the record was changed

Would you be able to add some more details to how the account_id of the record changed? Did this happen within Netsuite itself, or does this seem to be a bug within the package code itself? Additionally, you mentioned these duplicates are not appearing in the BS or IS models. Are you able to see the similar account_id changing in those model results?

christian-wendlandt-amerilux commented 3 months ago

Would you be able to add some more details to how the account_id of the record changed? Did this happen within Netsuite itself, or does this seem to be a bug within the package code itself? Additionally, you mentioned these duplicates are not appearing in the BS or IS models. Are you able to see the similar account_id changing in those model results?

It appears that the account was changed by an accountant in NetSuite on purpose. chrome_5Kz0QJIYaK

When looking up the transaction in the source models, only the version with the new account number shows up. It appears that (with my package configuration at least), account_id is only considered part of the primary key for int_netsuite2__tran_with_converted_amounts model, which is why another record was created. I've ran that model again as a query and can confirm that the old transaction doesn't exist. It's only still in the database table because that model is incremental, and the old record has a different hash.

christian-wendlandt-amerilux commented 3 months ago

It appears that (with my package configuration at least), account_id is only considered part of the primary key for int_netsuite2__tran_with_converted_amounts model, which is why another record was created.

Not saying that the extra record is the issue. Maybe my configuration variables are wrong, or maybe the join isn't strict enough. Here is how that join looks in the source file:

left join transactions_with_converted_amounts
    on transactions_with_converted_amounts.transaction_line_id = transaction_lines.transaction_line_id
      and transactions_with_converted_amounts.transaction_id = transaction_lines.transaction_id
      and transactions_with_converted_amounts.transaction_accounting_period_id = transactions_with_converted_amounts.reporting_accounting_period_id

      {% if var('netsuite2__multibook_accounting_enabled', false) %}
      and transactions_with_converted_amounts.accounting_book_id = transaction_lines.accounting_book_id
      {% endif %}
fivetran-joemarkiewicz commented 3 months ago

@christian-wendlandt-amerilux thanks again for sharing the additional details here.

I am on the fence between if this is a result of some incremental logic tradeoffs as described within Issue #121 where we recommend periodic full refreshes to ensure data quality, or if this is a join update as you described. It's tricky because I was able to recreate the issue in my test environment and found both solutions to work. However, I'm a bit apprehensive to implement the code changes as I am not entirely certain at this moment if this could have other downstream implications not yet considered.

If you want, I would welcome you to try out the branch in your packages.yml with these changes and let me know if you are able to see this issue resolved. In the meantime we can continue the discussion if this makes sense to include in the package code. I welcome any other users of this package to join in the discussion in case there are more factors to consider.

packages:
  - git: https://github.com/fivetran/dbt_netsuite.git
    revision: bugfix/changing-transactions
    warn-unpinned: false 

Let me know your thoughts. Thanks!

christian-wendlandt-amerilux commented 3 months ago

@christian-wendlandt-amerilux thanks again for sharing the additional details here.

I am on the fence between if this is a result of some incremental logic tradeoffs as described within Issue #121 where we recommend periodic full refreshes to ensure data quality, or if this is a join update as you described. It's tricky because I was able to recreate the issue in my test environment and found both solutions to work. However, I'm a bit apprehensive to implement the code changes as I am not entirely certain at this moment if this could have other downstream implications not yet considered.

If you want, I would welcome you to try out the branch in your packages.yml with these changes and let me know if you are able to see this issue resolved. In the meantime we can continue the discussion if this makes sense to include in the package code. I welcome any other users of this package to join in the discussion in case there are more factors to consider.

packages:
  - git: https://github.com/fivetran/dbt_netsuite.git
    revision: bugfix/changing-transactions
    warn-unpinned: false 

Let me know your thoughts. Thanks!

Thank you @fivetran-joemarkiewicz , I really appreciate the quick fix! I just started using this package for my organization, so I'm also fearful of causing any downstream errors. I'm happy to try it out and share the results. I'll come back in a week or two.

If this is a legitimate fix, I don't think that I would classify the problem as a "data quality" issue, since the bug is originating from a partial key join. Requiring full refreshes seems like a Band-Aid solution if that's truly what the problem is.

I'm not asking you to try this, but perhaps the solution might be to remove account_id from the composite key. I don't know enough about NetSuite to understand why it is being included.

Thanks again for the assistance.

fivetran-joemarkiewicz commented 3 months ago

@christian-wendlandt-amerilux thanks so much for the information here. I'll keep this issue open for a while and will wait to hear back the results if the issue no longer persists. I will keep this branch locked unless otherwise mentioned here so you will not see any unexpected changes.

If this does seem to address the issue and is a legitimate fix then we will explore adding this to the package in the next release. Thanks for your collaboration here and look forward to hearing back if this is a legitimate long term fix for you!