fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0

[Bug] Test failure in one of the fivetran_log models #88

Closed Sunnyinho closed 11 months ago

Sunnyinho commented 11 months ago

Is there an existing issue for this?

Describe the issue

There is a test failure in one of the models, fivetran_platform__schema_changelog. The test error log is:


Failure in test dbt_utils_unique_combination_of_columns_fivetran_platform__schema_changelog_connector_id__destination_id__message_data__created_at (models/fivetran_platform.yml)

The package version used is:

  - package: fivetran/fivetran_log
    version: 1.0.1

The error still persists with the upgraded version 1.1.0.

Steps to reproduce error

  1. Install the package
  2. $ dbt deps
  3. $ dbt run -s fivetran_log # this runs all the models and everything completes successfully
  4. $ dbt test -s fivetran_log # this runs all the tests related to the models; the failure will be reproduced at this point

Relevant error log or model output

~/dbt-project ❯ dbt test -s fivetran_log                                                                                                                                      
15:32:58  Running with dbt=1.6.0
15:32:58  Registered adapter: snowflake=1.6.1
15:33:00  Found 515 models, 12 snapshots, 18 seeds, 2 operations, 1300 tests, 241 sources, 8 exposures, 0 metrics, 1378 macros, 0 groups, 0 semantic models
15:33:00  
15:33:03  
15:33:03  Running 1 on-run-start hook
15:33:06  1 of 1 START hook: elementary.on-run-start.0 ................................... [RUN]
15:33:06  1 of 1 OK hook: elementary.on-run-start.0 ...................................... [OK in 0.00s]
15:33:06  
15:33:06  Concurrency: 10 threads (target='dev')
15:33:06  
15:33:06  1 of 18 START test dbt_utils_unique_combination_of_columns_fivetran_platform__connector_daily_events_connector_id__destination_id__date_day  [RUN]
15:33:06  2 of 18 START test dbt_utils_unique_combination_of_columns_fivetran_platform__connector_status_connector_id__destination_id  [RUN]
15:33:06  3 of 18 START test dbt_utils_unique_combination_of_columns_fivetran_platform__mar_table_history_connector_id__destination_id__schema_name__table_name__measured_month  [RUN]
15:33:06  4 of 18 START test dbt_utils_unique_combination_of_columns_fivetran_platform__schema_changelog_connector_id__destination_id__message_data__created_at  [RUN]
15:33:06  5 of 18 START test dbt_utils_unique_combination_of_columns_fivetran_platform__usage_mar_destination_history_destination_id__measured_month  [RUN]
15:33:06  6 of 18 START test dbt_utils_unique_combination_of_columns_stg_fivetran_platform__connector_connector_name__destination_id  [RUN]
15:33:06  7 of 18 START test dbt_utils_unique_combination_of_columns_stg_fivetran_platform__credits_used_measured_month__destination_id  [RUN]
15:33:06  8 of 18 START test dbt_utils_unique_combination_of_columns_stg_fivetran_platform__destination_membership_destination_id__user_id  [RUN]
15:33:06  9 of 18 START test dbt_utils_unique_combination_of_columns_stg_fivetran_platform__log_log_id__created_at  [RUN]
15:33:06  10 of 18 START test dbt_utils_unique_combination_of_columns_stg_fivetran_platform__usage_cost_measured_month__destination_id  [RUN]
15:33:11  8 of 18 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_platform__destination_membership_destination_id__user_id  [PASS in 4.30s]
15:33:11  11 of 18 START test not_null_fivetran_platform__audit_table_unique_table_sync_key  [RUN]
15:33:11  5 of 18 PASS dbt_utils_unique_combination_of_columns_fivetran_platform__usage_mar_destination_history_destination_id__measured_month  [PASS in 4.33s]
15:33:11  12 of 18 START test not_null_stg_fivetran_platform__account_account_id ......... [RUN]
15:33:11  10 of 18 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_platform__usage_cost_measured_month__destination_id  [PASS in 4.80s]
15:33:11  13 of 18 START test not_null_stg_fivetran_platform__destination_destination_id . [RUN]
15:33:11  3 of 18 PASS dbt_utils_unique_combination_of_columns_fivetran_platform__mar_table_history_connector_id__destination_id__schema_name__table_name__measured_month  [PASS in 5.00s]
15:33:11  14 of 18 START test not_null_stg_fivetran_platform__user_user_id ............... [RUN]
15:33:11  7 of 18 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_platform__credits_used_measured_month__destination_id  [PASS in 5.14s]
15:33:11  15 of 18 START test unique_fivetran_platform__audit_table_unique_table_sync_key  [RUN]
15:33:11  6 of 18 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_platform__connector_connector_name__destination_id  [PASS in 5.15s]
15:33:11  16 of 18 START test unique_stg_fivetran_platform__account_account_id ........... [RUN]
15:33:12  2 of 18 PASS dbt_utils_unique_combination_of_columns_fivetran_platform__connector_status_connector_id__destination_id  [PASS in 5.22s]
15:33:12  17 of 18 START test unique_stg_fivetran_platform__destination_destination_id ... [RUN]
15:33:12  1 of 18 PASS dbt_utils_unique_combination_of_columns_fivetran_platform__connector_daily_events_connector_id__destination_id__date_day  [PASS in 5.35s]
15:33:12  18 of 18 START test unique_stg_fivetran_platform__user_user_id ................. [RUN]
15:33:12  4 of 18 FAIL 1 dbt_utils_unique_combination_of_columns_fivetran_platform__schema_changelog_connector_id__destination_id__message_data__created_at  [FAIL 1 in 5.51s]
15:33:15  11 of 18 PASS not_null_fivetran_platform__audit_table_unique_table_sync_key .... [PASS in 4.26s]
15:33:15  12 of 18 PASS not_null_stg_fivetran_platform__account_account_id ............... [PASS in 4.51s]
15:33:15  13 of 18 PASS not_null_stg_fivetran_platform__destination_destination_id ....... [PASS in 4.30s]
15:33:16  14 of 18 PASS not_null_stg_fivetran_platform__user_user_id ..................... [PASS in 4.18s]
15:33:16  18 of 18 PASS unique_stg_fivetran_platform__user_user_id ....................... [PASS in 4.28s]
15:33:17  16 of 18 PASS unique_stg_fivetran_platform__account_account_id ................. [PASS in 5.24s]
15:33:17  15 of 18 PASS unique_fivetran_platform__audit_table_unique_table_sync_key ...... [PASS in 5.26s]
15:33:17  17 of 18 PASS unique_stg_fivetran_platform__destination_destination_id ......... [PASS in 5.19s]
15:33:27  9 of 18 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_platform__log_log_id__created_at  [PASS in 21.13s]
15:33:27  
15:33:27  Running 1 on-run-end hook
15:33:47  1 of 1 START hook: elementary.on-run-end.0 ..................................... [RUN]
15:33:47  1 of 1 OK hook: elementary.on-run-end.0 ........................................ [OK in 0.00s]
15:33:47  
15:33:47  
15:33:47  Finished running 18 tests, 2 hooks in 0 hours 0 minutes and 47.40 seconds (47.40s).
15:33:48  
15:33:48  Completed with 1 error and 0 warnings:
15:33:48  
15:33:48  Failure in test dbt_utils_unique_combination_of_columns_fivetran_platform__schema_changelog_connector_id__destination_id__message_data__created_at (models/fivetran_platform.yml)
15:33:48    Got 1 result, configured to fail if != 0
15:33:48  
15:33:48    compiled Code at target/compiled/fivetran_log/models/fivetran_platform.yml/dbt_utils_unique_combination_o_9aef0d81d64508710644e2d85b13c3dd.sql
15:33:48  
15:33:48  Done. PASS=17 WARN=0 ERROR=1 SKIP=0 TOTAL=18

Expected behavior

The expectation is that the test passes without any errors.

dbt Project configurations

vars:
  fivetran_platform_database: raw # default is your target.database
  fivetran_platform_schema: fivetran_logs # default is fivetran_log
  fivetran_platform_using_transformations: false # this will disable all transformation + trigger_table logic
  fivetran_platform_using_triggers: false # this will disable only trigger_table logic 
  fivetran_platform_using_destination_membership: true # this will disable only the destination membership logic
  fivetran_platform_using_user: true # this will disable only the user logic

Package versions

packages:
  - package: fivetran/fivetran_log
    version: 1.0.1

What database are you using dbt with?

snowflake

dbt Version

~/dbt-project ❯ dbt --version
Core:
  - installed: 1.6.0
  - latest:    1.6.2 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.6.1 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 11 months ago

Hi @Sunnyinho thank you for opening this issue.

Would you be able to share a screenshot of the results for this test failure? It looks like there is 1 result where this test is failing. I would be curious to see the record and where the duplicate is occurring that our test is not checking.

Thanks!

Sunnyinho commented 11 months ago

Hi @fivetran-joemarkiewicz, thanks for replying. By the result of the test failure, did you mean the logs of the error?

fivetran-joemarkiewicz commented 11 months ago

Sure thing happy to clarify!

By result of the failure I mean the actual results in the table that are triggering this test failure. You can find the SQL used to run this test in the target folder, but the code should look something like the following:

with validation_errors as (

    select
        connector_id, destination_id, message_data, created_at
    from fivetran_platform__schema_changelog
    group by connector_id, destination_id, message_data, created_at
    having count(*) > 1

)

select *
from validation_errors

Would you be able to run this with the changes for your database/schema? Then we can inspect the record that is triggering the failure and see what the duplicates are for this scenario. You will likely need to run an additional query that doesn't perform the aggregations so you can inspect the duplicate records in more detail. This second query would be the most helpful for understanding what is (or is not) different between the duplicates.
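
For reference, a minimal sketch of that second, non-aggregated query (reusing the same model and columns as the compiled test above; database/schema qualifiers are omitted and will vary per project):

-- find the key combinations that appear more than once
with duplicate_keys as (

    select
        connector_id, destination_id, message_data, created_at
    from fivetran_platform__schema_changelog
    group by connector_id, destination_id, message_data, created_at
    having count(*) > 1

)

-- return the full rows behind those keys so the duplicates can be compared column by column
select changelog.*
from fivetran_platform__schema_changelog as changelog
join duplicate_keys
    using (connector_id, destination_id, message_data, created_at)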

Sunnyinho commented 11 months ago

select *
from analytics_dev.fivetran_platform.fivetran_platform__schema_changelog
where connector_id = 'wildfire_antacid' and created_at = '2022-04-08 19:00:14.756';

Found the duplicate record

fivetran-joemarkiewicz commented 11 months ago

Great! Would you be able to share the results as a screenshot? I am curious whether there is a field (which we are not testing) that is different between the two records, or if they are in fact true duplicates.

Sunnyinho commented 11 months ago

| CONNECTOR_ID | CONNECTOR_NAME | DESTINATION_ID | DESTINATION_NAME | CREATED_AT | EVENT_SUBTYPE | MESSAGE_DATA | TABLE_NAME | SCHEMA_NAME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| wildfire_antacid | citizen_sheets.performance | flavoring_epistle | snowflake_raw | 2022-04-08 19:00:14.756 | alter_table | "{\"type\":\"DROP_COLUMN\",\"table\":\"performance\",\"properties\":{\"columnName\":\"s_2_incidents-deprecated-e79ae695-63ec-490c-b993-2b7ca3129455\"}}" | "performance" | NULL |
| wildfire_antacid | citizen_sheets.performance | flavoring_epistle | snowflake_raw | 2022-04-08 19:00:14.756 | alter_table | "{\"type\":\"DROP_COLUMN\",\"table\":\"performance\",\"properties\":{\"columnName\":\"s_2_incidents-deprecated-e79ae695-63ec-490c-b993-2b7ca3129455\"}}" | "performance" | NULL |
Sunnyinho commented 11 months ago

I see these are true duplicates.

fivetran-joemarkiewicz commented 11 months ago

Thanks for sharing @Sunnyinho! I would agree that these do in fact seem to be true duplicates. Since that is the case, I believe this may be a case of "test failed successfully" where there should not be duplicate records but the package test identified them as such.

I believe this duplicate is actually originating from the connector. Are you able to confirm that this connector_id has these duplicates in the raw log table? If so, then there is not much we will be able to fix within the package. Instead I would recommend opening a support ticket for our connector team to investigate the origin of the duplicate log record.
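
As a hedged starting point, a duplicate check against the raw log table could look something like the sketch below (the fully qualified name raw.fivetran_logs.log is an assumption based on the fivetran_platform_database and fivetran_platform_schema vars shared above):

-- look for raw log records that share the same connector, event, payload, and timestamp
select
    connector_id,
    message_event,
    message_data,
    time_stamp,
    count(*) as record_count
from raw.fivetran_logs.log  -- assumed location of the raw Fivetran log table
group by connector_id, message_event, message_data, time_stamp
having count(*) > 1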

Sunnyinho commented 11 months ago

I found the duplicate records in the raw log table (RAW.FIVETRAN_LOGS.LOG). Since the package uses the data from the fivetran_logs schema, will the issue be solved if I delete one of those duplicate records from the table?

| ID | CONNECTOR_ID | EVENT | MESSAGE_EVENT | MESSAGE_DATA | SYNC_ID | TIME_STAMP | _FIVETRAN_SYNCED |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ImWd0lRHnOsujN+G5PdtQqg6adU= | wildfire_antacid | INFO | alter_table | "{\"type\":\"DROP_COLUMN\",\"table\":\"performance\",\"properties\":{\"columnName\":\"s_2_incidents-deprecated-e79ae695-63ec-490c-b993-2b7ca3129455\"}}" | 74795cbf-6de7-4cc5-986e-4186fad3fa88 | 2022-04-08 19:00:14.756 +0000 | 2022-04-12 07:28:55.376 +0000 |
| gDbpKzEb7HnYVMf1gt/V5JpzHgc= | wildfire_antacid | INFO | alter_table | "{\"type\":\"DROP_COLUMN\",\"table\":\"performance\",\"properties\":{\"columnName\":\"s_2_incidents-deprecated-e79ae695-63ec-490c-b993-2b7ca3129455\"}}" | 74795cbf-6de7-4cc5-986e-4186fad3fa88 | 2022-04-08 19:00:14.756 +0000 | 2022-04-12 07:28:55.376 +0000 |

fivetran-joemarkiewicz commented 11 months ago

Thanks for investigating the raw data @Sunnyinho! Correct, removing the duplicate from the raw table will resolve this test failure. I would still consider opening a support ticket to understand why exactly this duplicate was introduced, so that if this occurs again in the future you have a better understanding of why it happened.
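
As a sketch only (table name and id taken from the rows shared earlier in this thread; deleting from a Fivetran-managed raw table should be done carefully, since a future re-sync could reintroduce the row), removing one of the two rows by its id would look like:

-- remove a single duplicate raw log row by its primary key
delete from raw.fivetran_logs.log
where id = 'gDbpKzEb7HnYVMf1gt/V5JpzHgc=';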

In the meantime, I will close this ticket as won't fix, since this was something that was introduced via the connector and identified via the package. Thanks for raising this, and glad we were able to identify the root of the test failure!

Sunnyinho commented 11 months ago

Thanks for helping @fivetran-joemarkiewicz. I have already created a support ticket.