fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0
30 stars 25 forks source link

[Bug] stg_fivetran_log__log unique_combination_of_columns test is too strict #42

Closed samkessaram closed 2 years ago

samkessaram commented 2 years ago

Is there an existing issue for this?

Describe the issue

The dbt_utils.unique_combination_of_columns test for the stg_fivetran_log__log model is failing in my project, but when I look at some failing rows I see that they are distinct events.

e.g.

LOG_ID CREATED_AT CONNECTOR_ID EVENT_TYPE MESSAGE_DATA EVENT_SUBTYPE TRANSFORMATION_ID
some_id 2021-03-01 05:10:15 some_id INFO {"schema":"schema_name","name":"some_id"} copy_rows
some_id 2021-03-01 05:10:15 some_id INFO {"table":"table_name"} write_to_table_start

The test:

- name: stg_fivetran_log__log
    description: > 
      Table of logged events related to data syncs. More info on error logs 
      [here](https://fivetran.com/docs/logs/fivetran-log#errorlogs)
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: 
            - log_id
            - created_at

Relevant error log or model output

Failure in test dbt_utils_unique_combination_of_columns_stg_fivetran_log__log_log_id__created_at (models/staging/stg_fivetran_log.yml)
  Got 1 results, configured to fail if != 0

Expected behavior

I assume that the intention is to flag duplicate logs, so the case that I posted above shouldn't cause a failure since the values for the additional columns are different between the rows.

dbt Project configurations

vars:
  fivetran_log:
    fivetran_log_database: [our db]
    fivetran_log_schema: fivetran_log
    fivetran_log_using_transformations: false # this will disable all transformation + trigger_table logic 
    fivetran_log_using_triggers: false # this will disable only trigger_table logic

Package versions

packages:
  - package: dbt-labs/dbt_utils
    version: 0.7.4
  - package: fivetran/fivetran_log
    version: 0.4.1

What database are you using dbt with?

snowflake

dbt Version

dbt=0.21.0

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @samkessaram thanks so much for raising this issue.

Looking at the snippet you provided I am getting the impression that these are in fact unique sync events. I am curious if the test we have in place needs to be updated following a recent release for the Fivetran Log connector. I can see that in January of 2022 the release notes indicate that the log table now has a sync_id field.

Would you be able to query these two records in the source log table (not the staging table created from the package) and confirm if the sync_id for these fields is in fact unique?

samkessaram commented 2 years ago

Hey @fivetran-joemarkiewicz

These records have the same sync_id, but different a sequence_number 🤔

ID TIME_STAMP CONNECTOR_ID TRANSFORMATION_ID EVENT MESSAGE_EVENT MESSAGE_DATA _FIVETRAN_SYNCED SYNC_ID SEQUENCE_NUMBER PROCESS_ID
some_id 2022-02-09 11:09:53.394 +0000 some_id INFO write_to_table_start {"table":"table_one"} 2022-02-09 17:09:48.362 +0000 075fb89d-b6c7-47c6-b475-79c8e2dc7907 1,396 9526071f-2d1b-4baa-8aea-221416e33a4f
some_id 2022-02-09 11:09:53.394 +0000 some_id INFO write_to_table_start {"table":"table_two"} 2022-02-09 17:09:48.362 +0000 075fb89d-b6c7-47c6-b475-79c8e2dc7907 1,395 9526071f-2d1b-4baa-8aea-221416e33a4f
fivetran-joemarkiewicz commented 2 years ago

Thanks for sharing this @samkessaram. That is indeed interesting, but a good lead to see that the sequence number is different.

Let me look into the connector changelog a bit further on my end to see what the sequence_number field details.

fivetran-joemarkiewicz commented 2 years ago

Hi @samkessaram I just wanted to provide an update that I have been able to connect with our engineering teams and this very same question has actively been discussed.

Ultimately there will soon be a change to the connector which will update the id field within the log table that will better represent the uniqueness of the records. In the end, the package uniqueness test will still be accurate with checking the id and the created_at fields.

This change mentioned above is not live yet within the connector. I can share more once I know the change is live and we can check to see if this test is still failing. In the meantime, you can exclude this test if you would like.

fivetran-joemarkiewicz commented 2 years ago

@samkessaram I just checked my Fivetran Log data and see the id field has been updated to represent the true uniqueness of the record. If you perform a full resync on your connector you will see this become updated for all of your past records.

With this update on the connector I am able to see the uniqueness test pass! I will close this issue as the latest connector update should resolve the error. Please feel free to comment if you are still experiencing the issue within your data.

Thanks again for all your help on identifying this issue.