Closed samkessaram closed 2 years ago
Hi @samkessaram thanks so much for raising this issue.
Looking at the snippet you provided, I am getting the impression that these are in fact unique sync events. I am curious if the test we have in place needs to be updated following a recent release for the Fivetran Log connector. I can see that in January of 2022 the release notes indicate that the `log` table now has a `sync_id` field.
Would you be able to query these two records in the source `log` table (not the staging table created from the package) and confirm if the `sync_id` for these records is in fact unique?
Hey @fivetran-joemarkiewicz
These records have the same `sync_id`, but a different `sequence_number` 🤔
ID | TIME_STAMP | CONNECTOR_ID | TRANSFORMATION_ID | EVENT | MESSAGE_EVENT | MESSAGE_DATA | _FIVETRAN_SYNCED | SYNC_ID | SEQUENCE_NUMBER | PROCESS_ID |
---|---|---|---|---|---|---|---|---|---|---|
some_id | 2022-02-09 11:09:53.394 +0000 | some_id | | INFO | write_to_table_start | {"table":"table_one"} | 2022-02-09 17:09:48.362 +0000 | 075fb89d-b6c7-47c6-b475-79c8e2dc7907 | 1,396 | 9526071f-2d1b-4baa-8aea-221416e33a4f |
some_id | 2022-02-09 11:09:53.394 +0000 | some_id | | INFO | write_to_table_start | {"table":"table_two"} | 2022-02-09 17:09:48.362 +0000 | 075fb89d-b6c7-47c6-b475-79c8e2dc7907 | 1,395 | 9526071f-2d1b-4baa-8aea-221416e33a4f |
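To make the collision concrete, here is a minimal Python sketch of what a unique-combination check does with the two rows above. It assumes the package test keys on `id` and the timestamp (staged as `created_at`); the helper `duplicate_keys` is hypothetical and is not part of dbt_utils or the package.

```python
from collections import Counter

# The two rows from the table above, reduced to the relevant columns.
rows = [
    {
        "id": "some_id",
        "time_stamp": "2022-02-09 11:09:53.394 +0000",
        "message_data": '{"table":"table_one"}',
        "sync_id": "075fb89d-b6c7-47c6-b475-79c8e2dc7907",
        "sequence_number": 1396,
    },
    {
        "id": "some_id",
        "time_stamp": "2022-02-09 11:09:53.394 +0000",
        "message_data": '{"table":"table_two"}',
        "sync_id": "075fb89d-b6c7-47c6-b475-79c8e2dc7907",
        "sequence_number": 1395,
    },
]


def duplicate_keys(rows, columns):
    """Return each column-value combination that appears in more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Assumed test key: id + timestamp. Both rows share it, so the check fails.
by_id_and_time = duplicate_keys(rows, ["id", "time_stamp"])

# Adding sequence_number to the key tells the two events apart.
by_sync_and_sequence = duplicate_keys(rows, ["sync_id", "sequence_number"])

print(len(by_id_and_time))        # one duplicated key
print(len(by_sync_and_sequence))  # no duplicated keys
```

Under that assumption, the failure comes from the key being too coarse for these rows rather than from duplicated log events.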
Thanks for sharing this @samkessaram. That is indeed interesting, but a good lead to see that the sequence number is different.
Let me look into the connector changelog a bit further on my end to see what the `sequence_number` field details.
Hi @samkessaram I just wanted to provide an update that I have been able to connect with our engineering teams and this very same question has actively been discussed.
Ultimately, there will soon be a change to the connector which will update the `id` field within the `log` table so that it better represents the uniqueness of the records. In the end, the package uniqueness test will still be accurate in checking the `id` and the `created_at` fields.
This change mentioned above is not live yet within the connector. I can share more once I know the change is live and we can check to see if this test is still failing. In the meantime, you can exclude this test if you would like.
@samkessaram I just checked my Fivetran Log data and see the `id` field has been updated to represent the true uniqueness of the record. If you perform a full resync on your connector, you will see this updated for all of your past records.
With this update on the connector I am able to see the uniqueness test pass! I will close this issue as the latest connector update should resolve the error. Please feel free to comment if you are still experiencing the issue within your data.
Thanks again for all your help on identifying this issue.
Is there an existing issue for this?
Describe the issue
The dbt_utils.unique_combination_of_columns test for the stg_fivetran_log__log model is failing in my project, but when I look at some failing rows I see that they are distinct events.
e.g.
The test:
Relevant error log or model output
Expected behavior
I assume that the intention is to flag duplicate logs, so the case that I posted above shouldn't cause a failure since the values for the additional columns are different between the rows.
dbt Project configurations
Package versions
What database are you using dbt with?
snowflake
dbt Version
dbt=0.21.0
Additional Context
No response
Are you willing to open a PR to help address this issue?