Closed: simon-stepper closed this issue 2 years ago
Fivetran support confirmed that the raw data behaviour is as expected in https://support.fivetran.com/hc/en-us/requests/55700.
@NoToWarAlways thanks so much for this detailed bug report write up!
Based on the supporting information you provided, it is clear to me that `linkedin_pages__posts` needs to filter for the most recent record per `ugc_post_id` & `source_relation`. I just looked through our own data and realized we have not edited any past posts, which is most likely why this test hadn't failed on our end. Glad the test did its job in raising this issue, though!
I will review your PR above and let you know if I have any requested changes or comments before we merge. Thanks again so much. These contributions help make our packages better. We really appreciate it 💯
Is there an existing issue for this?
Current Behavior
We see failing tests for `linkedin_pages__posts.sql` due to a non-unique combination of the columns `ugc_post_id` & `source_relation`.
Expected Behavior
The model should filter to always use the latest version of each post and hence not produce duplicated rows.
Steps To Reproduce
No response
Relevant log output
What database are you using dbt with?
Snowflake
Additional Context
The primary key for `ugc_post_history` is the combination of `id` & `last_modified_time`, according to the ERD provided for the LinkedIn Company Pages connector. When posts get edited, they get a new row in this table, which can be identified by either the `last_modified_time` or the `version_tag` column. A solution similar to what is already used in `int_linkedin_pages__latest_post.sql` could be introduced to only use the most recent record for a post.
Are you willing to open a PR to help address this issue?