Closed fivetran-joemarkiewicz closed 3 years ago
@fivetran-joemarkiewicz Hey. This makes sense. What's weird about the Hubspot data is that the information about the merge is on the contact that has been merged into, not on the contact that has been merged. There's a column called merged_vids
which is an array of the contact IDs that have been merged into a given contact.
I think to filter these out we would need to grab the unique set of IDs stored in merged_vids
and then filter any of those IDs out from the table. This could be done in the staging model, though we may want to leave the grain of that untouched? If that's the case, I think we could do it in an intermediate model in the transform package.
Thanks for looking into this @DylanBaker. I was not aware of the merged_vids
column previously, but looking into it I see how it stores both the merged contact and contacts that have been merged into it.
My one curiosity is that on the dataset I have available it seems that the merged contacts no longer exist within the contact
table? I wonder why the customer is still seeing these merged contacts within their dataset while I see no record of the merged contacts?
I do agree with you however, that this should be performed within an intermediate model in the transform package to leave the grain of the staging model untouched.
This feature has been integrated within the latest v0.4.1
release of the dbt_hubspot package. FYI the feature is disabled by default and you may enable the contact merge auditing by setting the hubspot_contact_merge_audit_enabled
to true
.
Are you a current Fivetran customer?
Issue identified by Marion Pavillet
Describe the bug
Our integration to the datawarehouse is done with fivetran. Deletions are caught properly with a deleted flag that is correctly ignored by the package. Everything runs smoothly except for some tests related to contacts being unique in the email_events aggregated table:
I did some investigation and it is due to contacts being merged in Hubspot. These contacts are not flagged as either deleted or inactive. Thus, there are duplicates in the email address (which is used in a JOIN clause and leads to duplicates downstream - specifically this SQL hubspot__email_event_sent.sql).
Expected behavior
The merged contact is the contact that is used in transformations and not the older contacts.