Closed d0m-1n-1c closed 2 years ago
Hi @d0m-1n-1c thanks so much for opening this issue!
We actually may have just what is needed to filter out those duplicate contacts! We found that HubSpot provides the functionality to "merge" contacts. However, we noticed this needs to take place in a downstream model. As such, we added a variable, hubspot_contact_merge_audit_enabled, that will remove these duplicate contacts.
This variable is false by default. If those duplicate contacts are in fact merged, then setting this variable to true should resolve your test failures. The README should help detail what this variable does.
vars:
  hubspot_contact_merge_audit_enabled: true
Let me know if this works. If it doesn't, we should do more investigating into why these duplicate records exist 👀
Hi @fivetran-joemarkiewicz thanks for getting back to me so quickly!
I completely glazed over that variable!!
I've implemented it and the number of duplicates has been reduced but the tests are still failing. It looks like there are two persisting issues:
contact_merge_audit table with values in the vid_to_merge column. This means that the logic introduced with the variable isn't filtering them out
select
    contacts.*
from contacts
left join contact_merge_audit
    on contacts.contact_id = contact_merge_audit.vid_to_merge
where contact_merge_audit.vid_to_merge is null
Could this be an issue with the hubspot connector itself?
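For anyone following along, here is a minimal sketch of what the anti-join above does, run against an in-memory SQLite database rather than a real warehouse. The table contents are made up for illustration, and the contact_id column on contact_merge_audit is an assumption; only vid_to_merge is taken from the query above.

```python
import sqlite3

# Hypothetical toy data: contact 2 was merged into contact 1, so its id
# appears in vid_to_merge. Contact 3 was never merged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contacts (contact_id integer, email text);
    create table contact_merge_audit (contact_id integer, vid_to_merge integer);
    insert into contacts values
        (1, 'a@example.com'), (2, 'a@example.com'), (3, 'b@example.com');
    insert into contact_merge_audit values (1, 2);
""")

# Same anti-join shape as the query above: keep only contacts whose id
# never shows up as a vid_to_merge.
remaining_ids = [
    row[0]
    for row in conn.execute("""
        select contacts.contact_id
        from contacts
        left join contact_merge_audit
            on contacts.contact_id = contact_merge_audit.vid_to_merge
        where contact_merge_audit.vid_to_merge is null
        order by contacts.contact_id
    """)
]
print(remaining_ids)
```

Note that the filter only removes a contact when its id is recorded in vid_to_merge. Two contacts that share an email but have no merge audit row would both pass through.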
Hi @d0m-1n-1c,
After thinking about this one for a while, I am pretty sure the variable logic for filtering out merged contacts should capture all appropriate contacts. That being said, I am not entirely sure why a few are still slipping through the cracks.
I think this may be a question better suited to our Connector Support team. They will have a better idea of how to address this issue if it truly is at the connector level.
Hi @d0m-1n-1c
I am closing this issue as opening the support ticket would ideally have solved this issue. Please feel free to reopen this issue if you are still seeing the error.
Are you a current Fivetran customer? Dominic, Data Engineer at Rezdy
Describe the bug The macro email_events_joined has the below join
We have duplicate emails in contacts (stg_hubspot__contact). These emails have different contact ids but the email address is the same. Although an edge case, there doesn't appear to be a rule against it.
Steps to reproduce
stg_hubspot__contact) in hubspot
hubspot__email_event_dropped
hubspot__email_event_opens
However, the original email table stg_hubspot__email_event will pass on its unique test.
To get bad rows:
Expected behavior A join on contact id rather than the email address to produce a unique grain
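To illustrate why the join key matters for the grain, here is a small sketch using an in-memory SQLite database. The table and column names are hypothetical stand-ins, and it assumes each email event row carries a contact_id, which may not match the actual staging models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contacts (contact_id integer, email text);
    -- Hypothetical: assumes each event row carries a contact_id.
    create table email_events (
        event_id text, recipient_email text, contact_id integer
    );
    -- Two contacts share one email address; there is a single event.
    insert into contacts values (1, 'a@example.com'), (2, 'a@example.com');
    insert into email_events values ('e1', 'a@example.com', 1);
""")

def count_rows(join_condition):
    """Count the rows produced when events are left joined to contacts."""
    query = f"""
        select count(*)
        from email_events
        left join contacts
            on {join_condition}
    """
    return conn.execute(query).fetchone()[0]

# Joining on email matches both contacts, so the single event fans out.
rows_by_email = count_rows("email_events.recipient_email = contacts.email")
# Joining on contact id matches one contact, so the grain is preserved.
rows_by_contact_id = count_rows("email_events.contact_id = contacts.contact_id")
print(rows_by_email, rows_by_contact_id)
```

With this toy data the email join returns two rows for one event, which is the kind of duplication that would fail a unique test on the event id.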
Project variables configuration
Package Version
Warehouse
Additional context That's pretty much it, ey
Screenshots
Please indicate the level of urgency Annoying but not critical. It's creating a bunch of failing tests, but we will likely fork the package and solve it with a ROW_NUMBER() / QUALIFY to get the unique grain
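A rough sketch of the ROW_NUMBER() approach mentioned above, again on an in-memory SQLite database with made-up data. Keeping the lowest contact_id per email is an arbitrary tie-break chosen for illustration, not something the package prescribes. SQLite has no QUALIFY clause, so the filter sits in an outer query.

```python
import sqlite3

# Window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contacts (contact_id integer, email text);
    insert into contacts values
        (1, 'a@example.com'), (2, 'a@example.com'), (3, 'b@example.com');
""")

# Rank contacts within each email and keep only the first one.
# Assumption: the lowest contact_id wins; a real fix might prefer the
# most recently updated contact instead.
deduped = conn.execute("""
    select contact_id, email
    from (
        select
            contact_id,
            email,
            row_number() over (
                partition by email
                order by contact_id
            ) as row_num
        from contacts
    ) as ranked
    where row_num = 1
    order by contact_id
""").fetchall()
print(deduped)
```

On warehouses that support QUALIFY, the outer query can usually be replaced by a qualify clause on the same row_number() expression.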
Are you interested in contributing to this package?