Closed kylemcnair closed 1 year ago
Hi @kylemcnair thanks so much for creating this feature request. I see the issue you are outlining and appreciate you thinking through a series of solutions.
My team and I are going to think through this a bit more and determine possible avenues forward here. Be sure to watch this thread as we will likely come back with follow up questions as we dig further into this.
@kylemcnair what are thoughts on your solution vs not deleting soft-deleted entities and persisting the is_entity_deleted
columns downstream?
would you rather have no reporting on these deleted entities at all, or complete reporting with this is-deleted flag to pick and choose places to ignore them?
@kylemcnair what are thoughts on your solution vs not deleting soft-deleted entities and persisting the
is_entity_deleted
columns downstream?would you rather have no reporting on these deleted entities at all, or complete reporting with this is-deleted flag to pick and choose places to ignore them?
@fivetran-jamie I like your proposed solution - it leaves a more complete dataset and would be much simpler to implement than an elaborate macro.
@fivetran-jamie should I run with this and open a PR? more than happy to, just lmk
@kylemcnair yeah if the below plan sounds good/makes sense to you! i scoped it out a bit more to identify exactly where we need to make all these changes
Plan:
is_deleted
or _fivetran_deleted
field in the final CTE of the model
is_<foreign key table>_deleted
flag where we are joining entities with their potentially deleted foreign key references
Email_events_joined
macroInt_hubspot__email_metrics__by_contact_list
Int_hubspot__deals_enhanced
Hubspot__deal_stages
happy to help wherever ya need or explain anything further
Closing out this feature request as the above linked merged PRs address the request.
Is there an existing feature request for this?
Describe the Feature
Problem
Deleted records for
deal
,company
, andticket
are excluded from their respective models, each with a final select of:However, ALL keys for these entities can appear in other models. The exclusion of deleted records isn't accounted for in other models, and can result in orphaned records.
For example, we had two companies merged into a single company, resulting in a deleted company record. Both companies shared the same deal_id. The deals_company table contains a record for each company, but only one company has a corresponding record in the
hubspot_company
table. This relationship resulted in a failing uniqueness test.Solution
To maintain referential integrity, references to deleted entities should be removed from all models. ie, if a deleted company is excluded from the core
company
model, that deleted company_id should not appear in any other models.This could be implemented via a macro that looks for foreign keys referencing any of the three models with deletes.
Some version of:
If this general approach makes sense, I can flesh out more specifics with the macro.
Describe alternatives you've considered
No response
Are you interested in contributing this feature?
Anything else?
original post in the #tools-fivetran channel in the dbt Slack