fivetran / dbt_hubspot_source

Data models for Hubspot built using dbt.
https://fivetran.github.io/dbt_hubspot_source/
Apache License 2.0
31 stars 30 forks source link

[Feature] Exclude foreign key references for deleted records #87

Closed kylemcnair closed 1 year ago

kylemcnair commented 1 year ago

Is there an existing feature request for this?

Describe the Feature

Problem

Deleted records for deal, company, and ticket are excluded from their respective models, each with a final select of:

select *
from fields
where not coalesce(is_deleted, false)

However, ALL keys for these entities can appear in other models. The exclusion of deleted records isn't accounted for in other models, and can result in orphaned records.

For example, we had two companies merged into a single company, resulting in a deleted company record. Both companies shared the same deal_id. The deals_company table contains a record for each company, but only one company has a corresponding record in the hubspot_company table. This relationship resulted in a failing uniqueness test.


Solution

To maintain referential integrity, references to deleted entities should be removed from all models. ie, if a deleted company is excluded from the core company model, that deleted company_id should not appear in any other models.

This could be implemented via a macro that looks for foreign keys referencing any of the three models with deletes.

Some version of:

If this general approach makes sense, I can flesh out more specifics with the macro.

Describe alternatives you've considered

No response

Are you interested in contributing this feature?

Anything else?

original post in the #tools-fivetran channel in the dbt Slack

fivetran-joemarkiewicz commented 1 year ago

Hi @kylemcnair thanks so much for creating this feature request. I see the issue you are outlining and appreciate you thinking through a series of solutions.

My team and I are going to think through this a bit more and determine possible avenues forward here. Be sure to watch this thread as we will likely come back with follow up questions as we dig further into this.

fivetran-jamie commented 1 year ago

@kylemcnair what are thoughts on your solution vs not deleting soft-deleted entities and persisting the is_entity_deleted columns downstream?

would you rather have no reporting on these deleted entities at all, or complete reporting with this is-deleted flag to pick and choose places to ignore them?

kylemcnair commented 1 year ago

@kylemcnair what are thoughts on your solution vs not deleting soft-deleted entities and persisting the is_entity_deleted columns downstream?

would you rather have no reporting on these deleted entities at all, or complete reporting with this is-deleted flag to pick and choose places to ignore them?

@fivetran-jamie I like your proposed solution - it leaves a more complete dataset and would be much simpler to implement than an elaborate macro.

kylemcnair commented 1 year ago

@fivetran-jamie should I run with this and open a PR? more than happy to, just lmk

fivetran-jamie commented 1 year ago

@kylemcnair yeah if the below plan sounds good/makes sense to you! i scoped it out a bit more to identify exactly where we need to make all these changes

Plan:

happy to help wherever ya need or explain anything further

fivetran-joemarkiewicz commented 1 year ago

Closing out this feature request as the above linked merged PRs address the request.