fivetran / dbt_hubspot

Data models for Hubspot built using dbt.
https://fivetran.github.io/dbt_hubspot/
Apache License 2.0
33 stars 38 forks source link

bugfix/uniqueness-test-adjustment #94

Closed fivetran-joemarkiewicz closed 1 year ago

fivetran-joemarkiewicz commented 1 year ago

Are you a current Fivetran customer?

Fivetran created PR

What change(s) does this PR introduce?

Bug Fixes

Under the Hood

Did you update the CHANGELOG?

Does this PR introduce a breaking change?

This is simply an update to adjust the uniqueness tests following the release from v0.8.0. This will not be a breaking change.

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

Is this PR in response to a previously created Bug or Feature Request

How did you test the PR changes?

Select which warehouse(s) were used to test the PR

Provide an emoji that best describes your current mood

🦄 **Feedback** We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your [feedback](https://www.surveymonkey.com/r/DQ7K7WW) on our existing dbt packages or what you'd like to see next.
fivetran-joemarkiewicz commented 1 year ago

@fivetran-jamie FYI I also explored adding more rigorous testing instead of just removing the uniqueness tests. However, I was having a hard time rationalizing the need for a complicated test when I felt the removal would suffice. It was either add a complicated test that could be brittle and break along the way, or remove it and rely on the not_null test only.

Open to your thoughts.

fivetran-joemarkiewicz commented 1 year ago

so i think we need to adjust the tests for the email event models, not for models like contacts or deal_stages. i followed up in the issue to clarify which models are failing

i also think it would be best to include the is_contact_deleted fields in a dbt_utils.unique_combination_of_columns test rather than just removing the uniqueness tests

also also, i noticed that some _fivetran_synced entries have the _fivetran_deleted doc definition if you'd have bandwidth to fix those in here (or i can make an issue)

Thanks for following up here with these notes @fivetran-jamie! I definitely will have the bandwidth to make the adjustments on the docs definitions.

Additionally, I had one follow up regarding the addition of the tests. Would it technically work to simply include the deleted column to the uniqueness test? What if there are multiple deleted entries for a single contact? Wouldn't that still return a test failure as there could be five deleted entries and only one non deleted? I would assume that uniqueness test on multiple columns would fail.

If we wanted to truly test this, I think we would need to use a more robust test. Maybe something along the lines of the dbt-expectations expect_column_values_to_be_unique test to test on the condition where we filter out deleted records.

tests:
  - dbt_expectations.expect_column_values_to_be_unique:
      row_condition: "not is_deleted_field"

The inverse of this approach would be to just remove the uniqueness test altogether.

fivetran-jamie commented 1 year ago

@fivetran-joemarkiewicz ah i definitely like the expect_column_values_to_be_unique approach, since i suppose in theory a contact could be deleted multiple times 🤔 and the test reflects our actual thinking more appropriately

fivetran-joemarkiewicz commented 1 year ago

Just applied the updates and the users shared that the test updates fixed the issues on their end (see the issue for details).

This is now ready for re-review. I will plan to regen docs following your review/approval.