fivetran / dbt_hubspot_source

Data models for Hubspot built using dbt.
https://fivetran.github.io/dbt_hubspot_source/
Apache License 2.0
31 stars 30 forks source link

parametrize existing source freshness tests #81

Closed moreaupascal56 closed 1 year ago

moreaupascal56 commented 2 years ago

Pull Request Are you a current Fivetran customer? yes yousign What change(s) does this PR introduce?

This PR allows the parametrization of freshness tests done in the hubspot source package. Indeed in the current version the tests are done for 84/168 hours (resp. warning, error). I needed a smaller timelapse for my tests.

I propose to use a variable to set the number of hour needed. I was thinking at first on a unique number of hour for all tables, but I thought that maybe some tables do not need the same freshness (ex. contact is updated regularly but some tables might be less updated), therefore I added 2 couples of variables:

Warnings: hubspot_freshness_warn_NAME_OF_THE_TABLE and hubspot_freshness_warn Errors: hubspot_freshness_error_NAME_OF_THE_TABLE and hubspot_freshness_error

With this structure we can set a default for all tables and if a table needs some special freshness test it is as well possible, I kept the 84/168 value for absolute default. This give a not very pretty syntax of var(xxx, var()) but I do not see how to do it more gracefully.

Please give me your opinion :) I did for now the change only on tables where freshness was implemented but I'd love to have it on others as well, (I can add later, after getting your review).

Did you update the CHANGELOG?

Does this PR introduce a breaking change?

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

Is this PR in response to a previously created Bug or Feature Request

How did you test the PR changes?

Select which warehouse(s) were used to test the PR

Provide an emoji that best describes your current mood

🧝‍♂️ **Feedback** We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your [feedback](https://www.surveymonkey.com/r/DQ7K7WW) on our existing dbt packages or what you'd like to see next.
fivetran-joemarkiewicz commented 1 year ago

Hi @moreaupascal56 thanks so much for opening this PR! This is something we have actually seen a few other users call out in a number of our other packages.

Typically our go to response is to encourage users to leverage the dbt overrides functionality for freshness. This allows users to take the same structure we have outline in this package, but modify certain overridable components within the source. This would allow you to set the freshness tests to be specific to your use case!

In this situation, you would create a src_hubspot.yml in your root project. You can then do a quick copy/paste of our src_hubspot.yml file. The only addition you would need to make is that of the override config:

## In your root project (not the package)
## models/src_hubspot.yml

version: 2

sources:
  - name: hubspot
    overrides: hubspot_source
    tables:
      - name: contact
        identifier: "{{ var('hubspot_contact_identifier', 'contact')}}"
        freshness:
          warn_after: {
            count: [your custom value],
            period: hour
          }
          error_after: {
            count: [your custom value],
            period: hour
          }

The above is just for the contact table, but could be reproduced for any other sources defined in the package.

I am a bit wary of adding more variables to this package (and this package already has too many 🤯). Therefore, I feel this override is a good alternative. What are your thoughts? Are there pieces of the override that wouldn't work for your use case? I am curious to weight the pros and cons of using variables in this case over the dbt override functionality. Looking forward to chatting more on this!

moreaupascal56 commented 1 year ago

Hey @fivetran-joemarkiewicz, yes I agree with you that's way simpler, I did not saw the obvious solution ahha. But just one question: why are we keeping the default freshness tests of 84 and 168 hours in the package then ?