fivetran / dbt_shopify_source

Fivetran's Shopify source dbt package
https://fivetran.github.io/dbt_shopify_source/
Apache License 2.0
29 stars 23 forks source link

Bug/union source tables #59

Closed fivetran-jamie closed 1 year ago

fivetran-jamie commented 1 year ago

Are you a current Fivetran customer?

internal

What change(s) does this PR introduce?

Did you update the CHANGELOG?

Does this PR introduce a breaking change?

Did you update the dbt_project.yml files with the version upgrade (please leverage standard semantic versioning)? (In both your main project and integration_tests)

Is this PR in response to a previously created Bug or Feature Request

How did you test the PR changes?

Select which warehouse(s) were used to test the PR

Provide an emoji that best describes your current mood

💒 (union joke lol) **Feedback** We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your [feedback](https://www.surveymonkey.com/r/DQ7K7WW) on our existing dbt packages or what you'd like to see next.
fivetran-jamie commented 1 year ago

@fivetran-jamie thanks for working through this! Overall this is looking great and I added a few comments and suggestions to update within the package.

There is one issue that seems to occur that I am unsure how to solve. Your solution works great and operates as intended when the table does not exist in all sources. However, I found that the staging model not_null tests fail when this happens 🤔. I am not sure if this is something we encountered before, but wondering what your thoughts are on how to address this? image

great catch -- i switched the not_null test to https://github.com/calogica/dbt-expectations#expect_column_values_to_not_be_null (so when source_relation is not null, we run the test. this field should only be null if the table isn't found -- otherwise it is the schema/db or an empty string)

however, if someone doesn't have the discount source table and runs the transform package, they would get a test failure on the shopify__discounts end model, as this also has a not_null test. this gives us two options:

  1. let the customer get the error. if so, they'll need to manually disable the discount end model. pro: if they don't have discount data, they shouldn't have discount end models. con: if they suddenly do start using discounts, they'll have to manually enable this again
  2. apply the same conditional not_null tests in the transform package. pro: consistent, con: having to release new version of the package for something that can be done on customer's end

what do you think @fivetran-joemarkiewicz ?

a more paradigm-shifting alternative.... add a limit 0 to the all-null staging models we're creating.

fivetran-joemarkiewicz commented 1 year ago

@fivetran-jamie thanks for the detailed response and thorough investigation of alternatives!

My only concern with the current implementation (adding the additional tests to the staging models) is if this will be needed to be applied to all other packages using the union feature? Or is this isolated only to these types of models that do not leverage variables and instead rely on the null filling if they do not exist? I wonder if we have other packages that leverage this combo (maybe stripe?).

Back to your original question - Between your two suggestions I am actually leaning more towards option 2 as this will ensure we are not having the customer experience an error and then required them to troubleshoot. We should try our best to make sure the packages work without error initially. Especially in the future state of quickstart, we may not have the luxury of having the customers define variables manually.

However, I kind of like your paradigm shifting thought of doing a limit 0 (seems similar to how our internal dry run works). I think this would mean we would not need to adjust downstream tests, but I worry this may cause issues with joins 🤔. In a perfect world we could get this to work by just turning off the tests based on the result of the tmp model. I don't think this is possible, but if you have an idea that could possibly get this to work then I am game to move forward with that! Regardless, I am open to thinking outside the box for this solution as it is multi-faceted.

In the end, I really agree with @fivetran-sheringuyen observation from her fivetran_utils review that we should raise a compiler warning to let customers know that we are doing something a bit different (creating empty tables, limit 0'ing, or anything else out of the norm) as I do believe it should be our responsibility to fill customers in on if we are doing something out of the norm, rather than them finding it down the line and being puzzled at the behavior.

Happy to chat more about this in person if you would like!