fivetran / dbt_shopify_source

Fivetran's Shopify source dbt package
https://fivetran.github.io/dbt_shopify_source/
Apache License 2.0
29 stars 23 forks source link

[Bug] table_exists fails with union variables #57

Closed dfagnan closed 1 year ago

dfagnan commented 1 year ago

Is there an existing issue for this?

Describe the issue

I believe that table_exists check (e.g. for discount codes here) does not work correctly when using a union variable with two stores. alternatively - it may not work if one store has the table and another doesn't since I'm not sure I understand the exact flow.

Relevant error log or model output

The model returns the null row instead of the union of the correct data causing data loss.

Expected behavior

It should run the union macro when at least one of the table exist

dbt Project configurations

shopify_union_schemas: ['SHOPIFY_STORE_ONE','SHOPIFY_STORE_TWO']

Package versions

What database are you using dbt with?

snowflake

dbt Version

Core:

Plugins:

Additional Context

No response

Are you willing to open a PR to help address this issue?

dfagnan commented 1 year ago

This bug might be impacting more tables & views - likely these four: https://github.com/fivetran/dbt_shopify_source/search?q=table_exists

fivetran-joemarkiewicz commented 1 year ago

Hi @dfagnan thank you so much for raising this issue. My team is currently looking into what may be going on here and will post back once we understand the core of the issue a bit better.

fivetran-joemarkiewicz commented 1 year ago

@dfagnan I was just able to confirm that by removing the conditionals in the relevant models (see here for an example) you will be able to get the intended results as long as at least one schema has the relevant source table. This approach seems to fall apart once no schemas have the source table.

-- Conditionals removed in this tmp model example

{{
    fivetran_utils.union_data(
        table_identifier='discount_code', 
        database_variable='shopify_database', 
        schema_variable='shopify_schema', 
        default_database=target.database,
        default_schema='shopify',
        default_variable='discount_code_source',
        union_schema_variable='shopify_union_schemas',
        union_database_variable='shopify_union_databases'
    )
}}

I believe this is something we can address with a nifty for loop to iterate through the schemas and provide a result if no schemas have the relevant source. This will take some time for us to design, but I will post back here once I am able to make some progress!

dfagnan commented 1 year ago

@jonatfivetran the for loop approach feels intuitive to me - thanks for the update!

fivetran-joemarkiewicz commented 1 year ago

@dfagnan I wanted to post back to let you know that I was able to get a working workaround in place within the bugfix/union-table-exists branch on this (and the dbt_shopify) repo. Albeit after chatting with @fivetran-jamie we believe there is a more dynamic way we can achieve the intended result. As such, we will be spending more time in our upcoming sprint to investigate the dynamic approach.

Currently, the solution in the branch I mentioned should work for the union feature as long as at least one schema has the required table. If one schema has the table and a subsequent one does not, the fivetran_utils.union_data() macro will handle the abscence.

If you have a moment, would you be able to try the following version of the package:

Note: This same branch exists on the transform package. If you wanted to test the entire source + transform then you can swap the git repo to be dbt_shopify instead of dbt_shopify_source

packages:
- git: https://github.com/fivetran/dbt_shopify_source.git
revision: bugfix/union-table-exists
warn-unpinned: false 

Let me know if this works for you! Additionally, stay tuned for a week or so from now where we investigate a more dynamic approach.

dfagnan commented 1 year ago

Looks great @fivetran-joemarkiewicz - thanks for the quick turnaround!

Your note on the branch was also super helpful - I tried hacking my own fork but should've thought of also overriding the dbt_shopify package.

fivetran-jamie commented 1 year ago

working on a more sustainable solution here, but also realized there's a slightly simpler temporary 🩹 method that doesn't involve a working branch or fork. this worked for another customer at least:

go to your dbt_project.yml, and add the following variable (leave the union_schemas variable in there)

shopify_schema: 'SHOPIFY_STORE_ONE' # choose schema that has the potentially-missing table, if there is one, otherwise any whichever schema works

realized that the does_table_exist check looks at the defined source, whose schema is configured by the shopify_schema variable instead of shopify_union_schemas

dfagnan commented 1 year ago

Nice @fivetran-jamie !

fivetran-jamie commented 1 year ago

hey @dfagnan we've developed a pretty neat and dynamic solution if you'd like to try it out before we release next week!

packages:
  - git: https://github.com/fivetran/dbt_shopify_source.git
    revision: bugfix/union-source-tables
    warn-unpinned: false 

Note: this branch points to a working branch of fivetran_utils, so to test, you'll have to comment out any other fivetran packages from your packages.yml

fivetran-jamie commented 1 year ago

hey hey -- this fix is now live! (v0.8.2 of shopify_source)

if you are referencing the latest version of the shopify transform package, a dbt deps should automatically pick up the source package changes