fivetran / dbt_shopify_source

Fivetran's Shopify source dbt package
https://fivetran.github.io/dbt_shopify_source/
Apache License 2.0
30 stars 22 forks source link

[Feature] Adding new schema dynamically #88

Open advolut-team opened 3 months ago

advolut-team commented 3 months ago

Is there an existing issue for this?

Describe the issue

Hi Fivetran team,

First off, thank you for the amazing work in maintaining this repo - it has been very useful :)

Problem:

I'd like to run a DBT transformation for more than 50 Shopify shops using the same dbt_shopify repo, which uses this repo as a "source" package. I understand that this feature is already available, but every time a new Shopify store signs up via the Fivetran Connect Card flow, I'll need to append the shopify_union_schemas variable, which is already a long list.

So, I'd like to read the shopify_schema dynamically whenever a new Shopify store signs up to my website.

Attempted Workaround:

  1. For each Shopify store/connector, I have created a DBT project using dbt_shopify repo with an additional environment variable DBT_SCHEMA_NAME=shopify_store_name
  2. In my dbt_project.yml, I modified shopify_schema to read the environment variable DBT_SCHEMA_NAME, like below:
    vars: 
    shopify_database: dev
    shopify_schema: "{{ env_var('DBT_SCHEMA_NAME') }}"
  3. I found out that dbt only reads project variables literally, so jinja macros don't work (I'd love to be proven wrong on this).

Request:

Ideal: Modify dbt_packages/shopify_source/models/src_shopify.yml in your code. I'm happy to send a PR.

sources:
  - name: shopify # This source will only be used if you are using a single Shopify source connector. If multiple sources are being unioned, their tables will be directly referenced via adatper.get_relation
    schema: "{{ env_var('DBT_SCHEMA_NAME') or var('shopify_schema', 'shopify')  }}"

I'm also keen for any recommendations if you think my approach can be improved.

Relevant error log or model output

08:11:09  Running with dbt=1.7.17
08:11:09  Registered adapter: redshift=1.7.7
08:11:09  Unable to do partial parsing because a project config has changed
08:11:13  Found 92 models, 98 tests, 34 sources, 0 exposures, 0 metrics, 869 macros, 0 groups, 0 semantic models
08:11:13  
08:11:16  Concurrency: 1 threads (target='dev')
08:11:16  
08:11:16  1 of 81 START sql table model env_var2_shopify.shopify__calendar ............... [RUN]
08:11:18  1 of 81 OK created sql table model env_var2_shopify.shopify__calendar .......... [SUCCESS in 2.13s]
08:11:18  2 of 81 START sql view model env_var2_stg_shopify.stg_shopify__abandoned_checkout_discount_code_tmp  [RUN]
08:11:18  

Please be aware: The ABANDONED_CHECKOUT_DISCOUNT_CODE table was not found in your SHOPIFY schema(s). The Fivetran dbt package will create a completely empty ABANDONED_CHECKOUT_DISCOUNT_CODE staging model as to not break downstream transformations. To turn off these warnings, set the `fivetran__remove_empty_table_warnings` variable to TRUE (see https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source for details).

Expected behavior

Expected end result in Redshift:

Screenshot 2024-07-29 at 6 05 05 PM

Assuming the Shopify store is env_var2, this repo should generate env_var2_stg_shopify, and the dbt_shopify repo will generate env_var2_shopify. The env_var2_final schema is the "final" serving layer that has the derived tables from both repos. There will be 4 schemas in total.

dbt Project configurations


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_tutorial'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_tutorial'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

vars: # must add
    shopify_database: dev
    shopify_schema: dummy_value # input schema, overrides schema field in profiles.yml

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dbt_tutorial:
    # Config indicated by + and applies to all files under models/example/
    example:
      +schema: final
      +materialized: view

Package versions

packages:
- package: fivetran/shopify
  version: [">=0.12.0", "<0.13.0"] 

What database are you using dbt with?

redshift

dbt Version

Core:
  - installed: 1.7.17
  - latest:    1.8.4  - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - redshift: 1.7.7  - Update available!
  - postgres: 1.7.17 - Update available!

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 3 months ago

@advolut-team thanks for opening this issue!

One quick adjustment, I am going to change this label from a bug to a feature request since this is a proposal for new functionality of the union schema feature.

Your request is something we have heard from other customers in the past (for example see this closed issue) and something we would be interested in adding onto the union schema feature. We had explored creating dynamic macros to query the information schema, but we continually ran into issues across warehouses and also with dbt interpreting jinja macros literally (as you highlighted) which resulted in us halting that exploration. That being said, we have not explored the environment variable you are proposing and this could be an interesting solution. 🤔

I'm a bit apprehensive with adding an environment variable to the package code as this is not something we have done in the past and would need to make sure this wouldn't conflict somehow with users dbt Core or Cloud environments. While I investigate that, would you be open to exploring a dbt source override and changing the schema to your proposed update and sharing if this works for your scenario? Additionally, you can create a fork and apply your changes and install your fork to test out the functionality. This will help us understand the viability of the proposed solution.

advolut-team commented 3 months ago

@fivetran-joemarkiewicz Thanks for the fantastic suggestion! So adding the following source override in my parent dbt_project.yml actually worked. I'll test this out with the Fivetran connect card flow tomorrow and give you another update :)

sources:
  - name: shopify
    overrides: shopify_source

    schema: "{{env_var('DBT_SCHEMA_NAME')}}"
advolut-team commented 3 months ago

Hey, I can confirm that the source override fix works. I am now able to add a new schema dynamically in Redshift when creating a connector with the Fivetran connect card. Happy for you to close this ticket! :) Thank you very much

fivetran-joemarkiewicz commented 2 months ago

That's great to hear and thanks for letting us know that this solution worked for you.

I'm going to keep this ticket open in case others using the package want something similar to be a built in feature of the package. If others come across this and would like to see this functionality, please comment and let us know.

Thanks again @advolut-team for opening this issue and collaborating with us on a solution that worked for you 😄