fivetran / dbt_shopify

Fivetran's Shopify dbt package
https://fivetran.github.io/dbt_shopify/
Apache License 2.0
54 stars 40 forks source link

[Bug] Incremental Load in `shopify__customer_cohorts` has NULL fields #85

Closed advolut-team closed 1 month ago

advolut-team commented 1 month ago

Is there an existing issue for this?

Describe the issue

Hi Fivetran folks,

Firstly, thank you for your hard work in maintaining this repository, it has helped me tremendously.

For context, I'm using Fivetran and to extract Shopify data, transform it with this repo, and load it into AWS Redshift connector. I started using this connector since last week and I only have data for July and August 2024

The problem I'm facing is in the table shopify__customer_cohorts, the lifetime columns total_price_lifetime, order_count_lifetime, line_item_count_lifetime and cohort_month_number are having NULL values. The image below shows the NULLs:

Screenshot 2024-08-09 at 9 38 38 PM

I believe this problem occurs because the incremental load CTE -- backfill_lifetime_sums -- gives NULL value. Here is the complied dbt script of the CTE:

backfill_lifetime_sums as (
    -- for incremental runs we need to fetch the prior lifetimes to properly continue adding to them
    select source_relation,
        customer_id,
        max(total_price_lifetime) as previous_total_price_lifetime,
        max(order_count_lifetime) as previous_order_count_lifetime,
        max(line_item_count_lifetime) as previous_line_item_count_lifetime,
        max(cohort_month_number) as previous_cohort_month_number
    from "dev"."env_var2_shopify"."shopify__customer_cohorts"
    where date_month < '2024-08-01'
    group by 1,
        2
),

I have identified a potential fix, which is to add a COALESCE function in these lines, like below:

...
coalesce(backfill_lifetime_sums.previous_cohort_month_number, 0) + windows.cohort_month_number as cohort_month_number,
coalesce(backfill_lifetime_sums.previous_total_price_lifetime,0) + windows.total_price_lifetime as total_price_lifetime,
coalesce(backfill_lifetime_sums.previous_order_count_lifetime,0)+ windows.order_count_lifetime as order_count_lifetime,
coalesce(backfill_lifetime_sums.previous_line_item_count_lifetime,0) + windows.line_item_count_lifetime as line_item_count_lifetime,
...

I'm interested to hear your perspective. If you think my potential fix is okay, could I raise a PR? Thanks!

Relevant error log or model output

No response

Expected behavior

The lifetime columns should have a value instead of NULL, as shown below:

Screenshot 2024-08-09 at 9 42 42 PM

I got this expected result after adding the COALESCE function as mentioned above

dbt Project configurations


# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_tutorial'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_tutorial'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

vars: # must add
    shopify_database: dev
    shopify_schema: env_var2

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dbt_tutorial:
    # Config indicated by + and applies to all files under models/example/
    example:
      +schema: final
      +materialized: view

Package versions

packages:
- package: fivetran/shopify
  version: [">=0.12.0", "<0.13.0"] 

What database are you using dbt with?

redshift

dbt Version

Core:
  - installed: 1.7.17
  - latest:    1.8.5  - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - redshift: 1.7.7  - Update available!
  - postgres: 1.7.17 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-reneeli commented 1 month ago

Hi @advolut-team ! Thank you for the kind words and putting this together so thoughtfully!

Our team understands the issue and we agree that your solution should do the trick. Feel free to open the PR against our repo!

fivetran-avinash commented 1 month ago

Hi @advolut-team ! Took a first look at your PR and it looks mostly good.

I had one question and a few requests before merging your PR. Please let me know if you have any questions.

fivetran-avinash commented 1 month ago

Hi @advolut-team ! your test should now be live in the latest version of the shopify package.

Let us know if you have any questions or issues! Thanks again for putting this PR together.