fivetran / dbt_shopify_source

Fivetran's Shopify source dbt package
https://fivetran.github.io/dbt_shopify_source/
Apache License 2.0
29 stars 23 forks source link

[Bug] Update partitioning logic to account for `source_relation`, empty source tables and union data #77

Closed fivetran-catfritz closed 6 months ago

fivetran-catfritz commented 7 months ago

Is there an existing issue for this?

Describe the issue

Sub-issue 1: Add source_relation to all partition by clauses

For models stg_shopify__metafield and stg_shopify__abandoned_checkout_discount_code, the column source_relation should be added to the partition by clauses used to determine is_most_recent_record.

Same update also needs to be added to line 5 of int_shopify__customer_email_rollup in the transform.

Below update from @fivetran-avinash Sub-issue 2: Identify best approach to handle empty source models for Redshift

In Redshift, an empty source model that contains a partitioning clause will fail at the staging layer with the constant expressions are not supported in partition by clauses error. One customer reported this error arising from the stg_shopify__abandoned_checkout_discount_code model. This particularly comes into play when unioning schemas and databases.

There are several approaches that could be taken, each which have pros and cons. See the comments for more details on the options.

The path forward we chose will be hard coding a case when statement for the situation where a table is empty.

Example code below:


  case when checkout_id is null and code is null and index is null
    then row_number() over(partition by source_relation) as index
    else row_number() over(partition by checkout_id, upper(code), source_relation order by index desc)
  end as index

Relevant error log or model output

No response

Expected behavior

is_most_recent_record will produce correct flag when used with unioned sources. Null tables will not generate partitioning errors.

dbt Project configurations

n/a

Package versions

v0.10.0

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-avinash commented 6 months ago

Options for sub-issue 2, path TBD:

Example code below:

  case when checkout_id is null and code is null and index is null
    then row_number() over(partition by source_relation) as index
    else row_number() over(partition by checkout_id, upper(code), source_relation order by index desc)
  end as index

Example code below: row_number() over(partition by coalesce(checkout_id,-9999), coalesce(upper(code),'empty'), source_relation order by coalesce(index,-9999) desc) as index

fivetran-avinash commented 6 months ago

After much deliberation, Option 3 it is! Will update the ticket shortly.