fivetran / dbt_shopify_source

Fivetran's Shopify source dbt package
https://fivetran.github.io/dbt_shopify_source/
Apache License 2.0
29 stars 22 forks source link

some metafield tables can break if the "key" column has the same word in uppercase and lowercase. #64

Closed ZCrookston closed 1 year ago

ZCrookston commented 1 year ago

Is there an existing issue for this?

Describe the issue

If the key field has two values that have different casing (upper case and lower case) then it causes an error during the shopify__product_stage portion of a dbt run.

This error is resolved by adding a sql lower function to the metafield_reference field on line 41 of stg_shopify__metafield.sql

    lower({{ dbt.concat(["namespace","'_'","key"]) }}) as metafield_reference,

Relevant error log or model output

19:50:56  BigQuery adapter: Retry attempt 1 of 1 after error: BadRequest('Name custom_downloads is ambiguous inside lookup_object at [1516:33]')

--Essentially, because the key of "downloads" has inconsistent capitalizatin, the case statement is creating two different statements for custom_downloads and custom_Downloads, and subsequently attempting to create two columns with the same name, but different casing. This causes the later ambiguous column error.

Expected behavior

This portion of the package, specifically a column name, should not be case sensitive.

dbt Project configurations

vars:

shopify_database: kanopibyarmstrong shopify_schema: shopify shopify_using_fulfillment_event: false # false by default shopify_timezone: "America/New_York" shopify_using_product_metafields: True

Package versions

packages:

What database are you using dbt with?

bigquery

dbt Version

1.3

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 1 year ago

Hi @ZCrookston thanks for opening this issue and taking a first pass at opening the PR to address the issue. Your PR looks good and when testing I don't see any issues, but I want to understand the issue a little bit better. I just attempted this on my sample data (using mixed casing) and didn't run into the same issue.

I think the missing piece on my end is the key having two values. Would you be able to share an example of this occurring? I want to make sure there aren't any other missing components that we maybe need to factor into this update. Thanks!

FridayPush commented 1 year ago

I think using lower liberally with Metafields may be required. We're hitting issues where the OWNER_RESOURCE value can come in full caps and lowercase which breaks our workflows. In our scenario the following appears:

The 'owner_resource' can come in 'CUSTOMER' and 'customer'. So if we have a value 'vip' it may show up as.

owner_resource, namespace, key, value
CUSTOMER, cohort, vip, true
customer, cohort, vip, false

Both are set by Shopify Apps and it appears it requires users to maintain case, and their separate apps aren't consistent.

fivetran-joemarkiewicz commented 1 year ago

Hi @FridayPush thanks for jumping in here and sharing your experience. I believe using lower across the board for this feature may be the option we need to go with. Before moving forward I do want to confirm that both CUSTOMER and customer in this case are in fact the same and not distinct values? I have seen some cases in different platforms that casing does not mean the values are the same, but should be treated as unique values.

Thanks!

FridayPush commented 1 year ago

I don't see any indication that Shopify uses the case to represent different concepts. For our scenario we do have exactly the same details updated from different apps that have different cases. We aren't using the vip example but tagging identities from other systems to align the user across. So I can do a query like, pseudo

select id
  metafield.value as lineage_id
from stg_shopify__customer customer
left join stg_shopify__metafield metafield
  on customer.id = metafield.owner_resource_id 
  and metafield.owner_resource = 'customer'
  and key='lineage_id'

That only returns some, and using CUSTOMER returns others.

  and lower(metafield.owner_resource) = 'customer' 

Returns the correct total count.

fivetran-joemarkiewicz commented 1 year ago

@FridayPush thanks for sharing your findings here. With this, I feel my team is ready to tackle addressing this issue. I have folded this into our upcoming sprint. Be sure to follow this issue for more updates as we begin working on this resolution.

fivetran-avinash commented 1 year ago

Hi @ZCrookston and @FridayPush! Thanks for your contributions in getting this issue resolved. We are on the verge of a fix.

If you'd like, you can test this before we merge this onto our main branch. Here is the git config you'll need to use.

packages:
  - git: https://github.com/fivetran/dbt_shopify_source.git
    revision: bugfix/metafield-casing
    warn-unpinned: false 

We are expecting to merge this by the end of our sprint next week!