fivetran / dbt_shopify

Fivetran's Shopify dbt package
https://fivetran.github.io/dbt_shopify/
Apache License 2.0

[Feature] Shopify customers to be based on email address rather than customer ID #37

Closed danieltaft closed 1 year ago

danieltaft commented 2 years ago

Is there an existing feature request for this?

Describe the Feature

The Fivetran Shopify models shopify__customers and shopify__customer_cohorts are very helpful. However, they are based entirely on the Shopify customer_id.

In Shopify, when a customer checks out as a guest, a new customer_id is created every time, even if the customer uses the same email address.

It would be a great improvement in the accuracy of these models to base the models on customer email (or hashed email address if that is used) rather than customer id. This will improve the accuracy of new vs returning customers as well as lifetime value etc.
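To make the difference concrete, here is a minimal Python sketch (not the package's dbt SQL) that counts customers and returning customers first by customer_id and then by email. The sample orders, the `normalize_email` helper, and the trim-and-lowercase normalization rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample orders as (customer_id, email).
# Each guest checkout is assumed to receive a fresh customer_id.
orders = [
    (101, "ana@example.com"),
    (102, "Ana@Example.com "),
    (103, "ben@example.com"),
    (104, "ana@example.com"),
]


def normalize_email(email):
    """Assumed normalization: trim whitespace and lowercase."""
    return email.strip().lower()


def count_orders_by(orders, key):
    """Count orders per customer, where `key` picks the identifier."""
    counts = defaultdict(int)
    for order in orders:
        counts[key(order)] += 1
    return dict(counts)


def returning_customers(counts):
    """Number of customers with more than one order."""
    return sum(1 for n in counts.values() if n > 1)


by_id = count_orders_by(orders, key=lambda o: o[0])
by_email = count_orders_by(orders, key=lambda o: normalize_email(o[1]))

print(len(by_id), returning_customers(by_id))
print(len(by_email), returning_customers(by_email))
```

With this sample data, the customer_id grain sees four one-time customers, while the email grain sees two customers, one of whom has ordered three times.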

Describe alternatives you've considered

No response

Are you interested in contributing this feature?

Anything else?

No response

danieltaft commented 2 years ago

Slack thread: https://getdbt.slack.com/archives/CU4MRJ7QB/p1660088316967609

jankatins commented 2 years ago

While I would support this, I just want to mention that merging users based on emails can have interesting implications. In some cases it could be really bad to, e.g., base product recommendations or newsletter content on such merged users...

E.g. it might be advisable to run this twice so you have some merged data for analytics and some unmerged ones for pushing it back to customers...

danieltaft commented 2 years ago

While those first two points are true, I think they are edge cases compared with the main scenario, which is that on a regular store a customer can check out many times over a year as a guest. With a separate id each time, the number of new customers, the number of returning customers, and lifetime value are all totally wrong. This is far more common on regular stores than an occasional shared email address or a change of email address, which are rare and immaterial in overall analysis.

I disagree with anonymity though and don't think it has anything to do with these statistics.

fivetran-jamie commented 2 years ago

I agree with you @danieltaft, but I am curious about the scenario in which you might want to have both a merged and unmerged version... @jankatins what would "pushing it back to customers" mean?

also we do roll up customers to the email-grain here if y'all have any opinions on how this merge should happen

danieltaft commented 2 years ago

Hi @fivetran-jamie

The only reason really that I thought of supporting both/either customer_id or email is to allow backwards compatibility for those who haven't sync'd the email address column. But if it adds a lot of complexity and doesn't add much value, it's not necessary.

In terms of identifying a customer for the purposes of all aggregate customer-based reports, I can't see any scenario overall where Shopify's customer_id is a better identifier of a person than email.

pkanter commented 1 year ago

This would be a welcome change. For companies like mine, that have migrated to Shopify after years on another platform, the only commonality between the applications' customers would be the emails. Customer IDs in each would not be related to each other.

When doing a cohort analysis, a customer can appear new in Shopify this month but actually have been a customer of ours for years. This could only be caught by email address, not customer ID.
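As a rough illustration of the migration case, the Python sketch below assigns each customer to a cohort month using the earliest order found for their email across both platforms. The `legacy_orders` and `shopify_orders` lists, the helper names, and the trim-and-lowercase email matching are hypothetical, not part of the package.

```python
from datetime import date

# Hypothetical order history as (email, order_date) from each platform.
legacy_orders = [("pat@example.com", date(2019, 3, 14))]
shopify_orders = [
    ("Pat@example.com", date(2023, 5, 2)),
    ("sam@example.com", date(2023, 5, 9)),
]


def first_order_dates(*order_sources):
    """Earliest order date per normalized email across all given sources."""
    first = {}
    for source in order_sources:
        for email, ordered_on in source:
            key = email.strip().lower()  # assumed matching rule
            if key not in first or ordered_on < first[key]:
                first[key] = ordered_on
    return first


def cohort_month(d):
    """Cohort label in YYYY-MM form."""
    return d.strftime("%Y-%m")


shopify_only = {
    email: cohort_month(d)
    for email, d in first_order_dates(shopify_orders).items()
}
combined = {
    email: cohort_month(d)
    for email, d in first_order_dates(legacy_orders, shopify_orders).items()
}

print(shopify_only)
print(combined)
```

Looking at Shopify alone, pat@example.com lands in the 2023-05 cohort; once the legacy orders are joined on email, the same person moves to 2019-03, while sam@example.com stays in 2023-05.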

fivetran-jamie commented 1 year ago

this is in the new release! there are the shopify__customer_emails and shopify__customer_email_cohorts models, which are email based versions of the same models