fivetran / dbt_stripe

Data models for Stripe built using dbt.
https://fivetran.github.io/dbt_stripe/
Apache License 2.0
29 stars 33 forks source link

BUG - subscription_id should be leveraged from Invoice, not Invoice_Lines #25

Closed fivetran-joemarkiewicz closed 1 year ago

fivetran-joemarkiewicz commented 2 years ago

Are you a current Fivetran customer?

Fivetran created Bug

Describe the bug

A customer noted in a dbt Slack thread that the subscription_id we leverage within the stripe__subscription_details model is not completely accurate and we should instead use the subscription_id within the Invoice table. After looking through the Connector Release Notes it is clear that the subscription_id has been added to the Invoice table and should be leveraged here instead of the Invoice_Line table.

When the subscription_id from the Invoice_Line table is used, there leaves room for empty aggregates in the final models. This update will resolve that. Additionally, the subscription_id is not included in the staging model for Invoice. So an update to the source package will be needed as well.

Expected behavior

There are no empty aggregates when a customer runs the package and inspects the final stripe__subscription_details model.

ryan-loveland commented 2 years ago

🤔 I have been looking over some of the SQL templates provided by Stripe-Sigma and see that when looking at subscription invoices and invoice line items they are applying the following filter (invoice_line_items.source_type = 'subscription' or invoice_line_items.subscription is not null). I see the the package uses just the invoice_line_item.subscription_id is not null in the stripe__subscription_line_items model...

When looking at my data I am seeing some inconsistencies in whether there is a subscription_id at the invoice or invoice_line_item level as well as the usage of type.

At this point I am not sure what would be the correct value to join, filter on. My current thought is there may be a reason why it is on the invoice (main line) and not on the invoice line item (line item) or vice versa in cases of multiple line items on an invoice having their own relationship with a subscription - granted I am not sure this is even possible through Stripe - I just have seen similar structures within finance systems. Stripe including the type only makes this more confusing though.

fivetran-joemarkiewicz commented 2 years ago

This is really interesting 🤔

I wouldn't have thought that the invoice_line_items.source_type = 'subscription' could result in null subscription_ids. Let me check with the Stripe connector PM if they have any insight into this and if they can help provide any direction on this join.

Thanks!

fivetran-joemarkiewicz commented 2 years ago

I just wanted to provide an update here that I have not been able to get a strong understanding around the best path forward with this open issue. The limitation of not having production data is seemingly one of the largest blockers for testing out the behaviors.

If anyone else that comes along to this thread has some insight or commentary to provide, we would extremely welcome it! In the meantime I will keep investigating.

ryan-loveland commented 2 years ago

I did some additional testing against our production data.

First, I validated data within Stripe alone to check differences between the subscription_id between the two tables in question:

with invoice_subs as (

    select
      invoices.id as invoice_id,
      invoices.date as invoice_created_at,
      invoices.status as invoice_status,
      invoices.subscription_id,
      invoices.total / 100.00 as invoice_total,
      invoice_line_items.source_type,
      invoice_line_items.proration as is_proration,
      invoice_line_items.subscription as invoice_line_item_sub_id,
      invoice_line_items.amount / 100.00 as invoice_line_item_amount,
      case when invoices.subscription_id = invoice_line_items.subscription
        then true
      else false end as sub_matches,
      prices.id as price_id,
      prices.recurring_interval,
      products.id as product_id,
      products.active as is_active

    from invoices

    join invoice_line_items 
      on invoices.id = invoice_line_items.invoice_id
    left join prices
      on invoice_line_items.price_id = prices.id
    left join products
      on prices.product_id = products.id

  )

  select * from invoice_subs where sub_matches = false

I exported the data and reviewed several scenarios and pulled up invoices in the UI to confirm my findings. See below:

The next step is checking against the data being synced to our data warehouse which when running a similar query gave far more results which leads me to believe there are changes happening on the invoice and/or the line items that isn't syncing or there is a delay. More to come.

Update: When running the same query against the data warehouse:

fivetran-jamie commented 1 year ago

hey @d3ad-pix3l, picking this up again! were there any cases in which

invoices.subscription_id != invoice_line_items.subscription_id 
and invoices.subscription_id is not null 
and invoice_line_items.subscription_id is not null

and could you explain the difference in the two environments you ran that query in? is the data warehouse your Fivetran-piped Stripe data?

ryan-loveland commented 1 year ago

@fivetran-jamie I was testing the data in the data warehouse from which Fivetran piped in Stripe data and using the built-in query writer in Stripe via Sigma. I am no longer working on this project however, so I cannot provide any additional information beyond what I provided previously.

fivetran-reneeli commented 1 year ago

Subscription_id is now brought into the invoice staging model. While it is brought into the downstream invoice_details model via the subscription object, it is being joined to the invoice via invoice.subscription_id.