fivetran / dbt_stripe

Data models for Stripe built using dbt.
https://fivetran.github.io/dbt_stripe/
Apache License 2.0
30 stars 31 forks source link

[Investigation] Possible fanout in `int_stripe__account_daily` when using connected accounts #80

Open fivetran-catfritz opened 2 months ago

fivetran-catfritz commented 2 months ago

What to investigate

This was discovered from an investigation from error query exceeded resource limits in int_stripe__account_daily.

Upon review, this join does not join on some sort of account_id or connected_account_id. For typical cases where only one account is in use, this is no issue, however if a user is using connected accounts, this would cause a fanout since the same balance_transaction would be repeated for every account. Also, this fanout could be what is causing resource issues.

Because of internal data limitations, there is uncertainty on how to correctly address this. Ideally we want to review data from a user with connected accounts. (If that's you and would like to help us, please let us know in this thread!)

Possible Solution

For model int_stripe__account_daily, in my initial investigation I thought to update the CTE daily_account_balance_transactions with a filter like:

...
    from date_spine
    left join balance_transaction
        on cast(balance_transaction.date as date) = date_spine.date_day
        and balance_transaction.source_relation = date_spine.source_relation
        and balance_transaction.connected_account_id = 
            case when balance_transaction.connected_account_id is not null
                then date_spine.account_id
                else null end -- necessary for cases where an account is not a connected account. We don't want to erroneously filter transactions out.
    group by 1,2,3

However the issue is still that we don't have appropriate data to test if this is accurate. Just posting here, so it isn't lost.