fivetran / dbt_mixpanel

Fivetran's Mixpanel dbt package
https://fivetran.github.io/dbt_mixpanel/
Apache License 2.0
6 stars 10 forks source link

[Feature] union_schemas support #52

Open bdtoole opened 2 days ago

bdtoole commented 2 days ago

Is there an existing feature request for this?

Describe the Feature

We have multiple Fivetran connectors for mixpanel that ultimately result in multiple source schemas for our mixpanel data. This package currently does not support the union_schemas functionality, which means if I want to use the package for our mixpanel data, I can only use it for one schema and I'd have to manually define the second schema and model the data myself.

How would you implement this feature?

The feature has effectively been built out already - it just hasn't been applied to this package, so the necessary changes would need to be made to support it here.

Describe alternatives you've considered

No response

Are you interested in contributing this feature?

Anything else?

For reference, here's a Slack conversation I had with @fivetran-joemarkiewicz about this.

fivetran-joemarkiewicz commented 2 days ago

Thanks for opening this Feature Request @bdtoole!

As discussed in the slack conversation you linked above, this is something that should be relatively straightforward to add to the package. However, this is not something our team would likely pick up in the short term unless we see increased community interest in this PR. That being said, since it seems you're open to creating a PR I would be happy to share the steps necessary to provide union_schema support for this package. You can then contribute the changes in a PR which our team at Fivetran would be able to review and merge into the next release in a quicker turnaround.

If that is of interest to you, or anyone else who comes across this thread and is interested in contributing the union schema feature then you can view the required steps below:

  1. Create a models/tmp folder which will include a single model titled stg_mixpanel__event_tmp.
  2. Inside this model you will add code which reflects the following structure. Essentially the contents would be something along the lines of:
    {{
    fivetran_utils.union_data(
        table_identifier='event', 
        database_variable='mixpanel_database', 
        schema_variable='mixpanel_schema', 
        default_database=target.database,
        default_schema='mixpanel',
        default_variable='event',
        union_schema_variable='mixpanel_union_schemas',
        union_database_variable='mixpanel_union_databases'
    )
    }}
  3. You will replace this line with a star reference to the new tmp model instead of the source table since the new tmp model will include the unioned data.
  4. In this section of the staging model you will now add this macro to persist the source_relation field. For Mixpanel the code will look something similar to the following.
        {{ fivetran_utils.source_relation(
            union_schema_variable='mixpanel_union_schemas', 
            union_database_variable='mixpanel_union_databases') 
        }}
  5. Now that the source_relation field is persisted from the above macro, you will need to add that field to every downstream model reference. For example, you will need to add source_relation as a new field here and anywhere else we explicitly select fields in the models.
  6. Lastly, wherever there is a join you must add the source_relation as a condition. So here you will need to include the following to ensure joins are only performed on the relevant data:
    join user_first_events
        on spine.date_day >= user_first_events.first_event_day -- each user-event_type will a record for every day since their first day
        and spine.source_relation = user_first_events.source_relation

The above steps should be all that's required. There are additional changes and some improvements we need to apply as well, but our team can handle that during the PR review if you choose to open one. Let me know if you're interested and if you have any other questions. Thanks!