bcodell / dbt-activity-schema

A dbt package with a POC implementation of an interface to query activity streams that adhere to the Activity Schema 2.0 spec.
Apache License 2.0

feature - activity columns should consider both customer_id and anonymous_customer_id #34

Closed awoehrl closed 8 months ago

awoehrl commented 9 months ago

Similar to dataset joins, the activity columns should be calculated based on both ID columns. The two window functions can be extended to partition by both.

Things to consider:

How does this impact performance? Are there alternatives, e.g. using one merged ID column instead of two throughout the activities?
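
To make the proposal concrete, here is a minimal sketch of window functions partitioned on a merged key. It assumes "partition by both" means partitioning on coalesce(customer_id, anonymous_customer_id), and it assumes the two window functions are the ones behind activity_occurrence and activity_repeated_at. The table name activity_stream and the sample rows are hypothetical. SQLite (via Python's standard library) stands in for the warehouse, so this is not the package's actual macro code.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table activity_stream (
        activity_id text,
        ts text,
        activity text,
        customer_id text,
        anonymous_customer_id text
    )
    """
)

# Hypothetical rows: some activities only carry an anonymous id.
rows = [
    ("a1", "2023-01-01", "visited_page", None, "anon-1"),
    ("a2", "2023-01-02", "visited_page", "cust-1", None),
    ("a3", "2023-01-03", "visited_page", "cust-1", None),
    ("a4", "2023-01-04", "visited_page", None, "anon-1"),
]
conn.executemany("insert into activity_stream values (?, ?, ?, ?, ?)", rows)

# Both window functions share one window that partitions on the merged key,
# so rows with only an anonymous id still get their own sequence.
query = """
    select
        activity_id,
        row_number() over w as activity_occurrence,
        lead(ts) over w as activity_repeated_at
    from activity_stream
    window w as (
        partition by coalesce(customer_id, anonymous_customer_id), activity
        order by ts
    )
    order by activity_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Partitioning on customer_id alone would put every row with a null customer_id into a single partition, which is the behaviour this issue wants to avoid. Whether the coalesce expression costs more than a precomputed column depends on the warehouse and is not measured here.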

bcodell commented 8 months ago

Hey @awoehrl, I'm working on this now. If we're going to extend this functionality to support table clustering, do you know how to cluster tables on column expressions (e.g. cluster by coalesce(customer_id, anonymous_customer_id)) in BigQuery? My research so far suggests that it isn't possible, in which case I'd need to materialize such a column in each activity model. I'm not a BigQuery expert, though, so let me know if you're aware of any workarounds. Thanks!
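
For illustration, the "materialize such a column" fallback could look roughly like the sketch below. The column name customer_key, the table names, and the sample rows are all hypothetical, and SQLite is used only so the example runs locally. The index is a loose stand-in for clustering and does not reproduce BigQuery's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table activity_stream (
        activity_id text,
        customer_id text,
        anonymous_customer_id text
    )
    """
)
conn.executemany(
    "insert into activity_stream values (?, ?, ?)",
    [
        ("a1", None, "anon-1"),      # anonymous only
        ("a2", "cust-1", None),      # identified only
        ("a3", "cust-1", "anon-1"),  # both ids present: customer_id wins
    ],
)

# Materialize the merged id as a real column (hypothetical name: customer_key)
# so that later steps can refer to a plain column instead of an expression.
conn.execute(
    """
    create table activity_stream_keyed as
    select
        *,
        coalesce(customer_id, anonymous_customer_id) as customer_key
    from activity_stream
    """
)

# Stand-in for clustering: SQLite has no clustering, so index the new column.
conn.execute(
    "create index idx_customer_key on activity_stream_keyed (customer_key)"
)

keys = conn.execute(
    "select activity_id, customer_key from activity_stream_keyed order by activity_id"
).fetchall()
print(keys)
```

In a dbt activity model, the equivalent would presumably be one extra column in the model's select, which a clustering config could then name directly. The same column could also serve as the partition key for the window functions.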