It would be great to have an identity resolution logic in the package. This would consist of multiple steps:
Have a stitching table that combines the user identifiers
Depending on the use case this could be a table defined in dbt or in a more complex case a python implementation or a graph database.
Example code
--- user_identity_mapping.sql
select distinct
anonymous_customer_id,
last_value(customer_id) over(
partition by anonymous_customer_id
order by event_tstamp
rows between unbounded preceding and unbounded following
) as customer_id,
max(event_tstamp) over (partition by anonymous_customer_id) as end_tstamp
from events
where customer_id is not null
and anonymous_customer_id is not null
Update the activities via a post hook
With that table we could implement a post hook that updates the customer_id as well as the activity_occurence and activity_repeated_at fields.
Example code for the customer_id update
update activity as a
set a.customer_id = ui.customer_id
from user_identity_mapping as ui
where a.anonymous_customer_id = ui.anonymous_customer_id;
Alternatives to consider:
Is it really neccessary to have two id columns in activities?
It would be great to have an identity resolution logic in the package. This would consist of multiple steps:
Example code
customer_id
as well as theactivity_occurence
andactivity_repeated_at
fields.Example code for the customer_id update
Alternatives to consider: