bcodell / dbt-activity-schema

A dbt package with a POC implementation of an interface to query activity streams that adhere to the Activity Schema 2.0 spec.
Apache License 2.0

feature - Add identity resolution #30

Open awoehrl opened 10 months ago

awoehrl commented 10 months ago

It would be great to have identity resolution logic in the package. This would consist of multiple steps:

  1. Have a stitching table that combines the user identifiers.
     Depending on the use case, this could be a table defined in dbt or, in a more complex case, a Python implementation or a graph database.

Example code

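-- user_identity_mapping.sql
-- One row per anonymous_customer_id, mapped to the most recent known customer_id.

select distinct
  anonymous_customer_id,
  -- latest customer_id seen for this anonymous id (full-partition frame)
  last_value(customer_id) over(
    partition by anonymous_customer_id
    order by event_tstamp
    rows between unbounded preceding and unbounded following
  ) as customer_id,
  -- timestamp of the last event that carried both identifiers
  max(event_tstamp) over (partition by anonymous_customer_id) as end_tstamp

from events

-- only events carrying both identifiers can link them
where customer_id is not null
and anonymous_customer_id is not null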

  2. Update the activities via a post hook.
     With that table, we could implement a post hook that updates the customer_id as well as the activity_occurrence and activity_repeated_at fields.

Example code for the customer_id update

      update activity as a
      set a.customer_id = ui.customer_id
      from user_identity_mapping as ui
      where a.anonymous_customer_id = ui.anonymous_customer_id;
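
Once customer_id has been re-stitched, the activity_occurrence and activity_repeated_at fields mentioned above would also need to be recomputed, because activities that used to belong to different identifiers may now belong to the same customer. A minimal sketch of that recomputation follows. It assumes the activity table has activity_id, activity and ts columns, which are not shown in this issue, and it is not the package's implementation:

```sql
-- Sketch only. Assumed (hypothetical) columns on activity: activity_id, activity, ts.
-- Recomputes the derived fields per stitched customer and activity type.
select
  activity_id,
  -- 1 for the customer's first occurrence of this activity, 2 for the second, ...
  row_number() over (
    partition by customer_id, activity
    order by ts, activity_id
  ) as activity_occurrence,
  -- timestamp of the customer's next occurrence of the same activity, null if none
  lead(ts) over (
    partition by customer_id, activity
    order by ts, activity_id
  ) as activity_repeated_at
from activity
where customer_id is not null
```

The result could then be written back to the activity table keyed on activity_id, using whichever update or merge syntax the target warehouse supports. Ordering by activity_id as a tie-breaker is an assumption to keep the numbering deterministic when two activities share a timestamp.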

Alternatives to consider:

bcodell commented 8 months ago

Possible implementations: