Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License
324 stars 135 forks source link

Assign traffic source, medium, campaign for more traffic parameters beyond gclid #343

Open sashage opened 2 months ago

sashage commented 2 months ago

Currently, there is the CTE detect_gclid in stg_ga4__events.sql that reconstructs source, medium and campaign if the URL parameter gclid is present in page_location

Now, there a several more URL parameters that can be linked to paid ad clicks:

gbraid and wbraid: these are alternatives to gclid that Google uses (https://www.customerlabs.com/blog/what-are-gclid-gbraid-and-wbraid-parameters/)

fbclid: click ID for Facebook/Meta/Instagram paid ads. In most cases source is set to l.facebook.com, m.facebook.com, or lm.facebook.com, which effectively splits the traffic. Same for instagram, which normally has the source instagram.com or l.instagram.com

ttclid: click ID for TikTok paid ads. Here the source is usually null unless set explicitly

This list is of course not complete, but should capture a much larger share of unassigned paid traffic.

Proposed change:

select
        * except (event_source, event_medium, event_campaign),
        case
            when (page_location like '%gclid%' and event_source is null) then "google"
            when (page_location like '%gbraid%' and event_source is null) then "google"
            when (page_location like '%wbraid%' and event_source is null) then "google"
            when (page_location like '%fbclid%' and event_source is null) then "facebook"
            when (page_location like '%fbclid%' and event_source LIKE "%facebook.com") then "facebook"
            when (page_location like '%fbclid%' and event_source LIKE "%instagram.com") then "instagram"
            when (page_location like '%ttclid%' and event_source is null) then "tiktok"
            when (page_location like '%ttclid%' and event_source = "tiktok.com") then "tiktok"
            else event_source
        end as event_source,
        case
            when (page_location like '%gclid%' and event_medium is null) then "cpc"
            when (page_location like '%gclid%' and event_medium = 'organic') then "cpc"
            when (page_location like '%gbraid%' and event_medium is null) then "cpc"
            when (page_location like '%gbraid%' and event_medium = 'organic') then "cpc"
            when (page_location like '%wbraid%' and event_medium is null) then "cpc"
            when (page_location like '%wbraid%' and event_medium = 'organic') then "cpc"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_medium is null) then "cpc"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_medium = "referral" ) then "cpc"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_medium is null) then "cpc"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_medium = "referral") then "cpc"
            else event_medium
        end as event_medium,
        case
            when (page_location like '%gclid%' and event_campaign is null) then "(cpc)"
            when (page_location like '%gclid%' and event_campaign = 'organic') then "(cpc)"
            when (page_location like '%gclid%' and event_campaign = '(organic)') then "(cpc)"
            when (page_location like '%gbraid%' and event_campaign is null) then "(cpc)"
            when (page_location like '%gbraid%' and event_campaign = 'organic') then "(cpc)"
            when (page_location like '%gbraid%' and event_campaign = '(organic)') then "(cpc)"
            when (page_location like '%wbraid%' and event_campaign is null) then "(cpc)"
            when (page_location like '%wbraid%' and event_campaign = 'organic') then "(cpc)"
            when (page_location like '%wbraid%' and event_campaign = '(organic)') then "(cpc)"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_campaign is null ) then "(cpc)"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_campaign = "(referral)" ) then "(cpc)"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_campaign is null) then "(cpc)"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_campaign = "(referral)") then "(cpc)"

            else event_campaign
        end as event_campaign
    from include_event_key
adamribaudo-velir commented 2 months ago

I'm on board with parsing gbraid, wbraid, fbclick, and ttclick.

I'm just not sure why there are so many checks in the case statements for variations of source/medium/campaign. If a gclid is present in the URL, why bother checking if the event_campaign is null / organic / (organic) before assigning the value to (cpc)? Shouldn't the page_location like '%gclid%' condition be sufficient?

sashage commented 2 months ago

It could be that advertisers pass along custom UTMs that should be respected. null / organic / (organic) are auto-populated for campaign in absense of other utm_campaign values

It also checks for source value within medium and campaign, to make sure that all or none are updated. Otherwise we might get inconsistent UTMs

There are probably better ways to do all the checks. Might even make sense to move the entire logic to separate model for easier maintenance. Many ad networks have their own click IDs that we could add later.

adamribaudo-velir commented 2 months ago

Thanks for the reminder. This mirrors the logic we have implemented today. https://github.com/Velir/dbt-ga4/blob/31f269192cfd778365ea3acd6044a4a1f81cff33/models/staging/stg_ga4__events.sql#L38