Open sashage opened 2 months ago
I'm on board with parsing gbraid, wbraid, fbclick, and ttclick.
I'm just not sure why there are so many checks in the case statements for variations of source/medium/campaign. If a gclid
is present in the URL, why bother checking if the event_campaign is null / organic / (organic)
before assigning the value to (cpc)
? Shouldn't the page_location like '%gclid%'
condition be sufficient?
It could be that advertisers pass along custom UTMs that should be respected.
null / organic / (organic)
are auto-populated for campaign
in absense of other utm_campaign
values
It also checks for source value within medium and campaign, to make sure that all or none are updated. Otherwise we might get inconsistent UTMs
There are probably better ways to do all the checks. Might even make sense to move the entire logic to separate model for easier maintenance. Many ad networks have their own click IDs that we could add later.
Thanks for the reminder. This mirrors the logic we have implemented today. https://github.com/Velir/dbt-ga4/blob/31f269192cfd778365ea3acd6044a4a1f81cff33/models/staging/stg_ga4__events.sql#L38
Currently, there is the CTE
detect_gclid
instg_ga4__events.sql
that reconstructs source, medium and campaign if the URL parametergclid
is present inpage_location
Now, there a several more URL parameters that can be linked to paid ad clicks:
gbraid
andwbraid
: these are alternatives to gclid that Google uses (https://www.customerlabs.com/blog/what-are-gclid-gbraid-and-wbraid-parameters/)fbclid
: click ID for Facebook/Meta/Instagram paid ads. In most cases source is set to l.facebook.com, m.facebook.com, or lm.facebook.com, which effectively splits the traffic. Same for instagram, which normally has the source instagram.com or l.instagram.comttclid
: click ID for TikTok paid ads. Here the source is usually null unless set explicitlyThis list is of course not complete, but should capture a much larger share of unassigned paid traffic.
Proposed change: