Closed dgitis closed 10 months ago
As per @willbryant's request in the linked issue, this attempts to put these click IDs into records under xclid.name
and xclid.value
similar to the event_params
and user_props
.
The issue that I'm having is removing the null values from this code.
{% if var('xclid', none) != "none" %}
array_agg(
{%- for clid in var('xclid') -%}
if( substr_contains( 'page_location' , {{clid + "="}} )
struct(
'{{clid}}' as name,
{{ extract_query_parameter_value( 'page_location' , clid ) }} as value
)){% if not loop.last %},{% endif %}
{% endif %}
{%- endfor -%}
) as xclid,
{% endif %}
The problem here is that I need to mix Jinja and SQL conditions to add or exclude the trailing comma in {% if not loop.last %},{% endif %}
.
I believe this code will break when you have trailing xclid array variable values that aren't used because the code will pass the first, Jinja condition {%- for clid in var('xclid') -%}
but fail the second, SQL condition if( substr_contains( 'page_location' , {{clid + "="}} )
.
The final Jinja condition, {% if not loop.last %},{% endif %}
will add multiple trailing slashes when it fails the SQL condition.
I'm not sure how to handle this. Can I maybe nest this in a select statement with a where name is not null
?
My suggestion would be to generate a table of event keys and xclid param/values and then pivoting it (this would require new models). So that
event_key | param_name | param_value |
---|---|---|
1 | gclid | foo |
1 | tclid | bar |
becomes
event_key | gclid | tclid |
---|---|---|
1 | foo | bar |
You can later join the pivoted param data w/ the event data
Also, can you mark this as a draft PR if it's not yet ready for code review?
Separate tables are much less efficient to query though, what's the advantage?
If I understand the goal of the PR correctly, we're trying to pivot X number of URL parameters from rows to columns and the value of X can change from record to record. That sounds like a pivot operation, yes? Whether it's a CTE or separate model I don't have a strong opinion, I'm just recommending looking into using a pivot rather than using Jinja to parse the number of parameters included in each record.
You can always incrementally join this data back with the original event records so that the practical effect would be 1 table with the necessary columns available.
Also, don't forget that we already have stg_ga4__event_to_query_string_params
which generates the first table in my comment.
I've been thinking on these requirements and I think maybe this particular solution isn't quite right.
I have an issue with creating a record because, unless you are doing data viz on Looker Studio, users will need to unnest that record in their marts. The place where I would expect this unnesting to be done is in stg_ga4__events
where we are currently planning on creating this issue.
Instead, I think we should adopt two variables: mutually_exclusive_click_ids
and non_mutually_exclusive_click_ids
.
For most users, the first variable is all that they need. It's for gclid
, fbclid
, and all of the click IDs that come from their own specific ad inventory and can't be mixed.
However, for the attribution platforms, that can be plugged in to ad platforms adding a second click ID to the ad platform, users would need to use the non_mutually_exclusive_click_ids
variable.
The click IDs in the mutually exclusive variable would populate two dedicated text fields in stg_ga4__events
: xclid_name
and xclid_value
.
The click IDs in non-mutually exclusive variable would create a new column for each with the variable as the name that would populate with that ID.
I know that Adam prefers to avoid variables when possible, but using these two variables helps minimize the database size while keeping it flat to maximize compatibility.
If we push this in the separate table direction, then we'd want to do it on the session_key
rather than the event_key
and we'd only populate where the click IDs are not null.
This might end up being more efficient as we wouldn't be having null values in rows from (often the majority of) traffic that doesn't have click IDs.
OK. Given the feedback here, and that I have not found any cases where we see non-mutually-exclusive tracking IDs on a single page request, maybe we need to wind it back a bit towards what you started with @dgitis and just do a single name & a single value column on page requests, and only handle the multiple case at the session level?
RESOLVED BY #252
Description & motivation
Resolves #234
The purpose of this PR is to give users the option to configure whatever click IDs that they want to track like
gclid
,fbclid
as well as numerous other attribution and ad platform click IDs.Todo:
Checklist
dbt test
andpython -m pytest .
to validate existing tests