Open dgitis opened 1 year ago
Google has changed its documentation and added the new session-level attribution columns to the BigQuery export.
The `collected_traffic_source` record is new and holds all of the fields highlighted in that Google I/O announcement. I can't find an example where those fields are being populated.
My big question is whether these fields are set for all events or just for the first batch. I presume it will be all events; otherwise, how would they differ from the existing `source`, `medium`, and `gclid` event parameters? However, this being Google, I don't want to count on it working the way I expect.
Thanks for the heads up. Definitely something to monitor. We'll also have to decide whether to switch to this method or support both the legacy and new methods to account for data collected before this change 🤔
The `collected_traffic_source` fields are now populating. They seem to populate on all events except `session_start` and `first_visit`.
Here's a session with multiple timestamps that I found.
I'd been hearing that the reality of `collected_traffic_source` is different from what I assumed in my previous comment, and I've now verified how these fields actually get populated.
The `collected_traffic_source` fields get populated on all events that happen on the first page of a session, except for `first_visit` and `session_start`. It is not just the first batch of events, as I disproved above.
Events on subsequent pages do not propagate the `collected_traffic_source` fields.
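Something like the following shows which event names actually carry the values (the project and dataset names here are placeholders, and `manual_source` is one of the sub-fields documented on the `collected_traffic_source` record):

```sql
-- Count, per event name, how many events have a populated collected_traffic_source.manual_source
SELECT
  event_name,
  COUNTIF(collected_traffic_source.manual_source IS NOT NULL) AS events_with_source,
  COUNT(*) AS total_events
FROM `my-project.analytics_123456.events_20230601`  -- placeholder table
GROUP BY event_name
ORDER BY total_events DESC;
```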
Basically, these session fields only save us from unnesting the data.
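To sketch what that saves (project and dataset names are placeholders): the legacy approach digs the value out of `event_params` with an `UNNEST`, while the new record is a plain STRUCT column you can read directly.

```sql
-- Legacy: unnest event_params to recover the source
SELECT
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'source') AS source
FROM `my-project.analytics_123456.events_20230601`;  -- placeholder table

-- New: read the session-level struct directly, no UNNEST needed
SELECT
  collected_traffic_source.manual_source AS source
FROM `my-project.analytics_123456.events_20230601`;
```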
Where session data is concerned, I consider this closed.
Google still plans on adding User-ID export to a completely separate table. I'll be updating the title of this issue to specify the User-ID tables.
User ID & Pseudo User ID tables are documented here: https://support.google.com/analytics/answer/12769371
Just enabled the export on my company's GA4 instance so I hope to return to this soon.
I checked this a couple of weeks ago, and the daily user counts on these tables (`user_id` and `user_pseudo_id`) closely match the user counts on our equivalent tables.
These tables are substantially different from our user tables so I think the best path is to copy these into partitioned dim tables and have two separate sets of user tables with very different fields.
The other option is to enhance these new tables with the fields that we currently have in our old tables.
I'd rather be able to disable the old tables on high-volume projects because they aren't partitioned, and many of the fields, like the `first_*` and `last_*` fields, don't get used all that often.
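One way to sketch the "copy into partitioned dim tables" option (all names here are hypothetical; this assumes the export writes daily `pseudonymous_users_YYYYMMDD` shards and derives the partition column from the shard date via `_TABLE_SUFFIX`):

```sql
-- Hypothetical sketch: fold the daily sharded export into one date-partitioned dim table
CREATE TABLE IF NOT EXISTS `my-project.dims.dim_ga4_pseudo_users`
PARTITION BY export_date AS
SELECT
  PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS export_date,
  *
FROM `my-project.analytics_123456.pseudonymous_users_*`;  -- placeholder wildcard table
```

An equivalent statement would cover the `users_*` shards, keeping both new dim tables separate from our existing user tables.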
Google is adding new fields and tables to the BigQuery export. Here's the announcement from Google I/O.
The session-level attribution fields only remove the requirement to unnest data that we're already unnesting. However, they will be adding a new table for `user_id` data that will contain data for all users whose data changed in the export period (presumably this will be a partition). For now, there's nothing we can do without seeing the export in the wild, but I suggest we put a freeze on major changes to user data.
@willbryant @adamribaudo-velir