Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License

Support New Batch Fields/Review Impact on Existing Code #335

Open dgitis opened 3 months ago

dgitis commented 3 months ago

Google added batch_page_id, batch_ordering_id, and batch_event_index to the BigQuery export.

Unnested versions of these fields appear in the daily export, but batch_page_id and batch_ordering_id both also seem to appear as event parameters in both the daily and intraday exports.

Would it be a good idea to append the batch ordering ID to the event_timestamp so that events sort by their actual order?
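A rough sketch of the ordering idea, written in Python rather than SQL so it can be run without a warehouse. It assumes the batch fields are available as plain columns and that a composite key of (event_timestamp, batch_page_id, batch_ordering_id, batch_event_index) is a reasonable tie-breaker. The sample rows and the sort_events helper are made up for illustration and are not part of the package.

```python
from operator import itemgetter

# Hypothetical rows: three events share one event_timestamp, so the
# timestamp alone cannot recover the order in which they fired.
T = 1700000000000000
events = [
    {"event_name": "scroll", "event_timestamp": T,
     "batch_page_id": 1, "batch_ordering_id": 2, "batch_event_index": 1},
    {"event_name": "page_view", "event_timestamp": T,
     "batch_page_id": 1, "batch_ordering_id": 1, "batch_event_index": 2},
    {"event_name": "click", "event_timestamp": T,
     "batch_page_id": 1, "batch_ordering_id": 2, "batch_event_index": 2},
    {"event_name": "session_start", "event_timestamp": T - 1000000,
     "batch_page_id": 1, "batch_ordering_id": 1, "batch_event_index": 1},
]

def sort_events(rows):
    """Sort by timestamp first, then use the batch fields as tie-breakers.

    Assumed key order: page, then network request within the page,
    then position of the event inside that request.
    """
    key = itemgetter("event_timestamp", "batch_page_id",
                     "batch_ordering_id", "batch_event_index")
    return sorted(rows, key=key)

ordered_names = [e["event_name"] for e in sort_events(events)]
print(ordered_names)
```

In SQL the equivalent would presumably be an ORDER BY over the same four columns rather than literally concatenating the ID onto event_timestamp, which would change the meaning of the timestamp column.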

Will this prevent our light event deduplication in base_ga4__events from doing anything?

qualify row_number() over(
    partition by
        event_date_dt,
        stream_id,
        user_pseudo_id,
        session_id,
        event_name,
        event_timestamp,
        to_json_string(ARRAY(SELECT params FROM UNNEST(event_params) AS params ORDER BY key))
) = 1

I suppose it's likely to work on duplicate events triggered from GTM, but duplicate on-page Gtag events are likely to fire sequentially, in the order they appear in the code, and duplicates where one event is triggered by on-page Gtag and the other by GTM are also likely to fire sequentially.

We should test this to be certain, but maybe we can remove this code.
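Before testing against real exports, the concern can be illustrated with a small Python simulation of the partition key. This is only a sketch under assumptions: it supposes that sequentially fired duplicates carry different batch_ordering_id values inside event_params, it uses a simplified event_params shape, and the dedup_key and dedupe helpers (including the ignore_params argument) are hypothetical, not something the package provides.

```python
import json

def dedup_key(event, ignore_params=()):
    """Mimic the partition-by columns, serialising params sorted by key."""
    params = sorted(
        (p for p in event["event_params"] if p["key"] not in ignore_params),
        key=lambda p: p["key"],
    )
    return (
        event["event_date_dt"], event["stream_id"], event["user_pseudo_id"],
        event["session_id"], event["event_name"], event["event_timestamp"],
        json.dumps(params, sort_keys=True),
    )

def dedupe(events, ignore_params=()):
    """Keep the first row per key (row_number() = 1 keeps an arbitrary one)."""
    seen, kept = set(), []
    for event in events:
        key = dedup_key(event, ignore_params)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept

def make_event(ordering_id):
    # Made-up duplicate purchase events that differ only in batch_ordering_id.
    return {
        "event_date_dt": "2024-01-01", "stream_id": "1", "user_pseudo_id": "u1",
        "session_id": "s1", "event_name": "purchase",
        "event_timestamp": 1700000000000000,
        "event_params": [
            {"key": "transaction_id", "value": {"string_value": "T-1"}},
            {"key": "batch_page_id", "value": {"int_value": 1}},
            {"key": "batch_ordering_id", "value": {"int_value": ordering_id}},
        ],
    }

duplicates = [make_event(1), make_event(2)]
with_batch_params = dedupe(duplicates)
without_batch_params = dedupe(
    duplicates, ignore_params=("batch_page_id", "batch_ordering_id")
)
print(len(with_batch_params), len(without_batch_params))
```

If real data behaves like these sample rows, the current partition would keep both duplicates, so the choice is between excluding the batch parameters from the serialised params and dropping the deduplication entirely.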