Closed adamribaudo-velir closed 1 year ago
I'm thinking the solution here is to build a minimal session table partitioned by day that contains session data possibly with no window functions and certainly with no lookback period. I'll call it int_ga4__sessions_daily
.
If we aim for no window functions, we also add a partitioned page_view
event table, fct_ga4__event_page_view
.
On top of these two tables will build a daily sessions table dim_ga4__sessions
or dim_ga4__sesssions_daily
that uses a configurable lookback window (30 days is a good default, but this is really a business decision, so it should be in a variable. Since we have the lookback window configured here, we also reconcile cross-partition sessions in this query when hitting int_ga4__sessions_daily
and get first_last page_view data from fct_ga4__event_page_view
with only a one-day lookback while also resolving last_non_direct.
The goal of this architecture is to have all window functions hitting smaller tables.
Another issue that I've ran into when I set this up for a client on an old fork of the project is sessions with only first_visit
and session_start
events. Because we filter these events out from our session attribution, we end up failing to include those session keys in our session attribution queries, so when we join attribution data later, there's no data to join against.
We need downstream models to change null to direct or we need to decide to that we want to filter out these sessions. I think changing null to direct to be safer because it's hard to anticipate how people will use GA4. In my client's case, removing those sessions was the right call, but I'm not ready to make that generalization.
Description & motivation
Closes #110
The following model has been added which provides last non-direct session attribution:
stg_ga4__sessions_traffic_sources_last_non_direct_daily
This model is similar to
stg_ga4__sessions_traffic_sources_daily
but provides the following new fields:Each field contains the traffic source values from the last session with a non-direct
session_source
value. The effect can be seen in the graphic below where the '(none)' channel grouping is replaced with 'Organic Search'A unit test has been created which mimics this behavior and validates the results.
Checklist
dbt test
andpython -m pytest .
to validate existing tests