Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License

Add session_traffic_source_last_click fields #348

Open adamribaudo-velir opened 2 weeks ago

adamribaudo-velir commented 2 weeks ago

Description & motivation

Google recently released a new set of fields, session_traffic_source_last_click, documented as:

The session_traffic_source_last_click RECORD contains the last-click attributed session traffic source data across Google ads and manual contexts, where available.

These fields have been added to the base package model.

Checklist

adamribaudo-velir commented 2 weeks ago

@dgitis I ran this and noticed differences between our stg_ga4__sessions_traffic_sources_last_non_direct_daily model and Google's last_click fields. When I looked at the raw events, our model appeared to be the more accurate of the two, which was weird.

I think it will only confuse people to have 2 definitions of 'session last click attribution' in the package. I suppose we should just include these fields and remove stg_ga4__sessions_traffic_sources_last_non_direct_daily, but I thought I'd check with you first.

dgitis commented 2 weeks ago

The advantage of our method over Google's is that we keep the attribution fields coupled, whereas GA4 decouples them.

So, if you see source / medium / campaign from sessions in this order from earliest to latest:

facebook / paid_social / my_fb_campaign
google / organic
direct

The last non-direct source / medium / campaign in GA4 would be google / organic / my_fb_campaign, while the package would return google / organic / null.
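To make the difference concrete, here is a rough Python sketch of the two behaviours as described above. It is neither the package's SQL nor Google's actual implementation. The function names, the session dict shape, and the set of "direct" placeholder values are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical session shape: one dict per session, ordered earliest to latest.
Session = dict[str, Optional[str]]

# Assumed placeholders for "no attribution value"; illustrative only.
DIRECT_VALUES = {None, "(direct)", "(none)", "(not set)"}
FIELDS = ("source", "medium", "campaign")


def coupled_last_non_direct(sessions: list[Session]) -> tuple:
    """Take all three fields from the latest session whose source is not direct."""
    for session in reversed(sessions):
        if session["source"] not in DIRECT_VALUES:
            return tuple(session[field] for field in FIELDS)
    return (None, None, None)


def decoupled_last_non_direct(sessions: list[Session]) -> tuple:
    """Resolve each field independently to its own latest non-direct value."""
    result = []
    for field in FIELDS:
        value = None
        for session in reversed(sessions):
            if session[field] not in DIRECT_VALUES:
                value = session[field]
                break
        result.append(value)
    return tuple(result)


sessions = [
    {"source": "facebook", "medium": "paid_social", "campaign": "my_fb_campaign"},
    {"source": "google", "medium": "organic", "campaign": None},
    {"source": "(direct)", "medium": "(none)", "campaign": None},
]

print(coupled_last_non_direct(sessions))    # ('google', 'organic', None)
print(decoupled_last_non_direct(sessions))  # ('google', 'organic', 'my_fb_campaign')
```

The coupled version never mixes fields from different sessions, which is why the campaign stays null. The decoupled version lets the Facebook campaign leak onto the Google organic session.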

As with my comment on the other PR, should we maybe make this configurable?

The advantages of using Google's definitions are as follows:

While the advantages of our definitions are as follows:

I'm thinking that we create a use_google_attribution_fields variable.

At this stage, the new variable only enables these fields in the base model, but in the future we could modify our various attribution models to detect this variable and return vastly different SQL depending on the configuration.