Velir / dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.
MIT License

Page engagement time refinement / edge cases #330

Open dgitis opened 1 week ago

dgitis commented 1 week ago

I've noticed that our current calculation for page engagement time often gets quite close to the GA4 interface, but it can also be quite different, possibly because of how query strings are handled (I think GA4 rolls up engagement time to the page path, while we use the full URL but provide the facility to remove query parameters).
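To make the roll-up idea concrete, here is a minimal sketch, not the package's implementation. It assumes GA4 really does aggregate by page path, which is only a guess in the comment above. The helper names and the event dicts are hypothetical.

```python
from urllib.parse import urlsplit


def page_path(page_location: str) -> str:
    """Return only the path of a URL, dropping host, query string, and fragment."""
    path = urlsplit(page_location).path
    return path or "/"  # a bare domain has an empty path; treat it as the root


def engagement_by_path(events):
    """Sum engagement_time_msec per page path (assumed GA4-style roll-up)."""
    totals = {}
    for e in events:
        key = page_path(e["page_location"])
        # engagement_time_msec is often missing/None on GA4 events
        totals[key] = totals.get(key, 0) + (e.get("engagement_time_msec") or 0)
    return totals


events = [
    {"page_location": "https://example.com/about?utm_source=x", "engagement_time_msec": 1000},
    {"page_location": "https://example.com/about", "engagement_time_msec": 500},
    {"page_location": "https://example.com", "engagement_time_msec": 200},
]
print(engagement_by_path(events))
```

Keyed on the full URL, the two /about rows would be reported as separate pages, which is one way our numbers could drift from the interface.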

I've been looking more in-depth at how Google records engagement_time_msec and I'd like to discuss some different situations so we can be more deliberate in how we handle them.

In this screenshot, you can see GA4's documented behavior, where the user_engagement event fires when someone clicks through to the next page, in the event pairs in rows 6 and 7, 8 and 9, and 10 and 11.

image

Event 5, however, is a page reload. The engagement time for that event currently gets assigned to the previous page, which is what we want when the user_engagement event doesn't fire, but in this case we want it assigned to the current page.

Additionally, should a page reload count as two page views when calculating the average engagement time?

In this case, we could exclude page_view events that have engagement_time_msec assigned from the denominator if it appears that the only situation in which they get engagement time is on page reloads.
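A rough sketch of how the two denominators would differ for a single page, assuming (as suggested above, but not confirmed) that page_view events only carry engagement_time_msec on reloads. The function name and the event dicts are hypothetical.

```python
def avg_engagement_msec(events, exclude_page_views_with_time=False):
    """Average engagement time per page view for the events of one page.

    events: dicts with 'event_name' and an optional 'engagement_time_msec'.
    When exclude_page_views_with_time is True, page_view events that carry
    engagement time (assumed to be reloads) are left out of the denominator.
    """
    total = sum(e.get("engagement_time_msec") or 0 for e in events)
    page_views = [e for e in events if e["event_name"] == "page_view"]
    if exclude_page_views_with_time:
        page_views = [e for e in page_views if not e.get("engagement_time_msec")]
    return total / len(page_views) if page_views else None


events = [
    {"event_name": "page_view", "engagement_time_msec": None},   # first load
    {"event_name": "page_view", "engagement_time_msec": 4000},   # reload
    {"event_name": "user_engagement", "engagement_time_msec": 6000},
]
print(avg_engagement_msec(events))                                     # reload counted as a view
print(avg_engagement_msec(events, exclude_page_views_with_time=True))  # reload not counted
```

The numerator is the same in both cases; only the count of page views changes, so the reload halves the average when it is counted as a second view.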

I think row 18 shows the visitor changing from a tab with the about page open, to a tab with the speaking page open (row 19).

The visitor definitely interacts with the speaking page as there is a scroll event, row 20, immediately after.

I think rows 21, 22, 23, and 24 show the visitor closing tabs, based on the elapsed time between each event and the lack of any engagement time on those events. All of those pages were opened earlier in the session.

Looking at this, I don't think we want to count these events in the denominator of the average engagement time calculation. Maybe we count distinct page_locations within a session as the denominator.
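One possible reading of that denominator, sketched under assumptions: each page_location counts once per session, no matter how many events (page views, tab switches, tab closes) it produces there. The field name session_key and the sample data are illustrative only.

```python
from collections import defaultdict


def avg_engagement_by_page(events):
    """Average engagement per page, counting each page once per session."""
    totals = defaultdict(int)    # page_location -> summed engagement_time_msec
    sessions = defaultdict(set)  # page_location -> sessions in which it appears
    for e in events:
        page = e["page_location"]
        totals[page] += e.get("engagement_time_msec") or 0
        sessions[page].add(e["session_key"])
    return {page: totals[page] / len(sessions[page]) for page in totals}


events = [
    {"session_key": "s1", "page_location": "/about", "engagement_time_msec": 3000},
    {"session_key": "s1", "page_location": "/about", "engagement_time_msec": None},  # tab closed
    {"session_key": "s2", "page_location": "/about", "engagement_time_msec": 1000},
    {"session_key": "s1", "page_location": "/speaking", "engagement_time_msec": 5000},
]
print(avg_engagement_by_page(events))
```

With this denominator, the tab-close event in session s1 adds nothing to the numerator and does not inflate the count either.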

I'm pretty certain I've seen some other edge-case situations on other sites. I'll try to find some more.

dgitis commented 1 week ago

This shows a visit that begins with someone opening a lot of tabs.

image

The hashed values for page_location and page_referrer indicate a page belonging to the site in question.

I'm not sure if there's anything to be said about this. I suspect it is internal traffic because the behavior is very strange. Look at all of the different page_referrer values, for example.

It's also interesting that the event_bundle_sequence_id is always the same. I think this happens with server-side GTM as this is not the first time I've seen this.

Here's a set of events that are both referred from the same page on another site owned by the company. That site is excluded from referral traffic (I had to cut off the referrer because I didn't filter out that domain name).

image

The scroll event on row 38 fires when the Google Tag loads (by default it fires at 90 percent, so the whole page here is presumably in the viewport) but the page_view is triggered separately from the Google Tag on row 39.

Does the 2385 milliseconds of engagement time on this page belong to the current page_location or the page_referrer? In this case it is mostly page load time, which is painful, but I'm thinking we should point the engagement time to the current page_location.

The user_engagement event does not fire before clicking over. In the first example I shared in the initial report, the user_engagement event fires cleanly before each page view. Here, the referring domain is untracked and does not have cross-domain tracking set up, but it is excluded as a referral. That is very much like what you would see with a payment gateway where you can't add analytics code, so I'd say getting this right is important.

The user_engagement events on rows 37 and 40 come from closing those pages, not from navigating from the page on row 37 to the page on rows 38 through 40.