dwyl / learn-google-analytics

:chart_with_upwards_trend: A quick guide to learning/using google analytics for your web app.

What sources can distort analytics data and how can I detect them? #24

Open Cleop opened 5 years ago

Cleop commented 5 years ago

I was recently contacted by one of our clients regarding some unexplained data in their Google Analytics. The project is currently undergoing testing with a small sample of users; the site requires password access and is not being marketed. So they felt they should have a good idea of roughly how many people are accessing the site, and therefore what a plausible number of page views would be.

Despite this, their data showed 80 page views, compared with the 20 or so test users they were expecting to arrive on the login page. They also had around 20 hits to an endpoint that doesn't actually exist on the site, and users located on a continent where neither they nor we currently have any staff. So they asked me what could be causing these results.

The first thing to understand is what GA classes as 'unique' page views. In principle it could be 20 people each being detected 4 times as they use different devices, IP addresses, sessions etc. So I checked and found that at present Google defines 'unique page views' to be:

And so to give context to that explanation, how is a session defined?

A session is a group of user interactions with your website that take place within a given time frame. How long does a session last? By default, a session lasts until there's 30 minutes of inactivity, but you can adjust this limit so a session lasts from a few seconds to several hours.

https://support.google.com/analytics/answer/2731565?hl=en
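To make the session definition quoted above concrete, here is a minimal sketch of how one user's hits could be grouped into sessions using only the 30-minute inactivity rule. The function name `count_sessions` and the timestamps are made up for illustration, and this is a simplification rather than GA's actual implementation, which may apply other rules for ending a session.

```python
from datetime import datetime, timedelta


def count_sessions(hit_times, timeout=timedelta(minutes=30)):
    """Count sessions for a single user.

    A new session starts on the first hit, and again whenever the gap
    since the previous hit is at least `timeout` (30 minutes by default).
    """
    sessions = 0
    previous = None
    for current in sorted(hit_times):
        if previous is None or current - previous >= timeout:
            sessions += 1
        previous = current
    return sessions


# Hypothetical hits from one test user over a single day.
hits = [
    datetime(2019, 1, 7, 9, 0),
    datetime(2019, 1, 7, 9, 10),   # 10 min gap: same session
    datetime(2019, 1, 7, 11, 0),   # 110 min gap: new session
    datetime(2019, 1, 7, 11, 20),  # 20 min gap: same session
    datetime(2019, 1, 7, 16, 0),   # 280 min gap: new session
]

print(count_sessions(hits))  # 3
```

Under this rule a single tester who visits in the morning, late morning and afternoon already accounts for three sessions, which is how a small group of users can produce a much larger session count.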

So in principle it could be that the same 20 people came back and forth, collectively 80 times. Looking at the retention metrics may help to back up this hypothesis. However, returning users would only land on the login page again if they had been logged out. So unless the site logs users out automatically, it might appear odd for users to choose to log out every time they leave the site, unless they're on a public/shared device or they want to make sure no one can access their account (user testing could help understand this dynamic).

But what about the sessions abroad? And the unknown endpoint?

There is another factor that could be at play here. Bots scrape the web and often try common endpoints, so the requests to the non-existent endpoint could be an indication of bots attempting to access pages. It may be possible to assess the likelihood of this by comparing the number of page views on the login page with the number on the landing page that users reach once they've successfully logged in: as the bots haven't been able to access the site, they would add views to the login page but not to the landing page. The bots may also be located anywhere, which could explain the unknown locations.
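As a rough sketch of the comparison described above, the snippet below takes a list of page paths (for example, from an exported page views report), counts login-page views against post-login landing-page views, and lists hits to paths the site doesn't serve. The paths `/login`, `/dashboard` and `/wp-login.php`, the `KNOWN_PATHS` set, the function name `summarise` and the sample counts are all assumptions for illustration, not data from the client's site.

```python
from collections import Counter

# Hypothetical set of paths that actually exist on the site.
KNOWN_PATHS = {"/login", "/dashboard"}


def summarise(paths, login="/login", landing="/dashboard"):
    """Compare login-page views with landing-page views and
    collect hits to paths that are not in KNOWN_PATHS."""
    counts = Counter(paths)
    login_views = counts[login]
    landing_views = counts[landing]
    return {
        "login_views": login_views,
        "landing_views": landing_views,
        # Share of login-page views that were followed by a successful login.
        "landing_to_login_ratio": (
            landing_views / login_views if login_views else None
        ),
        "unknown_paths": {
            path: n for path, n in counts.items() if path not in KNOWN_PATHS
        },
    }


# Made-up sample: 8 login views, 2 landing views, 3 hits to a missing page.
sample = ["/login"] * 8 + ["/dashboard"] * 2 + ["/wp-login.php"] * 3
report = summarise(sample)
print(report["landing_to_login_ratio"])  # 0.25
print(report["unknown_paths"])           # {'/wp-login.php': 3}
```

A low ratio together with hits to unknown paths is only suggestive of bot traffic: real users who mistype their password or abandon the login page would lower the ratio too.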