aliasoblomov / Universal-Analytics-to-BigQuery

This repository features a Python script designed for extracting data from Universal Analytics, preparing it for compatibility, and subsequently loading it into Google BigQuery. This is particularly beneficial for businesses aiming to transfer their historical UA (GA3) data to BigQuery, especially those without access to Google Analytics 360.
26 stars 26 forks

Metrics are not accurate when multiple dimensions are present #12

Open Zolifu opened 2 months ago

Zolifu commented 2 months ago

Hi,

The API, as set up in this form, pulls only a fraction of the results when there is a more complex dimension schema. My only guess is that rows are not pulled when one of the dimensions has no value.

For example, over a one-month period we pull 1.5M sessions with only the ga:date field, while with the 7 other dimensions listed below we get 2,820.

Can anyone urgently help with how to make sure that empty rows are also pulled in?

e.g.,

{'expression': 'ga:sessions'},
{'expression': 'ga:pageviews'},
#{'expression': 'ga:users'},
#{'expression': 'ga:newUsers'},
#{'expression': 'ga:bounces'},
#{'expression': 'ga:sessionDuration'},
#{'expression': 'ga:transactions'},
#{'expression': 'ga:uniquePurchases'},
#{'expression': 'ga:itemQuantity'},
#{'expression': 'ga:goalCompletionsAll'},

peroksid5 commented 2 months ago

The problem is probably Google Analytics Sampling: https://support.google.com/analytics/answer/2637192

Once you hit the threshold, Google returns "sampled" (aka gibberish) data or empty data.
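
To check whether a given response was actually sampled, a rough sketch along these lines may help. It assumes the Analytics Reporting API v4 response shape, where a sampled report carries `samplesReadCounts` and `samplingSpaceSizes` (as lists of numeric strings) under `data`. The helper names `sampling_info` and `sampled_fraction` are made up for illustration and are not part of this repository's script.

```python
def sampling_info(report):
    """Return (samples_read, sampling_space) if the report looks sampled, else None.

    `report` is assumed to be one entry of response["reports"] from the
    Analytics Reporting API v4. The sampling fields are only present when
    the data was sampled.
    """
    data = report.get("data", {})
    read = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if not read or not space:
        return None
    # The API returns these counts as strings, one per date range.
    return int(read[0]), int(space[0])


def sampled_fraction(report):
    """Return the share of sessions that was read, or None if unsampled."""
    info = sampling_info(report)
    if info is None:
        return None
    read, space = info
    return read / space if space else None
```

If `sampling_info` returns `None` for every report and the numbers are still far off, sampling is probably not the (only) cause.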

The same happens in the GA interface: if you select, e.g., too many dimensions or too long a time period, the data is not accurate.

Google sadly does not offer an API (or an interface) to download the actual raw data in bulk. The same limitations as in the Analytics interface apply to the API as well.
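
One workaround that is sometimes suggested is to query one day at a time, so each request covers fewer sessions and is less likely to hit the sampling threshold. Below is a minimal sketch, assuming the Reporting API v4 `ReportRequest` fields `samplingLevel`, `includeEmptyRows` and `pageSize`. The function name `build_daily_requests` and the view ID are hypothetical, and nothing here is taken from the repository's script. Note that `includeEmptyRows` only keeps rows whose metrics are all zero. As far as I can tell it does not bring back rows that are dropped because a dimension has no value, so it may not fully explain the gap described above.

```python
from datetime import date, timedelta


def build_daily_requests(view_id, start, end, metrics, dimensions):
    """Build one ReportRequest body per day between start and end (inclusive)."""
    requests = []
    day = start
    while day <= end:
        iso = day.isoformat()
        requests.append({
            "viewId": view_id,
            "dateRanges": [{"startDate": iso, "endDate": iso}],
            "metrics": [{"expression": m} for m in metrics],
            "dimensions": [{"name": d} for d in dimensions],
            "samplingLevel": "LARGE",   # ask for the largest sample size
            "includeEmptyRows": True,   # keep rows whose metrics are all zero
            "pageSize": 100000,         # maximum rows per page in API v4
        })
        day += timedelta(days=1)
    return requests


# Example: three daily requests for a placeholder view ID.
reqs = build_daily_requests(
    "12345",
    date(2023, 1, 30),
    date(2023, 2, 1),
    ["ga:sessions", "ga:pageviews"],
    ["ga:date"],
)
```

Each body would then be sent as `{"reportRequests": [body]}`, paging through results with `pageToken` where needed. This does not remove the sampling limits, it only makes them less likely to apply to any single request.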