Closed: lg1000 closed this issue 2 years ago.
Are you pulling all the data? You have a 10000-row limit; set it to -1 to pull all rows. That would at least rule out that possibility ;)
Otherwise, I suggest comparing against a custom report with exactly the same metrics and dimensions as your API call, so you can be certain which aggregations are taking place.
Another common issue is not pulling from exactly the same profile.
Thanks for the fast response! I am not sure what you mean by -1 for all rows. As both of my calls return fewer than 10000 rows, that should not be an issue.
I did compare against exactly the same metrics and dimensions in the GA interface and got different results.
One possible cause: I switched the API authentication from a former .httr token file to a JSON file, because automatic token refresh stopped working after some months (the API script is triggered by a Windows job). For this purpose I created a service account, following this tutorial: https://www.gormanalysis.com/blog/google-analytics-in-r-part-1/
First, I created the JSON key file via "Create key" during the service account setup:
```r
googleAuthR::gar_auth_service(
  json_file = "D:/WD_R/SQL_GoogleA/garcreds-xxxx.json",
  scope = "https://www.googleapis.com/auth/analytics.readonly"
)
```
Next, I created the client authentication file under the Google Analytics API credentials (both are OAuth 2.0):
```r
googleAuthR::gar_set_client(
  json = "D:/WD_R/SQL_GoogleA/client_secret_xxxx.apps.googleusercontent.com.json",
  scopes = c("https://www.googleapis.com/auth/analytics.readonly")
)
```
The auth method won't affect the data.
When comparing with the web UI, are you using a custom report? Can I see some screenshots of what you are using?
Oh, also: anti_sample will change the values of user-level metrics and of any metric that depends on the date range, such as new users, which is defined as new users within the period you are fetching.
You were right about comparing with a custom report; it makes things a lot clearer. Comparing against the report with only the date dimension showed that rounding differences might be the main cause of the confusion.
However, for users ("Nutzer"), there is still a difference between the rows GA shows (which add up to 12,654, matching my query's aggregation) and the total in the header (11,465). On the one hand, this confuses me because when I add up the values manually, I get 12,654 in both R and GA.
On the other hand, when I add the PagePath dimension to my custom report, only the header total matches the single-dimension report. When I pull the data via R, I retrieve the same number of rows as in the GA custom report, but when I aggregate them there are huge differences for users, avgSessionDuration, bounceRate and transactionsPerSession. Does all this have to do with how I defined my query, or is there something about GA's data structure and aggregation process that I do not understand?
Yep makes sense.
User metrics are affected by anti_sample = TRUE. In GA, users is the number of unique cookies seen in the time period you selected. Because anti-sampling splits the request into multiple date ranges, adding those ranges up gives a different total. The only alternatives are to use sessions or buy GA360 ;)
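A toy illustration of why users do not add up across date ranges, using an invented visit log (the `visits` data frame, its columns and the `n_unique` helper are hypothetical, not googleAnalyticsR output). Counting unique cookies per chunk and summing the chunks counts a returning visitor once per chunk; the two-chunk split is only a stand-in for however anti_sample actually divides the range. Summing daily rows shows the same effect, which is likely also why the daily rows above added up to more than the header total. Sessions, by contrast, add up exactly.

```r
# Hypothetical visit log (invented data): one row per session,
# with the visitor's cookie ID and the session date.
visits <- data.frame(
  cookie = c("a", "b", "c", "a", "d", "b", "a", "e"),
  date   = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
                     "2020-01-03", "2020-01-03", "2020-01-04", "2020-01-04"))
)

# Helper: number of distinct cookies in a vector.
n_unique <- function(x) length(unique(x))

# Users over the whole period: each cookie counted once.
users_total <- n_unique(visits$cookie)

# Users per day, as in the rows of a report with the date dimension.
users_per_day <- tapply(visits$cookie, visits$date, n_unique)

# Users per chunk, assuming the range is split into two halves
# (a stand-in for the separate date ranges anti_sample fetches).
visits$chunk <- ifelse(visits$date <= as.Date("2020-01-02"), 1, 2)
users_per_chunk <- tapply(visits$cookie, visits$chunk, n_unique)

# Sessions per chunk: every row is one session, so these add up exactly.
sessions_per_chunk <- table(visits$chunk)

users_total               # 5
sum(users_per_day)        # 8
sum(users_per_chunk)      # 7
sum(sessions_per_chunk)   # 8, the same as nrow(visits)
```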
OK, that makes sense. What exactly do you mean by using sessions? I'm not sure my customer is willing to pay for GA360 :/
I mean report on sessions instead of users. Users is not a very reliable metric anyhow ;)
That wouldn't be so bad, but what about the transactionsPerSession rate? It is crucial for them. When I add PagePath as a second dimension, I still get the wrong rate. I need PagePath because the URL is the only way I can filter for B2B customers: the advanced segments that do the same are not accessible via my service account.
Transactions per session should be unaffected, since the transaction and session totals are OK. Divide one total by the other and it should be exactly the same. It's not a good idea to aggregate over averages: sum the totals first, then take the ratio.
391/15020 = 0.026031957
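A small sketch of "totals first, then the ratio", using invented daily rows (the numbers are made up; only the column names follow GA metric names). A plain mean of per-row ratios gives a 50-session day the same weight as a 1000-session day, while dividing the summed totals reproduces the whole-period figure. When only a per-row average is available, as with bounceRate here or avgSessionDuration, weighting each date row by its sessions gives the same result as the totals, assuming the rows are split by date.

```r
# Invented daily rows; column names mirror GA metrics, values are made up.
daily <- data.frame(
  date         = as.Date("2020-01-01") + 0:2,
  sessions     = c(1000, 200, 50),
  transactions = c(30, 2, 5),
  bounceRate   = c(45, 50, 20)   # percent, per day
)

# Per-row ratio, as each row of a report would show it.
daily$transactionsPerSession <- daily$transactions / daily$sessions

# Misleading: a plain mean of the per-row ratios (= 0.14 / 3).
tps_mean_of_rows <- mean(daily$transactionsPerSession)

# Correct: sum the totals first, then take the ratio (= 37 / 1250 = 0.0296).
tps_from_totals <- sum(daily$transactions) / sum(daily$sessions)

# For a per-row average, weight each row by its sessions.
bounce_mean_of_rows <- mean(daily$bounceRate)                        # 38.33333
bounce_weighted <- weighted.mean(daily$bounceRate, daily$sessions)   # 44.8

# The totals quoted in this thread:
391 / 15020   # 0.02603196
```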
Thanks a lot for your patience! I guess this solves the issue, even if I lose the customer metric.
Today I tested two different queries to check whether the data aligns with the GA interface. The first query has only one dimension (date); the second has two (date and PagePath).
For sessions, pageviews, transactions, transactionRevenue and new users, I get the right results for both queries.
For users, avgSessionDuration, transactionsPerSession and bounceRate, both queries give different results from the GA interface. In addition, the query with the extra PagePath dimension gives even lower values for all the percentage metrics that already differ from the GA interface. For users, it returns roughly five times as many as the first query.
As you can see from my code below, the syntax is the same for both queries except for the dimension. How do you explain the differences between the two queries, and between the queries and the GA interface?
[Not preserved: code and output for the query with PagePath and for the single-dimension query.]
For example, the GA interface shows 11,465 users for this time span and a bounce rate of 46.94%, while the number of new users matches.
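A toy sketch of why adding PagePath can multiply the users figure when the rows are summed, assuming a hypothetical pageview log (the `pageviews` data frame, its paths and the `n_unique` helper are invented for illustration). Each page row counts every cookie that viewed that page, so a visitor who views several pages is counted once per page; only a unique count over all hits matches the single-dimension total. The same applies after filtering by URL, as with the B2B pages.

```r
# Hypothetical pageview log (invented): one row per pageview.
pageviews <- data.frame(
  cookie   = c("a", "a", "a", "b", "b", "c", "d", "d", "d", "d"),
  pagePath = c("/", "/b2b/shop", "/b2b/cart", "/", "/b2b/shop",
               "/", "/", "/b2b/shop", "/b2b/cart", "/checkout")
)

# Helper: number of distinct cookies in a vector.
n_unique <- function(x) length(unique(x))

# Users per page, as in a report with the PagePath dimension.
users_by_page <- tapply(pageviews$cookie, pageviews$pagePath, n_unique)

# Summing the page rows counts a visitor once for every page they viewed.
sum(users_by_page)                  # 10

# The de-duplicated figure a single-dimension report would show.
n_unique(pageviews$cookie)          # 4

# Same issue after filtering by URL (hypothetical "/b2b/" prefix):
# count unique cookies over the matching hits, not the sum of the rows.
is_b2b <- grepl("^/b2b/", pageviews$pagePath)
sum(users_by_page[grepl("^/b2b/", names(users_by_page))])   # 5
n_unique(pageviews$cookie[is_b2b])                          # 3
```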