8-bit-sheep / googleAnalyticsR

Use the Google Analytics API from R
https://8-bit-sheep.com/googleAnalyticsR/
Other
259 stars 76 forks source link

Splitting data into multiple rows when using anti_sample=true #49

Closed martijnvv closed 7 years ago

martijnvv commented 7 years ago

I run the following code with or without anti_sample:

cf <- dim_filter("dimension7","EXACT",campaign_name,not = FALSE)
fc <- filter_clause_ga4(list(cf))
gaDataFunnel_dev1 <- google_analytics_4(viewId, 
                                    date_range = c(dateStart, dateYesterday),
                                    dimensions=c("deviceCategory", "dimension7"), 
                                    metrics = c("uniquePageviews"),
                                    order = order_type("deviceCategory", "ASCENDING"),
                                    dim_filters = fc,
                                    anti_sample = TRUE)

The code with anti_sample=true returns the same values separated into two rows:

anti_sample true

Without anti_sample=true, the table is no longer split into multiple rows:

no_anti_sample true

This seems to occur only in the past few days. It didn't happen before. I'm running version 0.3.0 of the package

MarkEdmondson1234 commented 7 years ago

Thanks - may I ask how many rows are in the tables you are fetching? Also the results of sessionInfo() just to check all dependency versions. You may want to try using latest version of googleAuthR from Github to see if that helps.

martijnvv commented 7 years ago

The table has 6 vs 3 rows (deviceCategory). I just ran the same query after updating the googleAuthR version from github. Issue persists.

See sessionInfo() below:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gdata_2.17.0           ggthemes_3.2.0         ggplot2_2.1.0          highcharter_0.4.0     
 [5] forecast_7.3           timeDate_3012.100      zoo_1.7-13             reshape2_1.4.1        
 [9] rmarkdown_1.1          RJSONIO_1.3-0          tidyjson_0.2.1         jsonlite_1.1          
[13] knitr_1.14             lubridate_1.6.0        googlesheets_0.2.1     dplyr_0.5.0           
[17] openxlsx_3.0.0         RGA_0.4.2              googleAnalyticsR_0.3.0
MarkEdmondson1234 commented 7 years ago

Weird, ok thanks, will take a look.

MarkEdmondson1234 commented 7 years ago

Hi @martijnvv , I can't replicate this with the below code:

> dateStart <- Sys.Date() - 30
> dateYesterday <- Sys.Date() - 1
> cf <- dim_filter("source","EXACT","google",not = FALSE)
> fc <- filter_clause_ga4(list(cf))
> gaDataFunnel_dev1 <- google_analytics_4(viewId, 
+                                         date_range = c(dateStart, dateYesterday),
+                                         dimensions=c("deviceCategory", "source"), 
+                                         metrics = c("uniquePageviews"),
+                                         order = order_type("deviceCategory", "ASCENDING"),
+                                         dim_filters = fc,
+                                         anti_sample = TRUE)
anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Auto-refreshing stale OAuth token.
Downloaded [3] rows from a total of [3].
No sampling found, returning call
Downloaded [3] rows from a total of [3].

Could you perhaps also report the message feedback when you set options(googleAuthR.verbose = 1) before running your call.

And also install the version of googleAnalyticsR from Github v0.3.0.9000, there was a bug with zero-rows in anti-sample calls that may have an effect.

martijnvv commented 7 years ago

Hi @MarkEdmondson1234 ,

I first updated to version 0.3.0.9000, that made no difference... Below is the response to my query. Your query may not have this issue, because there is no unsampling of the data required in your request.

anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":10,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 639
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":10,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:27 GMT
<- server: ESF
<- cache-control: private
<- content-length: 364
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [3] rows from a total of [3].
Data is sampled, based on 90.3% of sessions.
Finding number of sessions for anti-sample calculations...
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:date"}],"metrics":[{"expression":"ga:sessions","alias":"sessions","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":1000,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 328
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-07"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:date"}],"metrics":[{"expression":"ga:sessions","alias":"sessions","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":1000,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:28 GMT
<- server: ESF
<- cache-control: private
<- content-length: 465
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [21] rows from a total of [21].
Calculated [2] batches are needed to download approx. [4] rows unsampled.
Found [447099] sampleReadCounts from a [549840] samplingSpaceSize.
Anti-sample call covering 16 days: 2016-11-17, 2016-12-02
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-02"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 638
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-11-17","endDate":"2016-12-02"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:31 GMT
<- server: ESF
<- cache-control: private
<- content-length: 338
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [3] rows from a total of [3].
Anti-sample call covering 5 days: 2016-12-03, 2016-12-07
Calling APIv4....
Single v4 batch
Token exists.
Valid local token
Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/
Body JSON parsed to: {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-12-03","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}
-> POST /v4/reports:batchGet/ HTTP/1.1
-> Host: analyticsreporting.googleapis.com
-> User-Agent: googleAuthR/0.4.0.9000 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer ya29.CjCuA8UgttmD7g9Ggv7IBDdklIP-IFNJGT4qzqLrB3NHH8vugTAvk74xSxkGhOxXMUk
-> Content-Length: 638
-> 
>> {"reportRequests":[{"viewId":"ga:5234635","dateRanges":[{"startDate":"2016-12-03","endDate":"2016-12-07"}],"samplingLevel":"LARGE","dimensions":[{"name":"ga:deviceCategory"}],"metrics":[{"expression":"ga:uniquePageviews","alias":"uniquePageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"dimensionFilterClauses":{"operator":"OR","filters":[{"dimensionName":"ga:pagePath","not":false,"operator":"BEGINS_WITH","expressions":["/DL-Zorgverzekering/berekening/Start/"],"caseSensitive":false}]},"orderBys":{"fieldName":"ga:deviceCategory","orderType":"VALUE","sortOrder":"ASCENDING"},"pageToken":"0","pageSize":4,"includeEmptyRows":true}]}

<- HTTP/2 200 
<- content-type: application/json; charset=UTF-8
<- vary: Origin
<- vary: X-Origin
<- vary: Referer
<- content-encoding: gzip
<- date: Thu, 08 Dec 2016 15:43:33 GMT
<- server: ESF
<- cache-control: private
<- content-length: 321
<- x-xss-protection: 1; mode=block
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: quic=":443"; ma=2592000; v="35,34"
<- 
Downloaded [3] rows from a total of [3].
Finished unsampled data request, total rows [6]
Successfully avoided sampling
MarkEdmondson1234 commented 7 years ago

@martijnvv The latest Github version now includes a aggregateGAData function that aggregates across dimensions, which is useful on its own, but I have also applied it to the results of anti_sample runs which should solve the above issue. Let me know if it works, in particular keep an eye on if a metric should be summed or averaged as this is performed by the library and not via the API as is done in other calls, so could be different and have strange behaviour (things like bounceRate, avgTimeOnsite etc.)

martijnvv commented 7 years ago

@MarkEdmondson1234 Consider that bug squashed. I haven't run any queries with averaged values or percentages, only absolutes.

Thanks!