MarkEdmondson1234 / searchConsoleR

R interface with Google Search Console API v3, including Search Analytics.
http://code.markedmondson.me/searchConsoleR/

Querying one day of data at a time only gives 5000 rows #48

Open jceallonardo opened 5 years ago

jceallonardo commented 5 years ago

What goes wrong

When running search_analytics on 1 day, row_limit appears to cap out at 5,000 rows.

I know an issue regarding 5000 rows was created a few years ago, but this might be a different problem since Google recently upped the max rowLimit to 25,000.

Steps to reproduce the problem

searchConsoleR version 0.3.0.9000
googleAuthR version 0.7.0.9000

uri <- "https://www.mydomain.com/"
start <- Sys.Date() - 4
end <- Sys.Date() - 4
dims <- c('query')
listwebs <- list_websites()
data <- search_analytics(siteURL = uri, startDate = start, endDate = end, dimensions = dims, rowLimit = 25000)

Expected output

data.frame with more than 5,000 obs.

Actual output

data.frame with exactly 5,000 obs.

I have tried with multiple domains, and it outputs 5,000 rows every time.

Verbose output:

Fetching search analytics for url: https://www.mydomain.com/ dates: 2018-12-14 2018-12-14 dimensions: query dimensionFilterExp: searchType: web aggregationType: auto
2018-12-18 16:15:05> Token exists.
2018-12-18 16:15:05> Request: https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fwww.mydomain.com%2F/searchAnalytics/query
2018-12-18 16:15:05> Body JSON parsed to: {"startDate":"2018-12-14","endDate":"2018-12-14","dimensions":["query"],"searchType":"web","dimensionFilterGroups":[{"groupType":"and","filters":[]}],"aggregationType":"auto","rowLimit":25000}

Session Info

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] searchConsoleR_0.3.0.9000

loaded via a namespace (and not attached):
[1] rstudioapi_0.8 magrittr_1.5 R6_2.3.0 httr_1.4.0
[5] tools_3.5.1 pkgbuild_1.0.2 cli_1.0.1 googleAuthR_0.7.0.9000
[9] withr_2.1.2 remotes_2.0.2 openssl_1.1 yaml_2.2.0
[13] assertthat_0.2.0 digest_0.6.18 rprojroot_1.3-2 crayon_1.3.4
[17] processx_3.2.1 callr_3.1.0 ps_1.2.1 curl_3.2
[21] memoise_1.1.0 compiler_3.5.1 backports_1.1.3 prettyunits_1.0.2
[25] jsonlite_1.6

jceallonardo commented 5 years ago

Another important note: I believe this has implications for batching "byDate", as a similar 5000-row limit is reached per day, even though the package states that 25000 rows are being fetched.

MarkEdmondson1234 commented 5 years ago

I can't reproduce this. It gets 25000 rows per batch for me when I use byBatch, and 25000 per day when I use byDate:

my_example <- "http://www.example.co.uk"
sa2 <- search_analytics(my_example, startDate = Sys.Date() - 10, 
                         dimensions = c("date","device", "country" ,"query","page"), 
                         walk_data = "byBatch", rowLimit = 50000)
# 50000 rows
nrow(sa2)

sa3 <- search_analytics(my_example, startDate = Sys.Date() - 5, endDate = Sys.Date() - 3,
                         dimensions = c("date","device", "country" ,"query","page"), 
                         walk_data = "byDate")

# 75000 rows
nrow(sa3)

jceallonardo commented 5 years ago

I get your outputs when I include all of the dimensions you do, but try running your query again with just the "date" and "query" dimensions.
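
To make that comparison repeatable, the same date range can be fetched once per dimension set and the row counts put side by side. The following is only a sketch: `compare_dimension_sets()` and `stub_fetch()` are hypothetical names, not part of searchConsoleR, and the stub simply returns data frames with the row counts reported in this thread (75000 and 15000) so that the snippet runs without authentication.

```r
# Hypothetical helper (not part of searchConsoleR): call `fetch` once per
# dimension set and return the number of rows each call produced.
compare_dimension_sets <- function(fetch, dimension_sets) {
  vapply(dimension_sets, function(dims) nrow(fetch(dims)), integer(1))
}

# Stub standing in for a real API call. It only mimics the row counts
# reported in this thread; it does not contact Search Console.
stub_fetch <- function(dims) {
  n <- if (length(dims) > 2) 75000L else 15000L
  data.frame(row = seq_len(n))
}

dimension_sets <- list(
  all_dims   = c("date", "device", "country", "query", "page"),
  date_query = c("date", "query")
)

counts <- compare_dimension_sets(stub_fetch, dimension_sets)
print(counts)
```

With real data, `stub_fetch` would be swapped for a function such as `function(dims) search_analytics(my_example, startDate = Sys.Date() - 5, endDate = Sys.Date() - 3, dimensions = dims, walk_data = "byDate")`, using the arguments already shown in this thread.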

MarkEdmondson1234 commented 5 years ago

Yes I see now:

sa2 <- search_analytics(my_example, startDate = Sys.Date() - 5,dimensions = c("date","query"), walk_data = "byDate")
Fetching search analytics for url: https://www.world-first.co.uk/ dates: 2018-12-14 2018-12-16 dimensions: date query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byDate
Will fetch up to 25000 rows per day
2018-12-19 15:19:14> Request #: 2018-12-14
2018-12-19 15:19:17> Request #: 2018-12-15
2018-12-19 15:19:19> Request #: 2018-12-16

# 15000 rows
nrow(sa2)

Hmm, well, there is nothing in the code that does this, so I guess it's the API itself limiting the results when you query just those dimensions. If that's true, a Python call will return a similar result; if it's verified, perhaps it should be lodged as a bug with the Search Console API team.
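
One way to check whether every day is hitting the same ceiling is to count the rows returned for each date. The following is a minimal sketch in base R: `rows_per_day()` and its `suspected_cap` argument are hypothetical names, not part of searchConsoleR, and the data frame is synthetic. It assumes the result has a `date` column, which is the case when "date" is among the requested dimensions.

```r
# Hypothetical helper (not part of searchConsoleR): count rows per date and
# flag the dates whose row count equals a suspected cap.
rows_per_day <- function(df, suspected_cap = 5000) {
  counts <- as.data.frame(table(date = df$date), stringsAsFactors = FALSE)
  names(counts) <- c("date", "rows")
  counts$at_cap <- counts$rows == suspected_cap
  counts
}

# Synthetic stand-in for a search_analytics() result: three days of
# 5000 rows each, matching the 15000 total seen above.
fake <- data.frame(
  date  = rep(as.Date("2018-12-14") + 0:2, each = 5000),
  query = paste0("query-", seq_len(15000)),
  stringsAsFactors = FALSE
)

per_day <- rows_per_day(fake)
print(per_day)
```

Running `rows_per_day(sa2)` on a real result would show whether each date lands on exactly 5000 rows, which would be consistent with a per-day limit on the API side rather than in the package.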

jceallonardo commented 5 years ago

Yeah. I just ran a test w/ Python and got the same. Weird. I don't recall this being an issue before.