Open jceallonardo opened 5 years ago
Another important note: I believe this also has implications for batching with "byDate", since a similar 5000-row limit is hit per day even though the package reports that up to 25000 rows are being fetched.
I can't reproduce this. I get 25000 rows per batch when I use byBatch,
and 25000 per day when I use byDate:
my_example <- "http://www.example.co.uk"
sa2 <- search_analytics(my_example, startDate = Sys.Date() - 10,
                        dimensions = c("date", "device", "country", "query", "page"),
                        walk_data = "byBatch", rowLimit = 50000)
# 50000 rows
nrow(sa2)
sa3 <- search_analytics(my_example, startDate = Sys.Date() - 5, endDate = Sys.Date() - 3,
                        dimensions = c("date", "device", "country", "query", "page"),
                        walk_data = "byDate")
# 75000 rows
nrow(sa3)
I get your outputs when I include all of the dimensions you do, but try running your query again with just the "date" and "query" dimensions.
Yes I see now:
sa2 <- search_analytics(my_example, startDate = Sys.Date() - 5,dimensions = c("date","query"), walk_data = "byDate")
Fetching search analytics for url: https://www.world-first.co.uk/ dates: 2018-12-14 2018-12-16 dimensions: date query dimensionFilterExp: searchType: web aggregationType: auto
Batching data via method: byDate
Will fetch up to 25000 rows per day
2018-12-19 15:19:14> Request #: 2018-12-14
2018-12-19 15:19:17> Request #: 2018-12-15
2018-12-19 15:19:19> Request #: 2018-12-16
# 15000 rows
nrow(sa2)
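A quick way to confirm the pattern is to count rows per day in the result: if every day lands on exactly 5000, that points to a cap rather than the real data volume. A minimal base-R sketch, assuming the result has a `date` column as above. The `count_rows_per_day()` helper and its `cap` argument are hypothetical (not part of searchConsoleR), and the data frame is a synthetic stand-in for a real `search_analytics()` result:

```r
# Hypothetical helper (not part of searchConsoleR): count rows per date
# and flag the days whose row count equals a suspected cap.
count_rows_per_day <- function(df, cap = 5000) {
  counts <- as.data.frame(table(date = df$date), responseName = "rows",
                          stringsAsFactors = FALSE)
  counts$at_cap <- counts$rows == cap
  counts
}

# Synthetic stand-in for a search_analytics() result: 3 days x 5000 rows.
fake <- data.frame(
  date  = rep(as.Date("2018-12-14") + 0:2, each = 5000),
  query = paste0("query-", seq_len(15000)),
  stringsAsFactors = FALSE
)

per_day <- count_rows_per_day(fake)
print(per_day)
```

With a real result you would pass `sa2` instead of `fake`; days flagged `at_cap` are the ones worth re-querying with extra dimensions to see whether more rows come back.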
Hmm, well there is nothing in the code that does this, so I guess it's the API itself limiting the results when you query only those dimensions. If that's true, a Python call should return similar results. If it's verified, perhaps it should be lodged as a bug with the Search Console API team.
Yeah. I just ran a test w/ Python and got the same. Weird. I don't recall this being an issue before.
What goes wrong
When running search_analytics on 1 day, the result appears to cap out at 5,000 rows even with rowLimit set to 25,000.
I know an issue regarding 5000 rows was created a few years ago, but this might be a different problem since Google recently upped the max rowLimit to 25,000.
Steps to reproduce the problem
searchConsoleR version 0.3.0.9000
googleAuthR version 0.7.0.9000
uri <- "https://www.mydomain.com/"
start <- Sys.Date() - 4
end <- Sys.Date() - 4
dims <- c('query')
listwebs <- list_websites()
data <- search_analytics(siteURL = uri, startDate = start, endDate = end,
                         dimensions = dims, rowLimit = 25000)
Expected output
data.frame with more than 5,000 obs.
Actual output
data.frame with exactly 5,000 obs.
I have tried with multiple domains, and it outputs 5,000 rows every time.
Verbose output:
Fetching search analytics for url: https://www.mydomain.com/ dates: 2018-12-14 2018-12-14 dimensions: query dimensionFilterExp: searchType: web aggregationType: auto
2018-12-18 16:15:05> Token exists.
2018-12-18 16:15:05> Request: https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fwww.mydomain.com%2F/searchAnalytics/query
2018-12-18 16:15:05> Body JSON parsed to: {"startDate":"2018-12-14","endDate":"2018-12-14","dimensions":["query"],"searchType":"web","dimensionFilterGroups":[{"groupType":"and","filters":[]}],"aggregationType":"auto","rowLimit":25000}
Session Info
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] searchConsoleR_0.3.0.9000
loaded via a namespace (and not attached):
[1] rstudioapi_0.8 magrittr_1.5 R6_2.3.0 httr_1.4.0
[5] tools_3.5.1 pkgbuild_1.0.2 cli_1.0.1 googleAuthR_0.7.0.9000
[9] withr_2.1.2 remotes_2.0.2 openssl_1.1 yaml_2.2.0
[13] assertthat_0.2.0 digest_0.6.18 rprojroot_1.3-2 crayon_1.3.4
[17] processx_3.2.1 callr_3.1.0 ps_1.2.1 curl_3.2
[21] memoise_1.1.0 compiler_3.5.1 backports_1.1.3 prettyunits_1.0.2
[25] jsonlite_1.6