Querying one day of data at a time only gives 5000 rows #48

Open jceallonardo opened 5 years ago

jceallonardo commented 5 years ago

What goes wrong

When running search_analytics on 1 day, row_limit appears to cap out at 5,000 rows.

I know an issue regarding 5000 rows was created a few years ago, but this might be a different problem since Google recently upped the max rowLimit to 25,000.

Steps to reproduce the problem

searchConsoleR version googleAuthR version

uri <- ""
start <- Sys.Date() - 4
end <- Sys.Date() - 4
dims <- c('query')
listwebs <- list_websites()
data <- search_analytics(siteURL = uri, startDate = start, endDate = end, dimensions = dims, rowLimit = 25000)

Expected output

data.frame with more than 5,000 obs.

Actual output

data.frame with exactly 5,000 obs.

I have tried with multiple domains, and it outputs 5,000 rows every time.

Verbose output:

Fetching search analytics for url: dates: 2018-12-14 2018-12-14 dimensions: query dimensionFilterExp: searchType: web aggregationType: auto
2018-12-18 16:15:05> Token exists.
2018-12-18 16:15:05> Request:
2018-12-18 16:15:05> Body JSON parsed to: {"startDate":"2018-12-14","endDate":"2018-12-14","dimensions":["query"],"searchType":"web","dimensionFilterGroups":[{"groupType":"and","filters":[]}],"aggregationType":"auto","rowLimit":25000}

Session Info

R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS 10.14.2

jceallonardo commented 5 years ago

Another important note: I believe this has implications for batching with "byDate", as the same 5000-row limit is reached per day, even though the package states that 25000 rows are being fetched.

MarkEdmondson1234 commented 5 years ago

I can't reproduce this. It gets 25000 rows per batch for me when I use byBatch, and 25000 per day when I use byDate:

my_example <- ""
sa2 <- search_analytics(my_example, startDate = Sys.Date() - 10, 
                         dimensions = c("date","device", "country" ,"query","page"), 
                         walk_data = "byBatch", rowLimit = 50000)
# 50000 rows

sa3 <- search_analytics(my_example, startDate = Sys.Date() - 5, endDate = Sys.Date() - 3,
                         dimensions = c("date","device", "country" ,"query","page"), 
                         walk_data = "byDate")

# 75000 rows

jceallonardo commented 5 years ago

I get your outputs when I include all of the dimensions you do, but try running your query again with just the "date" and "query" dimensions.

MarkEdmondson1234 commented 5 years ago

Yes I see now:

sa2 <- search_analytics(my_example, startDate = Sys.Date() - 5,dimensions = c("date","query"), walk_data = "byDate")
Fetching search analytics for url: dates: 2018-12-14 2018-12-16 dimensions: date query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byDate
Will fetch up to 25000 rows per day
2018-12-19 15:19:14> Request #: 2018-12-14
2018-12-19 15:19:17> Request #: 2018-12-15
2018-12-19 15:19:19> Request #: 2018-12-16

# 15000 rows

Hmm, well, there is nothing in the code that does this, so I guess it's the API itself limiting the results when you query only those dimensions. If that's true, a Python call will return a similar result. If it's verified, perhaps it should be lodged as a bug with the Search Console API team.
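
The 15000 rows across three daily requests would be consistent with each day stopping at 5000. A minimal sketch for checking that per day, assuming the result has a `date` column (as it does when "date" is among the dimensions). `rows_at_cap` is a hypothetical helper, not part of searchConsoleR, and the data frame below is a synthetic stand-in for a real `search_analytics()` result:

```r
# Hypothetical helper (not part of searchConsoleR): count rows per date and
# flag the dates whose row count equals a suspected cap.
rows_at_cap <- function(df, cap = 5000, date_col = "date") {
  counts <- table(as.character(df[[date_col]]))
  data.frame(
    date = names(counts),
    rows = as.integer(counts),
    at_cap = as.integer(counts) == cap,
    stringsAsFactors = FALSE
  )
}

# Synthetic stand-in for a search_analytics() result:
# two days sitting exactly at the cap and one day below it.
fake <- data.frame(
  date = rep(c("2018-12-14", "2018-12-15", "2018-12-16"),
             times = c(5000, 5000, 1200)),
  query = "placeholder",
  stringsAsFactors = FALSE
)

res <- rows_at_cap(fake)
print(res)
```

With real data, `rows_at_cap(sa2)` would show whether every day sits at exactly 5000 rows. Days that land below the cap suggest the query simply had fewer rows that day, while days all at the same round number point to a limit upstream.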

jceallonardo commented 5 years ago

Yeah. I just ran a test w/ Python and got the same. Weird. I don't recall this being an issue before.