MarkEdmondson1234 / searchConsoleR

R interface with Google Search Console API v3, including Search Analytics.
http://code.markedmondson.me/searchConsoleR/

Running on Windows error: downloading with rowLimit 5000+ returns "Error in if (s[length(s)] == "") s <- s[-length(s)]" for particular dates #43

Open Leszek-Sieminski opened 6 years ago

Leszek-Sieminski commented 6 years ago

Hi! This is my first issue, so sorry for any mistakes or missing details. I'll be glad to provide further info.

What goes wrong

First of all, I'm afraid this error might not be fully reproducible, and I'm sorry for that. I have a set of dates and want to use them to download Search Console data (in a loop). Real examples:

Everything seems fine for all dates when I download with rowLimit <= 5000 and walk_data = c("byBatch"). Increasing rowLimit above 5000 on "2017-03-17" works perfectly fine.

Unfortunately, increasing rowLimit on "2017-03-18" produces an error: Error in if (s[length(s)] == "") s <- s[-length(s)]

It's strange because I manually checked the data in Search Console and the dates producing this error seem normal - there is data for each one of them. I suppose this might somehow be connected to this particular website, but I cannot provide its address or tokens.

Code

# authentication
searchConsoleR::scr_auth(token = token,
                         new_user = FALSE)

gsc_websites <- list_websites()

# 2018-07-12 14:55:15> Token exists.
# 2018-07-12 14:55:15> Request: https://www.googleapis.com/webmasters/v3/sites/

# no problem
sc_data_1 <- searchConsoleR::search_analytics(
  siteURL         = address,
  startDate       = "2017-03-17",
  endDate         = "2017-03-17",
  dimensions      = c("date", 
                      "device", 
                      "page", 
                      "query"),
  searchType      = 'web', 
  rowLimit        = 6000, 
  prettyNames     = FALSE,
  aggregationType = "auto",
  walk_data       = NULL) #c("byBatch"))

# still no problem (changed date and decreased rowlimit to 5000)
sc_data_2 <- searchConsoleR::search_analytics(
  siteURL         = address,
  startDate       = "2017-03-18",
  endDate         = "2017-03-18",
  dimensions      = c("date", 
                      "device", 
                      "page", 
                      "query"),
  searchType      = 'web', 
  rowLimit        = 5000, 
  prettyNames     = FALSE,
  aggregationType = "auto",
  walk_data       = c("byBatch"))

# problem (the same date and rowlimit above 5000)
sc_data_3 <- searchConsoleR::search_analytics(
  siteURL         = address,
  startDate       = "2017-03-18",
  endDate         = "2017-03-18",
  dimensions      = c("date", 
                      "device", 
                      "page", 
                      "query"),
  searchType      = 'web', 
  rowLimit        = 6000, 
  prettyNames     = FALSE,
  aggregationType = "auto",
  walk_data       = NULL) #c("byBatch"))

Actual output

authentication

> searchConsoleR::scr_auth(token = token,
+                            new_user = FALSE)

2018-07-12 15:12:41> 
options(googleAuthR.scopes.selected=c('https://www.googleapis.com/auth/webmasters'))
options(googleAuthR.client_id='XXX')
options(googleAuthR.client_secret='XXX')
options(googleAuthR.webapp.client_id='XXX')
options(googleAuthR.webapp.client_secret='XXX')
2018-07-12 15:12:41> Reading token from file path
2018-07-12 15:12:41> Multiple httr-tokens in cache 'path', only returning first found token
2018-07-12 15:12:41> Token google_token$params$scope != getOption('googleAuthR.scopes.selected') 
#>Token: https://www.googleapis.com/auth/analytics https://www.googleapis.com/auth/webmasters https://www.googleapis.com/auth/analytics.readonly 
#>Option: https://www.googleapis.com/auth/webmasters

2018-07-12 15:12:41> Setting googleAuthR.scopes.selected to https://www.googleapis.com/auth/analytics https://www.googleapis.com/auth/webmasters https://www.googleapis.com/auth/analytics.readonly
2018-07-12 15:12:41> Token google_token$app$key != getOption('googleAuthR.client_id') 
#>Token: XXX 
#>Option: XXX

2018-07-12 15:12:41> Setting googleAuthR.client_id to XXX
2018-07-12 15:12:41> Token google_token$app$secret != getOption('googleAuthR.client_secret') 
#>Token: XXX
#>Option: XXX

2018-07-12 15:12:41> Setting googleAuthR.client_secret to XXX
Scopes: https://www.googleapis.com/auth/analytics https://www.googleapis.com/auth/webmasters https://www.googleapis.com/auth/analytics.readonly
App key: XXX
Method: filepath
>   
>   gsc_websites <- list_websites()
2018-07-12 15:12:41> Token exists.
2018-07-12 15:12:41> Request: https://www.googleapis.com/webmasters/v3/sites/
> 

no problem ("2017-03-17" and rowLimit above 5000)

> sc_data_1 <- searchConsoleR::search_analytics(
+   siteURL         = address,
+   startDate       = "2017-03-17",
+   endDate         = "2017-03-17",
+   dimensions      = c("date", 
+                       "device", 
+                       "page", 
+                       "query"),
+   searchType      = 'web', 
+   rowLimit        = 6000, 
+   prettyNames     = FALSE,
+   aggregationType = "auto",
+   walk_data       = NULL) #c("byBatch"))
Fetching search analytics for url: 'XXX' dates: 2017-03-17 2017-03-17 dimensions: date device page query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byBatch
With rowLimit set to 6000 will need up to [2] API calls
2018-07-12 15:17:29> Batch API limited to [3] calls at once.
2018-07-12 15:17:29> Token exists.
2018-07-12 15:17:29> Token exists.
2018-07-12 15:17:29> Constructing batch request URL for: /webmasters/v3/sites/XXX/searchAnalytics/query
2018-07-12 15:17:29> Constructing batch request URL for: /webmasters/v3/sites/XXX/searchAnalytics/query
2018-07-12 15:17:29> Making Batch API call

still no problem (changed date to "2017-03-18" and decreased rowLimit to 5000)

> sc_data_2 <- searchConsoleR::search_analytics(
+   siteURL         = address,
+   startDate       = "2017-03-18",
+   endDate         = "2017-03-18",
+   dimensions      = c("date", 
+                       "device", 
+                       "page", 
+                       "query"),
+   searchType      = 'web', 
+   rowLimit        = 5000, 
+   prettyNames     = FALSE,
+   aggregationType = "auto",
+   walk_data       = c("byBatch"))
Fetching search analytics for url: URL dates: 2017-03-18 2017-03-18 dimensions: date device page query dimensionFilterExp:  searchType: web aggregationType: auto
2018-07-12 15:21:57> Token exists.
2018-07-12 15:21:57> Request: https://www.googleapis.com/webmasters/v3/sites/XXX/searchAnalytics/query
2018-07-12 15:21:57> Body JSON parsed to: {"startDate":"2017-03-18","endDate":"2017-03-18","dimensions":["date","device","page","query"],"searchType":"web","dimensionFilterGroups":[{"groupType":"and","filters":[]}],"aggregationType":"auto","rowLimit":5000}

problem ("2018-03-18" and rowLimit > 5000)

> # problem (the same date and rowlimit above 5000)
> sc_data_3 <- searchConsoleR::search_analytics(
+   siteURL         = address,
+   startDate       = "2017-03-18",
+   endDate         = "2017-03-18",
+   dimensions      = c("date", 
+                       "device", 
+                       "page", 
+                       "query"),
+   searchType      = 'web', 
+   rowLimit        = 6000, 
+   prettyNames     = FALSE,
+   aggregationType = "auto",
+   walk_data       = NULL) #c("byBatch"))
Fetching search analytics for url: URL dates: 2017-03-18 2017-03-18 dimensions: date device page query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byBatch
With rowLimit set to 6000 will need up to [2] API calls
2018-07-12 15:25:06> Batch API limited to [3] calls at once.
2018-07-12 15:25:06> Token exists.
2018-07-12 15:25:06> Token exists.
2018-07-12 15:25:06> Constructing batch request URL for: /webmasters/v3/sites/XXX/searchAnalytics/query
2018-07-12 15:25:06> Constructing batch request URL for: /webmasters/v3/sites/XXX/searchAnalytics/query
2018-07-12 15:25:06> Making Batch API call
Error in if (s[length(s)] == "") s <- s[-length(s)] : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In split_vector(r, index) : No index found
2: In split_vector(x, index, remove_splits = FALSE) : No index found
3: In split_vector(x, index, remove_splits = FALSE) : No index found
4: In split_vector(x, index, remove_splits = FALSE) : No index found

Traceback

Traceback:
> traceback()
10: split_vector(x, index, remove_splits = FALSE)
9: unlist(split_vector(x, index, remove_splits = FALSE))
8: FUN(X[[i]], ...)
7: lapply(responses, function(x) {
  index <- c(1:2)
  unlist(split_vector(x, index, remove_splits = FALSE))
})
6: parseBatchResponse(req)
5: gar_batch(fl, ...)
4: FUN(X[[i]], ...)
3: lapply(limit_batch, function(y) {
  if (length(limit_batch) > 1)
    message("Request #: ", paste(y, collapse = " : "))
  fl <- lapply(y, function(x) {
    pars_walk_list <- lapply(pars_walk, function(z) z = x)
    names(pars_walk_list) <- pars_walk
    path_walk_list <- lapply(path_walk, function(z) z = x)
    names(path_walk_list) <- path_walk
    body_walk_list <- lapply(body_walk, function(z) z = x)
    names(body_walk_list) <- body_walk
    if (length(pars_walk) > 0)
      gar_pars <- modifyList(gar_pars, pars_walk_list)
    if (length(path_walk) > 0)
      gar_paths <- modifyList(gar_paths, path_walk_list)
    if (length(body_walk) > 0)
      the_body <- modifyList(the_body, body_walk_list)
    f(pars_arguments = gar_pars, path_arguments = gar_paths,
      the_body = the_body, batch = TRUE)
  })
  names(fl) <- as.character(y)
  batch_data <- gar_batch(fl, ...)
  if (!is.null(batch_function)) {
    batch_data <- batch_function(batch_data)
  }
  batch_data
})
2: googleAuthR::gar_batch_walk(search_analytics_g, walk_vector = walk_vector,
                               gar_paths = list(sites = siteURL), body_walk = "startRow",
                               the_body = body, batch_size = 3, dim = dimensions)
1: searchConsoleR::search_analytics(siteURL = address, startDate = "2017-03-18",
                                    endDate = "2017-03-18", dimensions = c("date", "device",
                                                                           "page", "query"), searchType = "web", rowLimit = 6000,
                                    prettyNames = FALSE, aggregationType = "auto", walk_data = NULL)

Session Info

Initially I used the current CRAN versions of googleAuthR and searchConsoleR. Switching to the GitHub versions didn't solve the problem.

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
  [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250

attached base packages:
  [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
  [1] doParallel_1.0.11         iterators_1.0.9           foreach_1.4.4             googleAuthR_0.6.3.9000
[5] searchConsoleR_0.3.0.9000 glue_1.2.0                lubridate_1.7.1           tidyr_0.7.2
[9] dplyr_0.7.5               RMySQL_0.10.13            DBI_0.8

loaded via a namespace (and not attached):
  [1] Rcpp_0.12.16     bindr_0.1.1      magrittr_1.5     tidyselect_0.2.3 here_0.1         R6_2.2.2
[7] rlang_0.2.1      stringr_1.3.1    httr_1.3.1       tools_3.4.4      rprojroot_1.2    yaml_2.1.16
[13] assertthat_0.2.0 digest_0.6.15    tibble_1.3.4     bindrcpp_0.2.2   purrr_0.2.4      codetools_0.2-15
[19] curl_3.1         memoise_1.1.0    stringi_1.1.7    compiler_3.4.4   backports_1.1.2  jsonlite_1.5
[25] pkgconfig_2.0.1
MarkEdmondson1234 commented 6 years ago

Thanks, I think this may be an issue where the date sequence only works for date ranges greater than one day, as byDate breaks up the API calls - there should be nothing to gain from using byDate with a one-day date range anyway.

Leszek-Sieminski commented 6 years ago

That's interesting. I tried the same code as above with walk_data = c("byBatch"). The result is the same, so it doesn't seem to be a "byDate" issue.

The one-day range is because the process tries to find missing data in the database and download it. It downloads data for each single day separately, as I cannot be sure the missing data will always fall in a single period rather than on "random" dates.
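
For illustration, a minimal sketch of such a per-day loop is below (this is not the actual production code; missing_dates and address are hypothetical names, and missing_dates is assumed to be a character vector of "YYYY-MM-DD" dates):

# Sketch only: fetch each missing day separately, skipping days that error
# so the rest of the loop can continue, then bind the results together.
results <- lapply(missing_dates, function(d) {
  tryCatch(
    searchConsoleR::search_analytics(
      siteURL         = address,
      startDate       = d,
      endDate         = d,
      dimensions      = c("date", "device", "page", "query"),
      searchType      = "web",
      rowLimit        = 5000,
      prettyNames     = FALSE,
      aggregationType = "auto",
      walk_data       = c("byBatch")),
    error = function(e) NULL)   # record nothing for days that fail
})
sc_data <- do.call(rbind, results)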

Nevertheless, I tried to download this data over a longer period, from 2017-03-17 to 2017-03-19, to check whether that solves the problem:

> sc_data_4 <- searchConsoleR::search_analytics(
+   siteURL         = address,
+   startDate       = "2017-03-17",
+   endDate         = "2017-03-19",
+   dimensions      = c("date", 
+                       "device", 
+                       "page", 
+                       "query"),
+   searchType      = 'web', 
+   rowLimit        = 20000, 
+   prettyNames     = FALSE,
+   aggregationType = "auto",
+   walk_data       = NULL)

Fetching search analytics for url: XXX dates: 2017-03-17 2017-03-19 dimensions: date device page query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byBatch
With rowLimit set to 20000 will need up to [5] API calls
2018-07-17 10:25:36> Request #: 0 : 5000 : 10000
2018-07-17 10:25:42> Request #: 15000 : 20000
Warning message:
No data found for supplied dates - returning NA 

However, it does not return a similar number of rows for each day. A quick check with table() returns:

> table(sc_data_4$date)

2017-03-17 2017-03-18 2017-03-19 
      3715        447       3151 

Today is the last day to check it (2017-03-17 is the oldest date available in the new Search Console), but as far as I can see in Search Console there are more than 999 rows of data (queries + clicks & impressions), so the result seems to be a mistake. I also tried different rowLimit values and different date ranges containing 2017-03-18, but it always returns rubbish (>500 rows). Any advice on how to avoid such problems?

kirchnerto commented 6 years ago

Hi @MarkEdmondson1234 - I think I'm having the same issue, or a similar one, as @Leszek-Sieminski-PM had.

Hopefully you can find the issue on my side.

Code

#library(googleAuthR)
library(searchConsoleR)

## Authorize script with Google Developer Console.  
options("searchConsoleR.client_id" = "XXX")
options("searchConsoleR.client_secret" = "XXX")

## Search Console data is only reliable after a few days' delay, so pull an
## older window: here a 31-day range from 65 days ago to 35 days ago
start <- Sys.Date() - 65
end <- Sys.Date() - 35

## set website to your URL including http://
website <- "https://www.domain.com"

## what to download, choose between date, query, page, device, country
download_dimensions <- c('date','page','query')

scr_auth()

## this is the query to the search console API
searchquery <- search_analytics(siteURL = website,
                                startDate = start, 
                                endDate = end, 
                                dimensions = download_dimensions,
                                walk_data = c("byDate"))

## Specify Output filepath
filepath <-"J:/SearchConsole/Exports/"

## filename will be set like searchconsoledata_2016-02-08 (.csv will be added in next step)
filename <- paste("searchconsoledata", start, sep = "_")

## this is the full filepath + filename with .csv
output <- paste(filepath, filename, ".csv", sep = "")

## this writes the search query report to the full filepath and filename; row.names = FALSE omits data frame row numbers
write.csv(searchquery, output, row.names = FALSE)

## Complete

Console Output

Fetching search analytics for url: https://www.domain.com dates: 2018-08-01 2018-08-31 dimensions: date page query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byDate
Will fetch up to 25000 rows per day
2018-10-05 14:34:35> Request #: 2018-08-01
Error in if (s[length(s)] == "") s <- s[-length(s)] : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In split_vector(r, index) : No index found
2: In split_vector(x, index, remove_splits = FALSE) : No index found
3: In split_vector(x, index, remove_splits = FALSE) : No index found
4: In split_vector(x, index, remove_splits = FALSE) : No index found

Traceback (same as @Leszek-Sieminski-PM)

> traceback()
14: split_vector(x, index, remove_splits = FALSE)
13: unlist(split_vector(x, index, remove_splits = FALSE))
12: FUN(X[[i]], ...)
11: lapply(responses, function(x) {
        index <- c(1:2)
        unlist(split_vector(x, index, remove_splits = FALSE))
    })
10: parseBatchResponse(req)
9: gar_batch(fl, ..., batch_endpoint = batch_endpoint)
8: FUN(X[[i]], ...)
7: lapply(limit_batch, function(y) {
       if (length(limit_batch) > 1) 
           myMessage("Request #: ", paste(y, collapse = " : "), 
               level = 3)
       fl <- lapply(y, function(x) {
           pars_walk_list <- lapply(pars_walk, function(z) z = x)
           names(pars_walk_list) <- pars_walk
           path_walk_list <- lapply(path_walk, function(z) z = x)
           names(path_walk_list) <- path_walk
           body_walk_list <- lapply(body_walk, function(z) z = x)
           names(body_walk_list) <- body_walk
           if (length(pars_walk) > 0) 
               gar_pars <- modifyList(gar_pars, pars_walk_list)
           if (length(path_walk) > 0) 
               gar_paths <- modifyList(gar_paths, path_walk_list)
           if (length(body_walk) > 0) 
               the_body <- modifyList(the_body, body_walk_list)
           f(pars_arguments = gar_pars, path_arguments = gar_paths, 
               the_body = the_body, batch = TRUE)
       })
       names(fl) <- as.character(y)
       batch_data <- gar_batch(fl, ..., batch_endpoint = batch_endpoint)
       if (!is.null(batch_function)) {
           batch_data <- batch_function(batch_data)
       }
       batch_data
   })
6: googleAuthR::gar_batch_walk(search_analytics_g, walk_vector = walk_vector, 
       gar_paths = list(sites = siteURL), body_walk = c("startDate", 
           "endDate"), the_body = body, batch_size = 1, dim = dimensions)
5: search_analytics(siteURL = website, startDate = start, endDate = end, 
       dimensions = download_dimensions, walk_data = c("byDate")) at google_search_console.R#23
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("J:/SearchConsole/google_search_console.R", 
       echo = TRUE) 

Session Info

Latest searchConsoleR and googleAuthR from GitHub, latest R version

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] googleAuthR_0.6.3.9001    searchConsoleR_0.3.0.9000 timeDate_3043.102        

loaded via a namespace (and not attached):
 [1] httr_1.3.1       compiler_3.5.1   magrittr_1.5     R6_2.3.0         assertthat_0.2.0 tools_3.5.1     
 [7] curl_3.2         memoise_1.1.0    stringi_1.1.7    stringr_1.3.1    jsonlite_1.5     digest_0.6.17   
[13] openssl_1.0.2   
Leszek-Sieminski commented 6 years ago

Sorry for closing the issue - misclick.

@kirchnerto I later discovered that my database is missing 46 days of data from the last 16 months because of this issue. The problem does not appear if you use Python instead (for example: https://moz.com/blog/how-to-get-search-console-data-api-python)

It seems to me that the problem lies in a googleAuthR helper function.

kirchnerto commented 6 years ago

@Leszek-Sieminski-PM Thanks for the tip! I still hope that @MarkEdmondson1234 has a clue why this isn't working. For now I went back to older versions of searchConsoleR and googleAuthR and extract the data on a day-by-day basis without batching. That way I can get the 25,000 rows I need.
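
For reference, a hedged sketch of pinning older CRAN releases with the remotes package (the version numbers below are examples only, not confirmed as the exact versions used):

# Sketch only: install specific older CRAN versions instead of the GitHub dev builds
install.packages("remotes")
remotes::install_version("searchConsoleR", version = "0.3.0")
remotes::install_version("googleAuthR", version = "0.6.2")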

MarkEdmondson1234 commented 6 years ago

I'll take a look; if I can reproduce it myself it's a lot easier. I think it's to do with some days not having data, so it needs to fail more gracefully. Does that sound possible?

Leszek-Sieminski commented 6 years ago

As I understand this issue, better error handling would be nice, but the real problem is that the data is present and available both in the web interface and through the API (I downloaded it with both PHP and Python just to check), yet R somehow cannot download some dates.

MarkEdmondson1234 commented 6 years ago

It looks like it downloads it, but the merge fails.

MarkEdmondson1234 commented 6 years ago

The original issue looks like it predates the increase from 5,000 to 25,000 rows per API response:

Fetching search analytics for url: 'XXX' dates: 2017-03-17 2017-03-17 dimensions: date device page query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byBatch
With rowLimit set to 6000 will need up to [2] API calls
2018-07-12 15:17:29> Batch API limited to [3] calls at once.

I suppose the issue should repeat, though, if you put in a rowLimit of 26000?
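
For reference, a hedged sketch of that check - the same single-day call as sc_data_3 above, with only rowLimit raised past the new per-call limit (address again stands for the site URL):

# Sketch only: force batching again by asking for more than 25,000 rows
sc_data_test <- searchConsoleR::search_analytics(
  siteURL         = address,
  startDate       = "2017-03-18",
  endDate         = "2017-03-18",
  dimensions      = c("date", "device", "page", "query"),
  searchType      = "web",
  rowLimit        = 26000,
  prettyNames     = FALSE,
  aggregationType = "auto",
  walk_data       = c("byBatch"))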

MarkEdmondson1234 commented 6 years ago

Hmm, so the error arises here when parsing the batched responses' metadata, not the data itself:

  responses_meta <- lapply(responses, function(x){
    index <- c(1:2)
    unlist(split_vector(x, index, remove_splits = FALSE))
  })

The API calls are sent through Google's batching service, which lets you send many calls at once for a faster response - e.g. it should now fetch up to 75,000 rows per batched call. The batch response is then split back into the separate API responses; however, in this case no header information is being passed back, perhaps because those responses contain no data at all.
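
As a minimal illustration of the failure mode (assuming the parsed metadata vector ends up containing NA when the expected header lines are missing):

s <- NA_character_   # stand-in for a metadata line that could not be split out
if (s[length(s)] == "") s <- s[-length(s)]
#> Error in if (s[length(s)] == "") s <- s[-length(s)] :
#>   missing value where TRUE/FALSE needed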

MarkEdmondson1234 commented 6 years ago

If you install the latest version of googleAuthR now, it has better error messaging, and on failure it will write the batch response to an RDS file if you have options("googleAuthR.verbose" = 2).

You can open that file with readRDS() and examine the object to see why the parsing is failing - I guess they are empty responses. Anyhow, please see if you can repeat the error and then make the .rds object available to me or print out its output (it will be large for a big fetch, so please edit it down if possible).
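
A short sketch of that debugging flow (the .rds path below is a hypothetical example - use the one printed in your error message):

options("googleAuthR.verbose" = 2)   # on failure, the raw batch response is saved to an .rds file
# ...re-run the failing search_analytics() call, then inspect the saved object:
batch_response <- readRDS("C:/Temp/RtmpXXXXXX/fileXXXX.rds")   # hypothetical path
str(batch_response, max.level = 2)   # check whether the responses are empty / missing headers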

kirchnerto commented 6 years ago

Hi @MarkEdmondson1234 - thanks for the fast response! I installed the latest versions of searchConsoleR and googleAuthR and ran the script again using options("googleAuthR.verbose" = 2)

Console Output

> scr_auth()
2018-10-05 21:32:21> 
options(googleAuthR.scopes.selected=c('https://www.googleapis.com/auth/webmasters'))
options(googleAuthR.client_id='858905045851-3beqpmsufml9d7v5d1pr74m9lnbueak2.apps.googleusercontent.com')
options(googleAuthR.client_secret=' bnmF6C-ScpSR68knbGrHBQrS')
options(googleAuthR.webapp.client_id='858905045851-iuv6uhh34fqmkvh4rq31l7bpolskdo7h.apps.googleusercontent.com')
options(googleAuthR.webapp.client_secret=' rFTWVq6oMu5ZgYd9e3sYu2tm')
Scopes: https://www.googleapis.com/auth/webmasters
App key: 858905045851-3beqpmsufml9d7v5d1pr74m9lnbueak2.apps.googleusercontent.com
Method: new_token

> ## this is the query to the search console API
> searchquery <- search_analytics(siteURL = website,
+                                 startDate = st .... [TRUNCATED] 
Fetching search analytics for url: https://www.domain.de dates: 2018-08-01 2018-08-31 dimensions: date page query dimensionFilterExp:  searchType: web aggregationType: auto
Batching data via method: byDate
Will fetch up to 25000 rows per day
2018-10-05 21:32:22> Batch API limited to [1] calls at once.
2018-10-05 21:32:22> Request #: 2018-08-01
2018-10-05 21:32:22> Token exists.
2018-10-05 21:32:22> Constructing batch request URL for: /webmasters/v3/sites/https%3A%2F%2Fwww.domain.de/searchAnalytics/query
2018-10-05 21:32:22> Making Batch API call
Error in value[[3L]](cond) : 
  Error with batch response - writing response to C:\Temp\RtmpyYqTyu\file298c37f072af.rds

I attached the .rds file to this comment. Yes - it's possible, because the output is somehow empty. Hope this helps for debugging. file298c37f072af.zip

MarkEdmondson1234 commented 6 years ago

OK, well that's weird, the file works when I do it. Hmm, I hope it's not a Windows thing.

kirchnerto commented 6 years ago

@MarkEdmondson1234 What do you mean by saying "the file works when I do it"?

One thing I noticed: the request only crashes when a lot of dimensions are set, which leads to a lot of processing, I guess. When using just the dimensions "date" and "query" or "date" and "page" it works fine, but it crashes when I want all three dimensions. FYI: the result comes to ~20k rows when requesting only one day. Any ideas on this?

kirchnerto commented 6 years ago

@MarkEdmondson1234 Anything new on this problem?

MarkEdmondson1234 commented 6 years ago

When I loaded the .rds file it parsed without error on my machine. I don't know why it would work for me and not for you unless it's a Windows-specific problem I can't easily test (I really hope it is not that). I need to be able to reproduce the problem to have a hope of fixing it. The amount of data should not be a problem unless you are running on a very, very small machine - how much RAM do you have?

kirchnerto commented 6 years ago

@MarkEdmondson1234 I'm running Windows 10 with an Intel i5 and 8 GB of RAM - should be enough, I guess ;)

Leszek-Sieminski commented 6 years ago

@MarkEdmondson1234 I'm running Windows 10, Intel i5, 8 GB RAM (for development), but I opened the issue after discovering missing data in downloads that ran on a server (Debian, 32 GB RAM). So it probably isn't related to the OS; not sure about the RAM.

MarkEdmondson1234 commented 6 years ago

That should be plenty. Sorry, I have no clue at the moment, as it's working in my test suite and locally.

flopont commented 6 years ago

Hello @MarkEdmondson1234, I have been experiencing the same issue as described above. I am also running R on Windows (inside RStudio), and I believe I can confirm that this is a Windows-specific problem.

I have tried to lower the value of rowLimit below 5,000, and also tried setting walk_data to byDate rather than byBatch. In every case, my exports would end up failing with the same error, as described by OP.

However, since you mentioned that this could be a Windows-specific problem, I tried running the exact same scripts using the rocker/verse image in Docker, and there you go: I never got any error and am now able to export all the data I need!

I hope this helps. Many thanks for your work.

MarkEdmondson1234 commented 6 years ago

Thanks @flopont, that's very helpful. I will look at updating it using the latest googleAuthR tools, which may help solve this.