MarkEdmondson1234 / googleAuthR

Google API Client Library for R. Easy authentication and help to build Google API R libraries with OAuth2. Shiny compatible.
https://code.markedmondson.me/googleAuthR

Exponential backoff does not follow the same algorithm defined by Google #186

Open octaviancorlade opened 4 years ago

octaviancorlade commented 4 years ago

What goes wrong

Recently the Google Analytics Reporting API V4 started returning a lot of 503 errors for some views. For these it expects exponential backoff to be used as described here, i.e. before each retry the client should sleep for 2^attempt + random(0, pause_base) seconds. For example, at the 3rd retry this would be 8 + random(0, 1) seconds.

Instead, the httr::RETRY call does "randomly wait between 0 and pause_base * 2 ^ attempt seconds", which at the 3rd retry is random(0, 8) and so is always shorter than the sleep the Google API expects. This results in errors of the form:

2020-07-13 08:11:07> All attempts failed.
Error: API returned: Quota Error: Number of recent failed reporting API requests is too high, please implement exponential back off.
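
To make the gap between the two schedules concrete, here is a minimal sketch in R that computes both wait times side by side. The helper names `google_wait` and `httr_wait` are hypothetical and only mirror the two formulas quoted above; in particular, `httr_wait` is a simplification of what httr::RETRY does and ignores any minimum or maximum pause settings.

```r
# Wait before a retry per the formula quoted in this issue:
# 2^attempt + random(0, pause_base) seconds.
google_wait <- function(attempt, pause_base = 1) {
  2^attempt + runif(1, min = 0, max = pause_base)
}

# Wait as described for httr::RETRY in this issue (simplified):
# uniformly random between 0 and pause_base * 2^attempt seconds.
httr_wait <- function(attempt, pause_base = 1) {
  runif(1, min = 0, max = pause_base * 2^attempt)
}

set.seed(42)  # only so the sketch is reproducible
attempts <- 1:5
schedule <- data.frame(
  attempt = attempts,
  google  = vapply(attempts, google_wait, numeric(1)),
  httr    = vapply(attempts, httr_wait, numeric(1))
)
print(schedule)
```

With `pause_base = 1`, every value in the `google` column is at least 2^attempt, while every value in the `httr` column is at most 2^attempt, so the second schedule never waits as long as the first.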

Session Info

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] httr_1.4.1             googleAuthR_1.2.1      googleAnalyticsR_0.7.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     tidyr_1.0.0      crayon_1.3.4     digest_0.6.25   
 [5] dplyr_1.0.0      assertthat_0.2.1 R6_2.4.1         lifecycle_0.2.0 
 [9] jsonlite_1.6.1   magrittr_1.5     pillar_1.4.4     rlang_0.4.6     
[13] fs_1.4.1         ellipsis_0.3.1   vctrs_0.3.1      generics_0.0.2  
[17] tools_3.6.3      glue_1.4.1       purrr_0.3.4      compiler_3.6.3  
[21] pkgconfig_2.0.3  gargle_0.5.0     memoise_1.1.0    tidyselect_1.1.0
[25] tibble_3.0.1 

MarkEdmondson1234 commented 4 years ago

It looks like a fix won't arrive until httr2, so for now we can revert to googleAuthR's own retry code via these options:

options(googleAuthR.tryAttempts = 3,
        googleAuthR.HttrRetryTimes = 1)

Once those are set, can we test to see if it helps with the issue?

octaviancorlade commented 4 years ago

I'm not sure about other services that would use googleAuthR. Some links mentioned in the httr issue suggest capping at 32 or 64 seconds for the Cloud Storage API, but for Google Analytics it should work with tryAttempts = 5, i.e. one request plus 5 retries with exponential backoff. That is documented here for v4 and here, in the same way, for v3.

We have been running with tryAttempts = 5 since I opened this issue, and it helped with that error. We now see 503 errors more often, but that's probably a different topic: before this change, requests that returned 503 were retried until a quota error was raised, so we could not easily notice them.

As a note, since it has been a bit unintuitive: HttrRetryTimes defines the total number of requests that httr performs, while tryAttempts defines the number of retries after the first request in case of errors.
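
The "one request plus N retries" semantics can be illustrated with a small retry loop. This is only a sketch under stated assumptions, not googleAuthR's actual retry code: `retry_with_backoff`, `request_fn`, `pause_cap` and the `sleep` argument are hypothetical names, the wait follows the 2^retry + random(0, pause_base) formula from the top of this issue, and the 64 second cap is borrowed from the Cloud Storage suggestion mentioned above.

```r
# Hypothetical helper: one initial request plus up to `retries` retries.
retry_with_backoff <- function(request_fn, retries = 5, pause_base = 1,
                               pause_cap = 64, sleep = Sys.sleep) {
  retry <- 0
  repeat {
    # Capture an error as a value instead of aborting
    result <- tryCatch(request_fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Out of retries: re-signal the last error
    if (retry >= retries) stop(result)
    retry <- retry + 1
    # Exponential backoff with jitter, capped at pause_cap seconds
    sleep(min(pause_cap, 2^retry + runif(1, 0, pause_base)))
  }
}

# Usage: a fake request that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("503 Service Unavailable")
  "ok"
}

# Record the waits instead of actually sleeping
waits <- numeric(0)
out <- retry_with_backoff(flaky, sleep = function(s) waits <<- c(waits, s))
```

Here `retries` counts attempts after the first request, which matches how tryAttempts is described above, whereas a "total number of requests" setting such as HttrRetryTimes would include the first request in its count.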

MarkEdmondson1234 commented 4 years ago

Cool, yes, the 503 is related to the error that started the retries. I guess you need to use slow_fetch=TRUE and/or decrease the page size to avoid it. The options above can be set on a per-library basis on load, so googleAnalyticsR, for example, will set them to what you suggest above.
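
For reference, setting such options when a package loads is usually done in its `.onLoad` hook. The following is a minimal sketch of that general R idiom, not the actual googleAnalyticsR code; the values are taken from this thread, and the choice to leave user-set options untouched is an assumption.

```r
# Sketch of a package load hook that supplies retry defaults.
.onLoad <- function(libname, pkgname) {
  defaults <- list(
    googleAuthR.tryAttempts    = 5,  # retries after the first request
    googleAuthR.HttrRetryTimes = 1   # total requests made by httr
  )
  # Only fill in options the user has not already set
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) options(defaults[unset])
  invisible(NULL)
}
```

A user who has already called `options(googleAuthR.tryAttempts = 3)` before loading the package would keep their value under this sketch.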