MarkEdmondson1234 / searchConsoleR

R interface with Google Search Console API v3, including Search Analytics.
http://code.markedmondson.me/searchConsoleR/

Not able to extract full data from this code #45

Closed. jshranik closed this 6 years ago.

jshranik commented 6 years ago

What goes wrong

The Search Console UI (https://search.google.com/search-console) shows 5.42mn clicks and 174mn impressions for a month. From the API I get only 16,627,737 impressions and 520,753 clicks using the batch method, and 16,521,168 impressions and 519,630 clicks using the by-date method. Is it possible to get the full data?

Steps to reproduce the problem

Ran the following:

```r
# get the search analytics data
data <- search_analytics(siteURL = website, startDate = start, endDate = end,
                         dimensions = download_dimensions, searchType = type,
                         walk_data = "byDate")
```
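For context, walking by date can be pictured as a loop over the individual days in the range. The sketch below assumes that `walk_data = "byDate"` makes one request per day and stacks the results. `walk_by_date()` and `fetch_day()` are hypothetical names used only for illustration. They are not searchConsoleR functions, and this is not the package's own code.

```r
# Hypothetical sketch: split a date range into single days, fetch each day
# separately, and stack the results with a date column added.
# fetch_day(d) is assumed to return a data frame of rows for day d.
walk_by_date <- function(start, end, fetch_day) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  per_day <- lapply(days, function(d) {
    rows <- fetch_day(d)
    # Tag every row with the day it came from.
    rows[["date"]] <- rep(d, nrow(rows))
    rows
  })
  do.call(rbind, per_day)
}
```

Because each day is a separate request, each day gets its own row limit. A month-long range can therefore return many more rows than a single request over the whole month would.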

Expected output

5.42mn clicks & 174mn impressions

Actual output

I got only 16,627,737 impressions and 520,753 clicks using the batch method, and 16,521,168 impressions and 519,630 clicks using the by-date method.

Session Info

```
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] searchConsoleR_0.3.0

loaded via a namespace (and not attached):
[1] googleAuthR_0.6.3 httr_1.3.1        compiler_3.5.1    R6_2.2.2
[5] assertthat_0.2.0  tools_3.5.1       curl_3.2          memoise_1.1.0
[9] jsonlite_1.5      digest_0.6.15     openssl_1.0.1
```

MarkEdmondson1234 commented 6 years ago

This is due to API limitations: the API returns different results depending on what you query. For instance, if you query for 1000 rows it may return, say, 988 rows, but if you query the same dimensions for 2000 rows it may return 1567. This is why walking by batch and by date give you different numbers. Looking at your numbers, you are doing better than most, with just a ~3% difference.

Google suggests omitting dimensions if you want accuracy, and including them if you want detail. See this guide: https://developers.google.com/webmaster-tools/search-console-api-original/v3/how-tos/all-your-data

For accurate counts, you must omit the page and query dimensions, ....

For greater detail, including page and/or query information, at the expense of losing some data

Hope that helps, Mark

jshranik commented 6 years ago

Hi Mark,

I am only getting 5k rows when I request a single day, even when requesting more than 25k rows.

You mentioned that you have updated the getData.R file, but how do I get those changes onto my system?

MarkEdmondson1234 commented 6 years ago

If you install the GitHub version, it increases daily fetches to 25,000. Install it via remotes::install_github("MarkEdmondson1234/searchConsoleR")

jshranik commented 6 years ago

Yes, it increased the API hit limit to 25k, but when walking by date the row count still comes to 5k, even though there is much more data. Please refer to the screenshots; I am extracting only 1 day.


jshranik commented 6 years ago

I think this issue is related to one of your open threads, which is still not closed:

https://github.com/MarkEdmondson1234/searchConsoleR/issues/12

MarkEdmondson1234 commented 6 years ago

Can you do the same API call but with no dimensions, and see how that compares to the UI?

jshranik commented 6 years ago

Yes, after removing the dimensions I am getting matching data. Please refer to the screenshot.

MarkEdmondson1234 commented 6 years ago

OK, so it's the issue above: adding dimensions in the API lowers the accuracy. Unfortunately, there isn't any code I can change to alter this.
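The no-dimensions check above can be turned into a number. Compare the totals from a query without dimensions against the sum of the rows from a query with dimensions. `coverage()` is a hypothetical helper and is not part of searchConsoleR. It assumes both data frames have columns named `clicks` and `impressions`.

```r
# Hypothetical helper: share of the no-dimension total that the dimensioned
# rows account for (1 = nothing lost, lower = rows dropped by the API).
coverage <- function(totals, detail, metric = "clicks") {
  total <- sum(totals[[metric]], na.rm = TRUE)
  if (total == 0) return(NA_real_)
  # na.rm = TRUE so that NA values in the detail rows do not poison the sum.
  sum(detail[[metric]], na.rm = TRUE) / total
}
```

For example, `totals` could come from a `search_analytics()` call with no `dimensions`, and `detail` from a call over the same date range with the dimensions you need. A ratio well below 1 shows how much detail the API dropped for that query.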

jshranik commented 6 years ago

I am also getting N/A values in the data.

jshranik commented 6 years ago

I think you are not using the startRow functionality, which is why we only get the same result every time.

MarkEdmondson1234 commented 6 years ago

It uses startRow here:

https://github.com/MarkEdmondson1234/searchConsoleR/blob/56afa4cf96a11dd1dbab79edf694da22b2a4646d/R/getData.R#L272-L278
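The general startRow paging pattern works like this. Request pages of `row_limit` rows, starting at the zero-based offset 0. Advance the offset by `row_limit` after each page, and stop when a page comes back short or empty. `fetch_all_pages()` and `fetch_page()` are hypothetical names. The fake data frame stands in for API responses. This is an illustration of the technique, not the code in getData.R.

```r
# Hypothetical helper: page through results using a zero-based start offset.
# fetch_page(start_row, row_limit) must return a data frame of at most row_limit rows.
fetch_all_pages <- function(fetch_page, row_limit = 25000) {
  pages <- list()
  start_row <- 0
  repeat {
    page <- fetch_page(start_row = start_row, row_limit = row_limit)
    if (nrow(page) > 0) {
      pages[[length(pages) + 1]] <- page
    }
    # A short (or empty) page means there is nothing left to fetch.
    if (nrow(page) < row_limit) break
    # Advance the offset; without this, every request returns the same first page.
    start_row <- start_row + row_limit
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}

# Fake data standing in for API responses (12 rows in total).
fake_rows <- data.frame(query = paste0("q", 1:12), clicks = 1:12)
fake_fetch <- function(start_row, row_limit) {
  idx <- seq_len(nrow(fake_rows))
  fake_rows[idx > start_row & idx <= start_row + row_limit, , drop = FALSE]
}

all_rows <- fetch_all_pages(fake_fetch, row_limit = 5)
```

With `row_limit = 5`, the loop requests offsets 0, 5 and 10 and stops after the third page, which returns only 2 rows. If the offset were never advanced, the loop would keep receiving the same first page. That is the symptom described in the comment above.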

jshranik commented 6 years ago

Hey, thanks for the reply. I am getting the following error saying my quota is exceeded, but my Google Cloud console shows very few API hits. I don't know how I crossed my quota.

MarkEdmondson1234 commented 6 years ago

Hmm, it looks like you are hitting these new load quota limits: https://developers.google.com/webmaster-tools/search-console-api-original/v3/limits

You are using the default Google console project that comes with the library (Appkey: 8589005045851-2 etc), so you are sharing resources with everyone else using the library. Since you are running heavy loads, I would advise authenticating with your own Google project. Then back off your API calls to stay below Google's guidelines (e.g. wait 15 minutes if you trigger a load quota error).
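Here is a minimal sketch of backing off after a load quota error. It assumes the error message contains the word "quota"; check the actual message you receive. `with_backoff()` is a hypothetical helper, not part of searchConsoleR or googleAuthR. The 15-minute first wait follows the advice above, and each later wait doubles.

```r
# Hypothetical helper: retry f() after quota errors, waiting longer each time.
with_backoff <- function(f,
                         max_tries = 4,
                         first_wait = 15 * 60,  # seconds, i.e. 15 minutes
                         is_retryable = function(e) {
                           # Assumption: quota errors mention "quota" in their message.
                           grepl("quota", conditionMessage(e), ignore.case = TRUE)
                         },
                         sleep_fn = Sys.sleep) {
  wait <- first_wait
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Give up on the last attempt, or immediately on errors that are not quota-related.
    if (attempt == max_tries || !is_retryable(result)) {
      stop(result)
    }
    sleep_fn(wait)
    wait <- wait * 2  # exponential backoff
  }
}
```

In this thread's setup, that might look like `with_backoff(function() search_analytics(siteURL = website, startDate = start, endDate = end, dimensions = download_dimensions))`. Passing `sleep_fn` as an argument keeps the waiting logic testable without actually sleeping.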

To use your own client ID, see ?googleAuthR::gar_set_client and authenticate using googleAuthR::gar_auth() rather than scr_auth().
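Here is a sketch of that setup. It assumes you have downloaded an OAuth client JSON file for your own Google Cloud project. It also assumes your installed googleAuthR provides `gar_set_client()` with `json` and `scopes` arguments, as the help page referenced above suggests. The file path is a placeholder, and `auth_with_own_client()` is a hypothetical wrapper. The scope shown is the standard Search Console (webmasters) OAuth scope.

```r
# Hypothetical wrapper: authenticate with your own Google Cloud project
# instead of the library's shared default client.
auth_with_own_client <- function(json_path,
                                 scopes = "https://www.googleapis.com/auth/webmasters") {
  if (!file.exists(json_path)) {
    stop("OAuth client JSON not found: ", json_path)
  }
  # Register your own OAuth client ID/secret and the Search Console scope.
  googleAuthR::gar_set_client(json = json_path, scopes = scopes)
  # Authenticate through googleAuthR directly rather than scr_auth().
  googleAuthR::gar_auth()
}
```

Usage would be something like `auth_with_own_client("my-client.json")` before calling `search_analytics()`. Quota usage is then counted against your own project rather than the shared default one.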