JoMingyu / google-play-scraper

Google play scraper for Python inspired by <facundoolano/google-play-scraper>
MIT License
757 stars, 206 forks

Unable to download all reviews from an app [BUG] #147

Open Gautamshahi opened 2 years ago

Gautamshahi commented 2 years ago

Hello,

I am trying to download all the reviews, but the code returns only 178 reviews.

from google_play_scraper import reviews_all

lang = "jp"
country = "jp"
id = "jp.go.mhlw.covid19radar"

app_reviews = reviews_all(id, sleep_milliseconds=0, country=country, count=200)

How can I modify the code to download more reviews?

riazspace commented 1 year ago

I am facing the same issue. I am trying to download all the reviews, but it only downloads 2K reviews out of 6.14K reviews. The app link is: https://play.google.com/store/apps/details?id=com.metro.metroestore

pspot2 commented 1 year ago

From the tests I ran (please correct me if I'm wrong), it looks like a review can be assigned a language different from the language of the Play Store country where your app is published. E.g. if you query the API using lang="jp" and country="jp", it will return only those reviews that are labelled as lang="jp" internally. There are many more reviews (labelled with other languages) which are left out. The funny thing is that the language of the body of the review (i.e. the actual text) is not necessarily the language tag the review was assigned. There could be reviews with Japanese text tagged as en.

In order to get the reviews for all languages... well, this is where we run into a problem:

It looks like the URL query parameters hl and gl (corresponding to the lang and country variables) are mandatory for the batchexecute API, in the sense that without them you don't get the expected result back.
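For illustration, here is a minimal sketch of how hl and gl end up in the request URL. The endpoint path and the helper name build_reviews_url are assumptions made for this example, not something confirmed in this thread; only the hl/gl parameter names come from the comment above.

```python
from urllib.parse import urlencode

# Assumed endpoint path for the batchexecute API (not confirmed in this thread).
BATCHEXECUTE_URL = "https://play.google.com/_/PlayStoreUi/data/batchexecute"


def build_reviews_url(lang: str, country: str) -> str:
    """Hypothetical helper: attach hl (language) and gl (country) as query parameters."""
    query = urlencode({"hl": lang, "gl": country})
    return f"{BATCHEXECUTE_URL}?{query}"


if __name__ == "__main__":
    # The values from the original report.
    print(build_reviews_url("jp", "jp"))
```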

What you expect when omitting the language: reviews for all languages.
What you get when omitting the language: only those reviews which are labelled with no language (this is my suspicion; I don't know how languages are handled internally in the Play Store).

Now, it looks like the only way to get reviews for all languages is to iterate through the whole list of languages that the Google API supports (i.e. run the script once for every language and concatenate the results). I do realize that this is hugely impractical and I sincerely hope that there is a way of specifying "all languages" somehow, but it doesn't seem to be a limitation of this library. It is rather a limitation of the batchexecute API.
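A minimal sketch of that per-language loop, assuming reviews_all accepts lang and country keyword arguments and that each returned review is a dict with a "reviewId" key. The function name collect_reviews_for_languages, the fetch parameter, and the short language list are hypothetical and only for illustration; the real list of supported languages is much longer.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Illustrative subset only (an assumption); extend it with every language you care about.
EXAMPLE_LANGUAGES = ["en", "ja", "ko", "de", "fr", "es"]


def collect_reviews_for_languages(
    app_id: str,
    country: str,
    languages: Iterable[str],
    fetch: Optional[Callable[..., List[Dict]]] = None,
) -> List[Dict]:
    """Fetch reviews once per language and merge them, dropping duplicate review IDs."""
    if fetch is None:
        # Imported lazily so the merge logic can be tried without the library installed.
        from google_play_scraper import reviews_all

        fetch = reviews_all

    merged: Dict[str, Dict] = {}
    for lang in languages:
        for review in fetch(app_id, lang=lang, country=country):
            # Keep the first copy of each review in case it shows up under several languages.
            merged.setdefault(review["reviewId"], review)
    return list(merged.values())


if __name__ == "__main__":
    all_reviews = collect_reviews_for_languages(
        "jp.go.mhlw.covid19radar", country="jp", languages=EXAMPLE_LANGUAGES
    )
    print(len(all_reviews))
```

This makes one full pass over the reviews per language, so it is slow for a long language list; passing sleep_milliseconds through to reviews_all may help if requests start failing.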

If someone has more details on this (e.g. by what logic reviews are tagged with languages, and how to tell the API to return reviews for all languages without having to iterate over all of them), I'd be keen to hear them.