JoMingyu / google-play-scraper

Google Play scraper for Python inspired by <facundoolano/google-play-scraper>
MIT License
769 stars, 212 forks

Getting All Reviews - Sorted Newest #50

Closed: ManuelBanza closed this issue 4 years ago

ManuelBanza commented 4 years ago

Hi,

I need some help with extracting reviews from an app.

I have 2 problems:

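  1. When I try to get the newest reviews from an app, it does not return all of them. For example, I get 3 reviews from yesterday, but on the Play Store I can see that there were 15 reviews on that day. Also, it only returns 100 reviews. Code:

```python
from google_play_scraper import reviews, Sort

result, continuation_token = reviews(
    '',
    lang='pt',
    country='pt',
    sort=Sort.NEWEST,
    count=10000,
)

# count is not passed here, so this call uses the default count (100),
# starts from the beginning (no continuation_token) and overwrites `result`
result, _ = reviews(
    '',
)
```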

  2. I would like to get all of the reviews at once. For example, the app I am trying to scrape has almost 5000 reviews, but when I scrape it I only get 1500, and they are from 2019 or earlier. Code:

```python
from google_play_scraper import reviews, Sort

result, continuation_token = reviews(
    '',
    lang='pt',
    country='pt',
    sort=Sort.NEWEST,
    count=1500,
)

result, _ = reviews(
    '',
    continuation_token=continuation_token,  # defaults to None (load from the beginning)
)
```

Am I doing something wrong? Could you help? Thanks!

JoMingyu commented 4 years ago

I'm sorry for the inconvenience :( Could you please tell me the app_id you used? It would help me debug.

ManuelBanza commented 4 years ago

Hi @JoMingyu, thank you so much!

I am using Portuguese apps; some examples:

Once again, thank you for the help, and congrats on this super helpful library :)

JoMingyu commented 4 years ago

I'm looking into it. Sorry for the delay. I will either release an updated version or comment on this issue by 2020-05-31 20:00:00 KST at the latest.

JoMingyu commented 4 years ago

Hi @ManuelBanza,

TL;DR: It seems like a non-issue.

I was able to reproduce the problem when loading reviews for 'pt.nos.selfcare', one of the app_ids you sent me.

Here is what I tested:

  1. Whether calling the function with count=10000 actually fetches 10000 reviews.
  2. Whether 10000 review items remain after removing duplicates by each item's reviewId.

The following is the code I wrote for testing.

```python
from google_play_scraper import reviews, Sort

for app_id in (
    "cgd.pt.caixadirectaparticulares",
    "pt.nos.selfcare",
    "eu.hboportugal.android",
):
    result, continuation_token = reviews(
        app_id, lang="pt", country="pt", sort=Sort.NEWEST, count=10000,
    )

    # 1. The requested number of reviews was actually fetched.
    assert len(result) == 10000
    # 2. No duplicates: every reviewId in the result is unique.
    assert len({r["reviewId"] for r in result}) == 10000
```
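The test above requests one large batch. To collect everything an app exposes (the second question in this issue), one option is to keep passing the returned continuation token back in until a call comes back empty. The following is a minimal sketch, not part of google-play-scraper: `fetch_all_reviews` and its `fetch_page` parameter are hypothetical names, and it assumes, as in the snippets in this thread, that each call returns `(result, continuation_token)` and that an exhausted token yields an empty list. Deduplication by `reviewId` mirrors the second check in the test.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical alias: one batch of review dicts plus an opaque continuation token,
# mirroring the (result, continuation_token) pair returned by reviews() in this thread.
Page = Tuple[List[Dict[str, Any]], Any]


def fetch_all_reviews(
    fetch_page: Callable[..., Page],
    page_size: int = 200,
    max_pages: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Request batches until one comes back empty, dropping duplicate reviewIds.

    Assumption: fetch_page accepts count= and continuation_token= keywords and
    returns an empty list once there is nothing left to load.
    """
    seen = set()
    collected: List[Dict[str, Any]] = []
    token = None  # None means "load from the beginning"
    pages = 0
    while max_pages is None or pages < max_pages:
        items, token = fetch_page(count=page_size, continuation_token=token)
        pages += 1
        if not items:
            break  # nothing more to load
        for item in items:
            # Keep only the first occurrence of each reviewId.
            if item["reviewId"] not in seen:
                seen.add(item["reviewId"])
                collected.append(item)
    return collected
```

With the real library, `fetch_page` could be something like `functools.partial(reviews, "pt.nos.selfcare", lang="pt", country="pt", sort=Sort.NEWEST)`; the `max_pages` guard is there so a misbehaving token cannot loop forever. As explained below, star-only ratings are not part of the review list, so even a complete crawl will return fewer items than the rating count shown on the Play Store.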

With app_id set to 'pt.nos.selfcare', an AssertionError was raised: only 4314 reviews were retrieved in total.

Yet the app's Play Store page shows that about 15,000 reviews are reflected in its star rating, as shown in the following screenshot.

[Screenshot 2020-05-29 8:19:36 PM]

So I looked into how Play Store reviews work and realized that writing text in a review is optional. Reviews without text (star rating only) are not shown in the review list. That is why only 4314 reviews were loaded: the star-only ratings were excluded.
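For the first problem in this issue (fewer reviews for a given day than expected), one way to check is to count the fetched items per calendar day and compare those numbers with what the store shows. This is a minimal sketch with a hypothetical helper name, `reviews_per_day`; it assumes each review dict carries an `at` key holding a `datetime`, which is how google-play-scraper review items look in the versions I am aware of. Day boundaries follow whatever timezone that `datetime` is in, so counts near midnight can shift by a day relative to the Play Store.

```python
from collections import Counter
from datetime import date, datetime
from typing import Any, Dict, Iterable


def reviews_per_day(items: Iterable[Dict[str, Any]]) -> Dict[date, int]:
    """Count review items per calendar day.

    Assumes each item has an "at" key holding a datetime (the review time).
    """
    counts = Counter(item["at"].date() for item in items)
    # Sort by day so the result reads chronologically.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample data standing in for fetched reviews.
    sample = [
        {"reviewId": "a", "at": datetime(2020, 5, 28, 9, 15)},
        {"reviewId": "b", "at": datetime(2020, 5, 28, 22, 40)},
        {"reviewId": "c", "at": datetime(2020, 5, 29, 7, 5)},
    ]
    for day, n in reviews_per_day(sample).items():
        print(day, n)
```

Since only reviews with text appear in the review list, these per-day counts are best compared with the reviews visible on the store page for that day, not with the total rating count.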

Therefore, this does not appear to be a library issue. I may have misunderstood your problem; if so, please tag me in a comment that includes the problematic code, the expected result, and the actual result.

JoMingyu commented 4 years ago

I'll close this issue since it seems to be resolved. If any further problems occur, please reopen this issue or open a new one.