JoMingyu / google-play-scraper

Google play scraper for Python inspired by <facundoolano/google-play-scraper>
MIT License
757 stars, 206 forks

[BUG] Incorrect response with empty/wrong fields #122

Closed SprutSDM closed 2 years ago

SprutSDM commented 2 years ago

google_play_scraper.VERSION 1.0.5

Describe the bug: Sometimes the library returns an incorrect response with empty or wrong fields.

Code: google_play_scraper.app("com.android.chrome", lang="en", country="us")

Response:

{
  'title': [],
  'description': None,
  'descriptionHTML': None,
  'summary': None,
  'summaryHTML': None,
  'installs': None,
  'minInstalls': None,
  'score': None,
  'ratings': None,
  'reviews': None,
  ...
  'appId': 'com.android.chrome',
  'url': 'https://play.google.com/store/apps/details?id=com.android.chrome&hl=en&gl=us'
}


Additional context: I tested this with different applications and found that it is not application-dependent. It started on May 12th. I think Google Play is A/B testing a new feature.
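A minimal sketch of how a caller might detect this kind of response before using it. The helper name `looks_broken` and the choice of `CORE_FIELDS` are assumptions made for illustration; neither is part of google_play_scraper, and the field names are taken from the response shown above.

```python
# Hypothetical helper, not part of google_play_scraper.
# Field names come from the broken response shown in this report.
CORE_FIELDS = ("description", "summary", "installs", "score", "ratings")


def looks_broken(result):
    """Return True when the result matches the broken pattern:
    the title is empty or not a string, and every core field is None."""
    title = result.get("title")
    title_missing = not isinstance(title, str) or not title
    return title_missing and all(result.get(name) is None for name in CORE_FIELDS)


if __name__ == "__main__":
    broken = {"title": [], "description": None, "appId": "com.android.chrome"}
    print(looks_broken(broken))
```

A check like this only flags the specific pattern reported here (empty title plus missing core fields), so a result with a valid title and a few missing fields is still treated as usable.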

rentheduke commented 2 years ago

Same issue as mentioned by @SprutSDM; library returning incorrect responses arbitrarily.

kuwapa commented 2 years ago

Hello,

I was facing this same issue on an irregular basis. Fetching the same app again after a while often worked, but sometimes it didn't. Initially I thought the error was caused by rate limiting by Google, with the webpage showing a captcha or returning a 429 Too Many Requests response code. So I started adding a delay to the script whenever I got a response with null values in it. However, I then dug a little deeper.
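The delay-and-retry workaround described above could look roughly like this. It is a sketch under assumptions: `fetch_with_retry`, its parameters, and the "usable title" check are hypothetical names chosen for illustration, and `fetch` stands in for whatever call produces the result dict.

```python
import time


def fetch_with_retry(fetch, app_id, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(app_id) until the result has a non-empty string title.

    Returns the last result even if it is still incomplete, so the caller
    can decide what to do with it.
    """
    result = None
    for attempt in range(attempts):
        result = fetch(app_id)
        title = result.get("title")
        if isinstance(title, str) and title:
            return result
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return result
```

With the library installed, `fetch` could be something like `lambda app_id: google_play_scraper.app(app_id, lang="en", country="us")`. As the rest of this comment shows, retrying only hides the problem when the broken pages are intermittent; it does not explain them.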

I added the following code in request.py.

...
def _urlopen1(obj):
    # Debugging aid: needs `import webbrowser` at the top of request.py.
    try:
        resp = urlopen(obj)
        print("response_code = " + str(resp.code))
        # Turn the details URL into a file-safe name, e.g. com_android_chrome.
        package_name = (
            obj.replace("https://play.google.com/store/apps/details?id=", "")
            .replace("&hl=en&gl=us", "")
            .replace(".", "_")
        )
        path = "/Users/abhimanyu/Projects/PlayStoreScreenshots/testfiles/" + package_name + ".html"
        # Save the raw page so it can be inspected in a browser.
        with open(path, "w", encoding="utf-8") as text_file:
            text_file.write(resp.read().decode("utf-8"))
        webbrowser.open_new_tab("file://" + path)
        # read() consumed the body, so request the page again for the caller.
        resp = urlopen(obj)
    except HTTPError as e:
...

Now, as I looped through many apps, I could see the response code for each request. Each request also saved the raw webpage to the file system and opened it in the browser, so that I could see exactly what Google's servers were returning.

I noticed 3 things.

  1. For the apps where the response had null values, the response code was still 200, meaning the problem was not due to rate limiting by Google.
  2. Some pages used a new design. I first thought this was the cause of the issue, perhaps a new webpage that Google is A/B testing, as @SprutSDM mentioned. It was not: even these new-looking webpages were parsed fine and did not return null values.
  3. Some regular-looking webpages contained weird data. These are the pages that returned null values. I've attached a screenshot of this. I'm not yet sure why this is happening or how it can be fixed. Perhaps @89z can shed some light on this? How did you fix this in your own repository?

New design (not source of problem)

Screen Shot 2022-05-23 at 12 55 36 AM

Webpage with weird text (Source of problem)

Screen Shot 2022-05-23 at 12 42 14 AM

I hope this helps shed more light on the issue and can help in fixing the issue sooner. Thanks.

gabrielfiorelli commented 2 years ago

Now, all my calls are returning Nulls. Is this also happening to you?

rentheduke commented 2 years ago

I haven't checked, but the new UI is up, so that is probably what is causing it.

exbein commented 2 years ago

I have the same problem; all calls return nulls.

JoMingyu commented 2 years ago

This is fixed in version 1.1.0. It was really hard to fix; I needed to make a data path finder.

Related commit: https://github.com/JoMingyu/google-play-scraper/commit/d53bbf383b93a5e9ece0a22eae9176449a646227
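For readers wondering what a "data path finder" might involve, here is one possible interpretation, sketched under assumptions and not taken from the linked commit. The idea is to search nested page data for a known value and report the index path that leads to it, so that hard-coded paths can be updated when the page layout changes. The names `find_paths` and `lookup`, and the sample `page` structure, are hypothetical.

```python
def find_paths(data, target, path=()):
    """Yield every key/index path inside nested lists or dicts that leads to target."""
    if data == target:
        yield path
        return
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, (list, tuple)):
        items = enumerate(data)
    else:
        return
    for key, value in items:
        yield from find_paths(value, target, path + (key,))


def lookup(data, path):
    """Follow path through nested data; return None if any step is missing."""
    for key in path:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            return None
    return data


if __name__ == "__main__":
    # Made-up nested structure standing in for data embedded in a page.
    page = [None, [["Google Chrome", None], [4.1]]]
    print(list(find_paths(page, "Google Chrome")))
    print(lookup(page, (1, 0, 0)))
```

Returning `None` from `lookup` on a missing step mirrors the symptom in this issue: when the layout changes, a stale path silently produces null fields instead of raising an error.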