danieliu / play-scraper

A web scraper to retrieve application data from the Google Play Store.
MIT License

Double encoding in search. #66

Open silsebastian opened 4 years ago

silsebastian commented 4 years ago

Description: If a search query contains special characters, it gets encoded twice: once by the quote_plus call, and a second time when the request is made. This is easily solved by removing quote_plus from the parameters. That is, having:

        self.params.update({
            'q': query,
            'c': 'apps',
        })

instead of:

        self.params.update({
            'q': quote_plus(query),
            'c': 'apps',
        })
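The effect can be illustrated with the standard library alone. This is a minimal sketch, not play-scraper's actual request code: it assumes the parameter dict is later serialized by something that percent-encodes values on its own, and uses `urllib.parse.urlencode` as a stand-in for that step. The helper `build_query_string` and its `pre_quote` flag are hypothetical names introduced only for this example.

```python
from urllib.parse import quote_plus, urlencode


def build_query_string(query, pre_quote):
    """Serialize search params, optionally pre-quoting the query first.

    Hypothetical helper for illustration; not part of play-scraper.
    """
    q = quote_plus(query) if pre_quote else query
    # urlencode percent-encodes every value itself (quote_plus by default),
    # so a value that is already quoted gets its '%' and '+' encoded again.
    return urlencode({'q': q, 'c': 'apps'})


single = build_query_string('a b&c', pre_quote=False)
double = build_query_string('a b&c', pre_quote=True)

print(single)  # q=a+b%26c&c=apps
print(double)  # q=a%2Bb%2526c&c=apps
```

In the pre-quoted case the `%` of `%26` becomes `%25` and the `+` becomes `%2B`, which is the same pattern as the `%253A` and `%252F` sequences in the query reported below.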

To Reproduce: The input is Instagram's developer website, https://help.instagram.com/

When running

        res = play_scraper.search("https://help.instagram.com/", detailed=True)

play_scraper makes the following query:

        /store/search?q=https%253A%252F%252Fhelp.instagram.com%252F&c=apps&gl=us&hl=en

which is not right. If we open that URL in a browser, the search box is automatically filled with a query that still contains percent-encodings:

Screenshot from 2020-01-27 10-37-02

Expected behavior: If we manually type https://help.instagram.com/ into the search box, the URL will be:

        https://play.google.com/store/search?q=https%3A%2F%2Fhelp.instagram.com%2F&c=apps&hl=en&gl=us

If we use the piece of code without quote_plus, the url that is searched for is exactly the same as the desired one.
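The two query strings above can be reproduced, and decoded once as a receiving server would, with a short sketch. It again uses `urllib.parse.urlencode` as an assumed stand-in for whatever serializes the parameters in the library, and the parameter order (`q`, `c`, `gl`, `hl`) is taken from the reported query. The variable names are illustrative only.

```python
from urllib.parse import parse_qs, quote_plus, urlencode

URL = 'https://help.instagram.com/'
extra = {'c': 'apps', 'gl': 'us', 'hl': 'en'}

# With quote_plus applied before serialization (the reported behaviour).
reported = urlencode({'q': quote_plus(URL), **extra})
# With the raw query passed through (the proposed fix).
expected = urlencode({'q': URL, **extra})

print(reported)
print(expected)

# Decode each query string exactly once to see what value of q arrives.
seen_reported = parse_qs(reported)['q'][0]
seen_expected = parse_qs(expected)['q'][0]
print(seen_reported)
print(seen_expected)
```

After a single round of decoding, the pre-quoted variant still yields a percent-encoded string for `q`, while the unquoted variant yields the original URL.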


milekey commented 4 years ago

Thank you silsg, your suggestion is exactly right. I was troubled by seemingly random results when I searched with a multi-byte query word. I found your comment, downloaded this library, and patched line 230 in scraper.py as you said, and the error is gone!