facundoolano / google-play-scraper

Node.js scraper to get data from Google Play
MIT License

Four collections are invalid for every category: TOP_FREE_GAMES, TOP_PAID_GAMES, NEW_FREE_GAMES, NEW_PAID_GAMES #395

Open TravisWhitehead opened 4 years ago

TravisWhitehead commented 4 years ago

Description:

It appears that four of the collections are invalid for every category. Every category I try for these collections triggers this error: https://github.com/facundoolano/google-play-scraper/blob/2230e30652411697f6474dbe2a2090fd034a9a2e/lib/utils/parseCategoryApps.js#L51

Example code:

Save the code below as `bad_collection.js` and run it with `node bad_collection.js | tee output` (`tee` is optional; it saves the output to a file).

You will see the error for every single category in each of these collections.

To quickly confirm that not a single query succeeded, run `cat output | grep SUCCESS`. You can also run `cat output | grep FAIL | wc -l` and note that there are 236 failures. There are 59 categories and 4 collections, so 59 * 4 = 236, meaning every combination fails.

'use strict';

const gplay = require('google-play-scraper')

const bad_collections = [
  gplay.collection.TOP_FREE_GAMES,
  gplay.collection.TOP_PAID_GAMES,
  gplay.collection.NEW_FREE_GAMES,
  gplay.collection.NEW_PAID_GAMES
]

for (const collection of bad_collections) {
  console.log('Trying to list all categories for collection: ' + collection)

  for (const category of Object.values(gplay.category)) {
    gplay.list({
      category: category,
      collection: collection,
      // 200 is maximum Google Play will give us per call
      num: 200,
      throttle: 10
    })
    .then((result) => {
      console.log('SUCCESSFULLY got apps from category ' + category + ' in collection ' + collection)
    }, (err) => {
      console.log('FAILED to get apps from category ' + category + ' in collection ' + collection)
      console.log(err)
    })
  }
}
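
As an alternative to grepping the output, the same check could be done in-process by counting resolved and rejected promises. This is only a minimal sketch: `tallyResults` is a hypothetical helper, not part of google-play-scraper, and it assumes `gplay.list` returns a promise for each call, as in the script above. The `gplay` module is passed in as a parameter so the helper can also be exercised with a stub.

```javascript
'use strict';

// Hypothetical helper (not part of google-play-scraper): calls gplay.list for
// every (collection, category) pair and counts resolved vs. rejected calls.
async function tallyResults (gplay, collections) {
  const categories = Object.values(gplay.category);
  const calls = [];

  for (const collection of collections) {
    for (const category of categories) {
      // Same options as the reproduction script above.
      calls.push(gplay.list({ collection, category, num: 200, throttle: 10 }));
    }
  }

  // allSettled waits for every call, whether it resolves or rejects.
  const settled = await Promise.allSettled(calls);
  const succeeded = settled.filter((r) => r.status === 'fulfilled').length;

  return { total: settled.length, succeeded, failed: settled.length - succeeded };
}
```

With the real module this would be called as `tallyResults(require('google-play-scraper'), bad_collections).then(console.log)`. If every combination fails, `succeeded` is 0 and `failed` equals `total`.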

Error message:

This is the output I get when running the above example code: https://gist.github.com/TravisWhitehead/545b6f4737b527b0196059a84d6cf7d9

Depending on the collection used:

Error: The collection topselling_new_free_games is invalid for the given category, top apps or new apps
Error: The collection topselling_new_paid_games is invalid for the given category, top apps or new apps
Error: The collection topselling_free_games is invalid for the given category, top apps or new apps
Error: The collection topselling_paid_games is invalid for the given category, top apps or new apps
icarcal commented 4 years ago

@TravisWhitehead you are right. These collections only work when no category is given. For example, the following calls are valid:

.list without categories examples

```javascript
gplay.list({
  collection: gplay.collection.TOP_FREE_GAMES,
  num: 10,
})

gplay.list({
  collection: gplay.collection.TOP_PAID_GAMES,
  num: 10,
})
```

That's because these collections access the main Google Play Top page: https://play.google.com/store/apps/top/?hl=en&gl=us

[Screenshot: Google Play Top main page]

When you add a category, it tries to access the Google Play Top page for that category instead, for example: https://play.google.com/store/apps/top/category/ENTERTAINMENT?hl=en&gl=us

.list with categories example

```javascript
gplay.list({
  collection: gplay.collection.TOP_FREE_GAMES,
  category: gplay.category.ENTERTAINMENT,
  num: 10,
})
```

That page has no "Top Free Games" cluster, which raises the error you mentioned:

[Screenshot: Google Play Top page for the ENTERTAINMENT category]

The same goes for NEW_FREE_GAMES and NEW_PAID_GAMES.
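
Until this is documented, one possible workaround is to drop the category whenever one of these four collections is requested. The snippet below is only a sketch: `buildListOptions` is a hypothetical helper, not part of google-play-scraper, and it assumes the behavior described above (the games collections are only found on the main Top page). The `gplay` module is passed in so the helper stays a pure function.

```javascript
'use strict';

// Hypothetical helper (not part of google-play-scraper): builds the options
// object for gplay.list, omitting `category` for the four games collections,
// which are assumed to be valid only without a category.
function buildListOptions (gplay, collection, category, num = 10) {
  const gamesOnly = new Set([
    gplay.collection.TOP_FREE_GAMES,
    gplay.collection.TOP_PAID_GAMES,
    gplay.collection.NEW_FREE_GAMES,
    gplay.collection.NEW_PAID_GAMES
  ]);

  const options = { collection, num };

  // Only pass the category through for collections that accept one.
  if (category !== undefined && !gamesOnly.has(collection)) {
    options.category = category;
  }

  return options;
}
```

For example, `gplay.list(buildListOptions(gplay, gplay.collection.TOP_FREE_GAMES, gplay.category.ENTERTAINMENT))` would send the request without the category. Silently ignoring the category may not be what every caller wants, so throwing an error instead is an equally reasonable design.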

Maybe this should be documented inside the README