facundoolano / google-play-scraper

Node.js scraper to get data from Google Play
MIT License
2.34k stars 631 forks

Reviews from playstore seem to repeat after 6333 records #467

Closed ekta1007 closed 3 years ago

ekta1007 commented 3 years ago

issue

Here's what I mean

4603 - review1
10963 - review1 repeats
17269 - review1 repeats
23602 - review1 repeats
2330 - review2
8563 - review2 repeats
14896 - review2 repeats
21229 - review2 repeats

You can take any app and check this, as long as the total number of records is greater than 10K, since the duplicates only show up at intervals of 6333 records.
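One way to check the repeat interval on your own data is to record the position of every review ID and look at the gaps between repeats. This is only a minimal sketch: `repeat_gaps` is a hypothetical helper, not part of any scraper library, and the IDs below are synthetic stand-ins for real `reviewId` values.

```python
from collections import defaultdict


def repeat_gaps(review_ids):
    """Map each repeated review ID to the gaps between its consecutive positions."""
    positions = defaultdict(list)
    for index, review_id in enumerate(review_ids):
        positions[review_id].append(index)
    # Keep only IDs seen more than once and turn positions into gaps.
    return {
        review_id: [b - a for a, b in zip(seen, seen[1:])]
        for review_id, seen in positions.items()
        if len(seen) > 1
    }


# Synthetic data: 5 unique IDs cycled three times, standing in for real results.
ids = [f"r{i % 5}" for i in range(15)]
gaps = repeat_gaps(ids)
print(gaps["r0"])  # [5, 5]
```

If every repeated ID shows the same gap, the feed is most likely cycling through a fixed set of reviews rather than returning new ones.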

I make two kinds of calls; the subsequent call is made after getting the continuation_token from the 1st call.

first call

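# Assumption: reviews comes from the Python google-play-scraper package
from google_play_scraper import reviews

result, continuation_token = reviews(
    app,
    lang='en',  # defaults to 'en'
    country='us',  # defaults to 'us'
    count=step_fetch  # defaults to 100
)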

subsequent call

result, continuation_token = reviews(
    app,
    continuation_token=continuation_token,  # token returned by the previous call
    count=step_fetch
)
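The two calls above can be folded into one loop that stops as soon as a page contributes no unseen `reviewId`, which is the point where the feed has wrapped around. This is a sketch under assumptions: `fetch_unique_reviews` and `fake_fetch` are hypothetical names, and `fetch_page` is assumed to be a thin wrapper that takes the previous token (or `None` on the first call) and returns `(result, continuation_token)` like the `reviews(...)` calls above.

```python
def fetch_unique_reviews(fetch_page, max_pages=1000):
    """Page through reviews until a page adds no unseen reviewId.

    fetch_page(token) must return (list_of_review_dicts, next_token);
    it is a hypothetical wrapper around the reviews(...) calls above.
    """
    seen = {}
    token = None
    for _ in range(max_pages):
        page, token = fetch_page(token)
        new = [r for r in page if r["reviewId"] not in seen]
        if not new:
            break  # empty page or pure repeats: the feed has wrapped around
        for review in new:
            seen[review["reviewId"]] = review
    return list(seen.values())


def fake_fetch(token, total=7, step=3):
    # Stand-in feed that wraps around after `total` reviews, as the issue describes.
    start = token or 0
    page = [{"reviewId": f"gp:{(start + i) % total}"} for i in range(step)]
    return page, start + step


print(len(fetch_unique_reviews(fake_fetch)))  # 7
```

Keying on `reviewId` rather than on a fixed record count means the loop does not depend on the total shown on the store page.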

continuation_token before

I verified that I was getting the tokens right.

continuation_token.token
CoMBCoABKm8KOffsZIO6_9gB_Jielp7Fz8_Pz8_Px53NnsrMzs_MzcXOxYmSxcnLycfGzMfNyMnMyc7KycbMzMn__hCI0wIh2eyTvIBtj24xYgxFNcw2K3U5A_4nAEV8mxNQAFoLCcFo4g6gp4HXEANgpPaiygEyDQoLCgAo1qrF-vTc4wI
continuation_token after

CoMBCoABKm8KOffsWhxx__5kTpielp7Fz8_Pz8_PnsvMyprOzpnLxsXOxYmSxcnLycfGzMfNyMnMyc7KycbMzMn__hDQ1AIh2eyTvIBtj24xqDED9JdGVMI5sZsBAI7jpRNQAFoLCcFo4g6gp4HXEANgpPaiygEyDQoLCgAooKmrhonJ4wI

Here's what I mean by a review object

 {
        "reviewId": "gp:AOqpTOHKCC_Wt1r0Py35QbMcfHvVlYDl6HK4OujKjqPdFywxh8OvJURLgFpgwI2SZP_6or5oTxQshiSATt2wmQ",
        "userName": "devi hoei sunarya",
        "userImage": "https://play-lh.googleusercontent.com/a-/AOh14Gh4f73MCqTu27eLX4Iml7Zv5njnU845icYF98NPDA",
        "content": "Apps ini membantu saya mendaftar pengeluaran. Tapi mungkin bisa lebih diperbagus lagi di fitur dimana pengeluaran dalam laporan dikelompokkan berdasarkan kategori yg diberikan, ex: food&drink, clothes, entertainment, hobies etc. Tidak hanya dari nama barangnya atau jenis pembeliannya. Jadi dari situ membantu kita melihat dalam kategori apa pengeluaran terbesarnya.",
        "score": 5,
        "thumbsUpCount": 0,
        "reviewCreatedVersion": "0.34.0",
        "at": "2021-05-11T02:13:07",
        "replyContent": null,
        "repliedAt": null,
        "content_translated": ""
    },

ekta1007 commented 3 years ago

This was my bad. The review count shown at the top of the Android Play Store listing over-counts the "thumbs up", hence the continuation token recycles, as there is nothing left to scrape. Closing the issue.