BaseMax / GooglePlayWebServiceAPI

Tiny script to crawl information about a specific application in the Google Play Store, based on PHP.
MIT License

Playstore redesign #20

Closed: IzzySoft closed this issue 2 years ago

IzzySoft commented 2 years ago

The recent PlayStore redesign broke the parser nearly completely (apart from the app name, no fields had any data). I've fixed up most things with b837c529c10063a3efc8c32f338ace37bfcd89d3 (hopefully), but some things seem no longer available (e.g. the versionName of the last release, or the minimum Android version required).

@BaseMax would be great if you could check whether my changes work for you, and if you can find some of the missing details. I've left some comments in the code so you should be able to easily identify them.

Oh, and to make up a bit for lost details, I've added the video URL (what fastlane has as video.txt)…

spiritwebdev commented 2 years ago

Yeah, version doesn't work, and neither does developer name.

UPD: "what's new" isn't parsed either

BaseMax commented 2 years ago

Thank you @IzzySoft, your work is always good. You are always at the forefront of this.

@spiritwebdev Thanks for testing.

IzzySoft commented 2 years ago

Yes, thanks for testing! And apologies for my pushing to main directly, but the last version wasn't working at all anymore so I thought it's an improvement anyway, imperfect as it might be :rofl: Seems I was right with that, good.

OK, so things still not (yet/anymore/ever) working are

For what I have not found yet, I might check the protobuf once more – though as the new page design does not show those details either, I doubt I'll find them. Things I did not yet check:

Looks like we should think about some "unit testing" – say having a list of package names where we know what is returned, run against them, and compare the results. Might save us some headache.
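A rough sketch of what such a check could look like, purely as an illustration: the fixture values, `diffFields()` and `runChecks()` are made-up names and not part of the library, and the parser is passed in as a callable so the sketch does not assume any particular method signature.

```php
<?php
// Hypothetical fixtures: package name => fields we expect the parser to return.
// The values are placeholders, not real Play Store data.
$expected = [
    'com.example.app' => ['name' => 'Example App', 'developer' => 'Example Dev'],
];

/**
 * Compare only the fields listed in $expected and describe every mismatch.
 */
function diffFields(array $expected, array $actual): array
{
    $problems = [];
    foreach ($expected as $field => $want) {
        $got = $actual[$field] ?? null;
        if ($got !== $want) {
            $problems[] = sprintf(
                '%s: expected %s, got %s',
                $field,
                var_export($want, true),
                var_export($got, true)
            );
        }
    }
    return $problems;
}

/**
 * Run every fixture through $parse (any callable that takes a package name
 * and returns an array) and collect the packages that show mismatches.
 */
function runChecks(array $fixtures, callable $parse): array
{
    $report = [];
    foreach ($fixtures as $package => $fields) {
        $problems = diffFields($fields, $parse($package));
        if ($problems) {
            $report[$package] = $problems;
        }
    }
    return $report;
}
```

In practice `$parse` would be a small closure wrapping the library's parser; an empty report then means that all known packages still return what we expect.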

> You are always at the forefront of this.

This library is extensively used here, so I notice quite early when things break. One day might be a fluke, but a couple of days are a problem. The main standing problem for me currently is the missing versionName, as that's used as a condition in some places (e.g. to decide whether Exodus returned results for the very same version, IIRC).

If I find more, I'll push again. If one of you finds more, please yell (to avoid merge conflicts :wink:). If you think I should rather try filling remaining gaps in a dedicated branch, just say so @BaseMax.

IzzySoft commented 2 years ago

@spiritwebdev nice try. OK, now there is a whatsnew (as soon as I've pushed my changes). We never had it before. Sneaky :rofl:

IzzySoft commented 2 years ago

Next fixes pushed: ads & iap were broken as well, as a test showed; price should still work. Hopefully fixed now (more tests welcome). As suggested by @spiritwebdev "whatsnew" was added.

Just listing that as we might need to update documentation afterwards.

So much for the good news. Now the bad: I couldn't find the other 3 details (version, minSDK, APK size) anywhere in the HTML or the protobuf. If someone has an idea (or found what I missed), please report it here.

To close my comment with a good message: As I tried skimming all protobuf carefully, I found 3 other things there (might partly be in plain HTML as well, I did not yet check that). We could parse that along with parseApplication() but not return it with the result array (to not bloat that), rather storing it into a class property and retrieving it with another method (which, if the property is null, would call parseApplication first to set things up):

As these are new additions, maybe they should be split off to separate issues (at least the last 2). This issue should better focus on how to fix things left broken:

As Appbrain still has those fields (and with data – though I didn't check for the latest releases; the newest one I checked was 2022-05-23; found 1 from 2022-05-24 which at least had version and app size), there must be a way to obtain them. Or not: one from today shows N/A for size… Ah, good: 2022-05-26 (yesterday) still has all details. OK, so waiting for your ideas then :smile:
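To make the "store it in a class property and fetch it with another method" idea from above a bit more concrete, here is a minimal sketch. Only the name `parseApplication()` comes from the library; the class name, `getExtras()`, the `parseCalls` counter and the faked data are assumptions for illustration.

```php
<?php
// Illustrative stand-in for the real class, not the library's actual code.
class PlayParserSketch
{
    /** @var array|null Extra details collected while parsing; null until parsed. */
    private $extras = null;

    /** @var string */
    private $packageName;

    /** @var int Counts parser runs, only here to show that the result is reused. */
    public $parseCalls = 0;

    public function __construct(string $packageName)
    {
        $this->packageName = $packageName;
    }

    public function parseApplication(): array
    {
        $this->parseCalls++;
        // The real method would fetch and parse the page; both results are faked here.
        $this->extras = ['exampleExtra' => 'value for ' . $this->packageName];
        // The extras are deliberately left out of the returned array.
        return ['packageName' => $this->packageName];
    }

    public function getExtras(): array
    {
        // Parse on demand if nobody called parseApplication() before.
        if ($this->extras === null) {
            $this->parseApplication();
        }
        return $this->extras;
    }
}
```

Callers who only need the main result array never see the extras, and callers who only want the extras do not have to remember to parse first.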

spiritwebdev commented 2 years ago

How about developer name?

spiritwebdev commented 2 years ago

version now in full description

IzzySoft commented 2 years ago

> How about developer name?

Array
(
    [packageName] => meditofoundation.medito
    [name] => Medito: Meditation & Sleep
    [developer] => Medito for Mindfulness, Meditation and Sleep
    [category] => 
    [type] => 
...

Looks like it's in different places with different apps? I had that already fixed (as the example shows). Argh, wtf… they use different URLs: with one app it's /store/apps/dev?id=, with the other /store/apps/developer?id=. OK, adjusted the RegEx accordingly, should now work with both.
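For illustration, a pattern accepting both URL variants could look roughly like this. It is a sketch, not the RegEx from the actual commit: the function name `extractDeveloper()` and the assumed markup around the link (an `<a>` tag, optionally wrapping a `<span>`) are made up for the example.

```php
<?php
/**
 * Extract the developer name from a link pointing at either
 * /store/apps/dev?id=... or /store/apps/developer?id=...
 * Returns null if no such link is found.
 */
function extractDeveloper(string $html): ?string
{
    // "dev(?:eloper)?" matches both "dev" and "developer".
    $pattern = '!<a[^>]+href="/store/apps/dev(?:eloper)?\?id=[^"]*"[^>]*>(?:<span[^>]*>)?([^<]+)!i';
    if (preg_match($pattern, $html, $m)) {
        return html_entity_decode(trim($m[1]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return null;
}
```

The optional group keeps it to a single expression, so there is no need to try one pattern and fall back to the other.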

While on it, also fixed category (hopefully; similar game: needs different RegEx depending on whether it's shown as "label" or hidden somewhere else).

> version now in full description

Ugh? Where? You're not confusing that with the "Updated on" date, are you?

spiritwebdev commented 2 years ago

[image: gp] They hide it here now :(

Developer name works like a charm

IzzySoft commented 2 years ago

> They hide it here now :(

Yeah, that's loaded via Ajax onClick. I watched the traffic: it causes 2 calls to their reCaptcha API (huh?), but I could not yet figure out where the content comes from, or how to construct the call. I just discovered the privacy details can be retrieved separately via a special URL (https://play.google.com/store/apps/datasafety?id=${packageName}), though they are also part of the main page – so there might be some similar URL to retrieve those app details as well. Using details instead of datasafety gives the main page (which is what we already call), while about and data yield a 404. I tried a few more, but no success so far – ideas welcome.
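As a small sketch of how the two working URLs could be built: the helper name `playStoreUrl()` and the allow-list are assumptions for the example, and only the `details` and `datasafety` paths come from the observations above.

```php
<?php
/**
 * Build a Play Store URL for one of the page types known to work.
 * Other path segments (e.g. "about", "data") returned a 404 when tried,
 * so they are rejected up front.
 */
function playStoreUrl(string $packageName, string $page = 'details'): string
{
    $allowed = ['details', 'datasafety'];
    if (!in_array($page, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported page: $page");
    }
    // http_build_query() takes care of URL-encoding the package name.
    return 'https://play.google.com/store/apps/' . $page . '?' . http_build_query(['id' => $packageName]);
}
```

If another working path segment turns up, adding it to `$allowed` would be the only change needed.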

We already use https://play.google.com/_/PlayStoreUi/data/batchexecute to retrieve permissions, so I did a search on that URL. Maybe we can pick something from here – it's Go code, but the parameters (mostly f.req) we should be able to construct from there. It took a bit of playing around (we could use that to retrieve all details without RegEx parsing if we had the protobuf definitions, but without those it's just wild guessing which 0/1/2 values belong to what variable), but I finally got the version stuff that way.

> Developer name works like a charm

That's good to read, thanks for testing it out! Hope you can say the same about version details with my last commit. Waiting for your feedback then: is all complete again now?

IzzySoft commented 2 years ago

OK, running a mass update looks like it works. I've bumped the version and adjusted the "log" accordingly. Wanna tag it, @BaseMax – so folks don't pull a broken 1.0.0? New functionality and niceties can be done after that, and then might justify increasing "minor" (I just increased "patch" as it's a fully compatible bug fix release; if you wonder about the names: semantic versioning, major.minor.patch :smile:)
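In case the major.minor.patch rule is easier to see in code, here is a tiny sketch. `bumpVersion()` is a made-up helper for illustration and not part of the library.

```php
<?php
/**
 * Bump one part of a major.minor.patch version string.
 * Bumping a part resets every part to the right of it to 0.
 */
function bumpVersion(string $version, string $part): string
{
    if (!preg_match('/^(\d+)\.(\d+)\.(\d+)$/', $version, $m)) {
        throw new InvalidArgumentException("Not a major.minor.patch version: $version");
    }
    $major = (int) $m[1];
    $minor = (int) $m[2];
    $patch = (int) $m[3];

    switch ($part) {
        case 'major': // incompatible changes
            return ($major + 1) . '.0.0';
        case 'minor': // new, backwards-compatible functionality
            return $major . '.' . ($minor + 1) . '.0';
        case 'patch': // backwards-compatible bug fixes
            return $major . '.' . $minor . '.' . ($patch + 1);
        default:
            throw new InvalidArgumentException("Unknown part: $part");
    }
}
```

So a pure bug fix on top of 1.0.0 gives 1.0.1, while new functionality added later would give 1.1.0.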

IzzySoft commented 2 years ago

@spiritwebdev if you'd like to do some more testing, I've just opened a PR to add another method to the class, obtaining the new "data safety" details for a given app. Feedback always welcome: the more eyes approve, the better the test coverage (and our confidence in letting it loose on the world, i.e. merging it to main) :smiley:

IzzySoft commented 2 years ago

@BaseMax shall we tag? Will you tag? Shall I tag? As it's mainly fixing the now broken v1.0.0 I'd suggest v1.0.1. That done, this issue could be closed.

BaseMax commented 2 years ago

> @BaseMax shall we tag? Will you tag? Shall I tag? As it's mainly fixing the now broken v1.0.0 I'd suggest v1.0.1. That done, this issue could be closed.

Do you mean to release a new version? Okay good, do it. @IzzySoft

IzzySoft commented 2 years ago

Done, thanks @BaseMax! Now shall we merge #21? Maybe you could take a quick look and confirm? Or should I just "go ahead"?