Willy-JL / F95Checker

GNU General Public License v3.0

Detect missing tags script #146

Closed r37r05p3C7 closed 3 months ago

r37r05p3C7 commented 3 months ago

This script detects missing tags by parsing the Latest Updates page with Selenium. It's not user-facing and is meant to be used like a pre-commit hook. As far as I know, this is currently the best way to parse all public tags.

Willy-JL commented 3 months ago

This could definitely come in useful, nice. Is there any particular reason why this wouldn't work with requests or aiohttp?

r37r05p3C7 commented 3 months ago

All endpoints used by the latest updates page return tags as numeric IDs rather than names. I started reading through the minified JavaScript, trying to find where the additional fetching happens. After about 15 minutes, I gave up and decided to use Selenium.

It turns out that all static data for the latest updates page, including tags for all categories, is sent in one giant HTML script tag 👀
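For illustration, here is a minimal sketch of how data embedded in an inline script tag could be pulled out with plain `re` and `json`, with no browser involved. The variable name `pageData`, the `tags` key, and the sample tag names are hypothetical placeholders, not the real page structure, and the approach assumes the embedded value is strict JSON.

```python
import json
import re


def extract_embedded_json(html: str, var_name: str) -> dict:
    """Return the JSON value assigned to `var_name` inside an inline script."""
    # Locate the assignment, e.g. `var pageData = {`, and skip past the `=`.
    match = re.search(rf"\b{re.escape(var_name)}\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name!r} not found in page")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (the trailing `;`, more JavaScript).
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Hypothetical sample; the real page layout and names may differ.
sample_html = """
<script>
var pageData = {"tags": {"7": "2dcg", "13": "adventure"}, "prefixes": {}};
</script>
"""

data = extract_embedded_json(sample_html, "pageData")
# Build the ID -> name mapping that the ID-only endpoints do not provide.
id_to_name = {int(tag_id): name for tag_id, name in data["tags"].items()}
```

If the embedded value were a JavaScript object literal rather than strict JSON (unquoted keys, trailing commas), `raw_decode` would raise and a different parser would be needed.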

I'll do a rewrite tomorrow...

FaceCrap commented 3 months ago

Would this page be any help to detect missing tags?

https://f95zone.to/tags/

r37r05p3C7 commented 3 months ago

Would this page be any help to detect missing tags?

https://f95zone.to/tags/

Not really. "Popular tags" implies a subset of all tags, which means some tags will probably be excluded from the cloud in the future. The script tag I mentioned in my previous reply contains all the info I need to rewrite this script without Selenium.

FaceCrap commented 3 months ago

"popular tags" implies subset of all tags,

That's what I thought too the first time I found that page, yet the missing ones you added are all listed... 🤷🏼‍♂️ I think "popular" refers more to tag popularity, i.e. how often a tag is used, which is reflected in the font size. That page wouldn't have any purpose if it only displayed a subset... it's actually intended for finding games that match a tag.

r37r05p3C7 commented 3 months ago

That page wouldn't have any purpose if it only displayed a subset...

A tag cloud always goes hand in hand with a fully functional search feature, like the search bar with autocomplete in our case. Usually, as the number of tags grows, the cloud either gets removed or the number of displayed tags is capped at a fixed limit.

The tags page would be the obvious pick for this task if we didn't already have the latest updates page. Right now, using the tags page just doesn't seem to make sense, given that it may silently hide some tags.

r37r05p3C7 commented 3 months ago

Done, no more Selenium. Also no more error handling; it's for devs anyway.

Willy-JL commented 3 months ago

Thank you, this is perfect! May I also suggest sys.exit(0) for success and sys.exit(1) for missing tags? Then it could be automated with a GitHub Action. (I'm not asking you to make that workflow, just to add the exit codes to the script.)
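A rough sketch of the suggested exit-code convention is shown below. The function names and the idea of comparing two sets of tag names are assumptions made for illustration; the actual script may be structured differently.

```python
import sys


def find_missing_tags(site_tags: set[str], known_tags: set[str]) -> list[str]:
    """Return tags present on the site but absent from the app's own list."""
    return sorted(site_tags - known_tags)


def main(site_tags: set[str], known_tags: set[str]) -> int:
    """Report missing tags and return the process exit code."""
    missing = find_missing_tags(site_tags, known_tags)
    if missing:
        # Non-zero exit code: a CI job running the script is marked as failed.
        print("Missing tags: " + ", ".join(missing), file=sys.stderr)
        return 1
    print("No missing tags")
    return 0
```

Ending the script with `sys.exit(main(site_tags, known_tags))` would then hand the result to the shell, so a workflow step fails automatically whenever a tag is missing.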

r37r05p3C7 commented 3 months ago

Access to the Latest Updates page is restricted to authenticated users. I've updated the script to accept authentication cookies from environment variables. You can provide them through action secrets. It's still possible to run the script manually; cookies from the database will be used as a fallback.
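The lookup order described above could look roughly like this. The environment variable name `F95_COOKIES`, its JSON encoding, and the `load_from_db` callback are all hypothetical; the real script may use different names or one variable per cookie.

```python
import json
import os
from typing import Callable, Mapping

# Hypothetical variable name; in CI it would be filled from an action secret.
COOKIES_ENV_VAR = "F95_COOKIES"


def load_cookies(
    load_from_db: Callable[[], Mapping[str, str]],
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Prefer cookies from the environment, else fall back to the database."""
    raw = env.get(COOKIES_ENV_VAR)
    if raw:
        # Assumed format: a JSON object mapping cookie names to values.
        cookies = json.loads(raw)
        return {str(name): str(value) for name, value in cookies.items()}
    # Manual runs: reuse whatever the local database already has stored.
    return dict(load_from_db())


# Usage with placeholder values (not real cookie names):
ci_cookies = load_cookies(
    lambda: {"session": "from-db"},
    env={COOKIES_ENV_VAR: '{"session": "from-secret"}'},
)
local_cookies = load_cookies(lambda: {"session": "from-db"}, env={})
```

Passing `env` as a parameter keeps the function easy to test without touching the real process environment.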