QuiltMC / quiltmc.org

The source for quiltmc.org
https://quiltmc.org

How to determine latest mappings version as a bot? #229

Closed mk-pmb closed 1 month ago

mk-pmb commented 1 month ago

Hi! How can I determine the latest version? In the Readme (of the quilt-mappings repo, where I originally posted this issue) I found the hint

> You can see additional information and see what the latest QM build for each Minecraft version is with LambdAurora's import tool.

… which redirected me to https://quiltmc.org/en/usage/latest-versions/ , but the site uses CloudFlare protection in very paranoid mode that doesn't even allow me read access. (According to CloudFlare, due to my privacy preferences, I'm not "human" enough.) So, is there a way for bots like me to automatically determine the latest version?

Edit: I managed to enlist a CloudFlare-certified human for help but they said the page was broken anyway.

FirstMegaGame4 commented 1 month ago

I'd say use https://meta.quiltmc.org/#/v3/get_v3_versions_quilt_mappings?
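For bots, the Swagger link above appears to correspond to a plain JSON endpoint. Here is a minimal Python sketch, assuming the path is `/v3/versions/quilt-mappings` (optionally followed by a Minecraft version) and that each entry carries `gameVersion` and `build` fields. The path, the field names, the helper names `fetch_mappings` and `latest_builds`, and the sample data are all assumptions for illustration, so check them against the live response:

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint, inferred from the Swagger operation name in the link above.
META_URL = "https://meta.quiltmc.org/v3/versions/quilt-mappings"


def fetch_mappings(game_version=None, timeout=10):
    """Download the mappings list, optionally for a single Minecraft version."""
    url = META_URL
    if game_version is not None:
        url += "/" + urllib.parse.quote(game_version, safe="")
    request = urllib.request.Request(url, headers={"User-Agent": "qm-version-check"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def latest_builds(entries):
    """Map each game version to the entry with the highest build number."""
    latest = {}
    for entry in entries:
        game = entry["gameVersion"]  # assumed field name
        if game not in latest or entry["build"] > latest[game]["build"]:
            latest[game] = entry
    return latest


# Made-up sample data in the assumed response shape (no network access needed).
sample = [
    {"gameVersion": "1.20.4", "build": 2, "version": "1.20.4+build.2"},
    {"gameVersion": "1.20.4", "build": 3, "version": "1.20.4+build.3"},
    {"gameVersion": "1.20.2", "build": 1, "version": "1.20.2+build.1"},
]
print(latest_builds(sample)["1.20.4"]["version"])  # prints 1.20.4+build.3
```

Against the real service, `latest_builds(fetch_mappings())` would give the newest entry per Minecraft version, and `fetch_mappings("1.20.4")` would narrow the request to one version, provided the assumptions hold.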

UpcraftLP commented 1 month ago

> Edit: I managed to enlist a CloudFlare-certified human for help but they said the page was broken anyway.

Seems fine to me, what is "broken" about it?
[attached image]

mk-pmb commented 1 month ago

@FirstMegaGame4 That works. Thanks! I'll file a docs PR to have the Readme mention this option.

@UpcraftLP For me, the website has no useful content. It only shows "Enable JavaScript and cookies to continue" – which is forbidden by the privacy configuration for this task. Old lab notes say that it would be useless anyway because I would fail their captcha. Of course it would be nice if you could find a way to allow anyone read access even if Cloudflare considers them suspicious, but the above API is good enough for me.

Edit: I also found the query for just a specific Minecraft version. That seems even more fitting. :+1:

Edit 2: Actually, since I need only the build number, querying GitHub for the tag names was even easier.
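The tag-based approach could look roughly like the following Python sketch. It uses GitHub's REST endpoint for listing repository tags. The repository path `QuiltMC/quilt-mappings`, the tag layout `<minecraft version>+build.<number>`, and the helper names are assumptions, not something confirmed in this thread:

```python
import json
import re
import urllib.request

# Assumed repository; only the first page (up to 100 tags) is requested.
TAGS_URL = "https://api.github.com/repos/QuiltMC/quilt-mappings/tags?per_page=100"

# Assumed tag layout: "<minecraft version>+build.<number>".
TAG_PATTERN = re.compile(r"^(?P<game>.+)\+build\.(?P<build>\d+)$")


def fetch_tag_names(timeout=10):
    """Return the tag names from the first page of the GitHub tags listing."""
    request = urllib.request.Request(
        TAGS_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "qm-version-check",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return [tag["name"] for tag in json.load(response)]


def latest_build(tag_names, game_version):
    """Return the highest build number tagged for game_version, or None."""
    builds = []
    for name in tag_names:
        match = TAG_PATTERN.match(name)
        if match and match.group("game") == game_version:
            builds.append(int(match.group("build")))
    return max(builds, default=None)
```

Build numbers are compared as integers, so `build.10` correctly ranks above `build.9`. Unauthenticated GitHub API requests are rate limited, and older tags may sit on later pages, so a real bot would likely need pagination.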

Akarys42 commented 1 month ago

The raw data of https://quiltmc.org/en/usage/latest-versions/ is also available at https://quiltmc.org/api

mk-pmb commented 1 month ago

Thanks for adding this link, it may be informative for others. Currently, however, it suffers from the same Cloudflare paranoia as the original link, for me showing only "Enable JavaScript and cookies to continue", i.e. the standard Cloudflare CAPTCHA page.
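A bot can at least tell this situation apart from a real API answer instead of failing on a JSON parse error. The check below is a rough heuristic, assuming the interstitial is served as non-JSON content containing the message quoted above. The function name and the exact marker string are illustrative, and Cloudflare may change the wording at any time:

```python
# Marker text as reported in this thread; treat it as an assumption.
CHALLENGE_MARKER = "Enable JavaScript and cookies to continue"


def looks_like_challenge(content_type, body):
    """Return True when a response looks like the interstitial, not API data."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type != "application/json" and CHALLENGE_MARKER in body
```

A caller would pass the response's `Content-Type` header and decoded body, then report "blocked by challenge page" rather than "invalid JSON" when the function returns `True`.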

Akarys42 commented 1 month ago

That sounds likely. The infra team has to make a special rule to relax the Cloudflare restriction on the API endpoints. I guess they weren't tested outside of an actual browser. That being said, I believe it is called by some launchers or so, to get the latest installer.

Pyrofab commented 1 month ago

Just checked, there is already a rule in place to set the Cloudflare security to the "essentially off" setting. My guess is that the VPN/Tor exit node you are going through has such a horrendous web reputation that even Cloudflare's lowest setting doesn't let you through without a JS check. I don't think we can reasonably lower it further on our end, so you may have to either change your VPN configuration or enable JavaScript for this website.

mk-pmb commented 4 weeks ago

> I don't think we can reasonably lower it further on our end,

What risk do you fear from mere read access? Rate limits could still apply.

mk-pmb commented 4 weeks ago

> The raw data of https://quiltmc.org/en/usage/latest-versions/ is also available at https://quiltmc.org/api

I just had a new idea: To use an IP with regular reputation (like the GitHub CI cluster) to make a public (unrestricted read access) read-only mirror of the API. That way I can shoulder the risk of anonymous readers myself. However, when I tried to circumvent the CAPTCHA by reading the WayBack Machine's memento of the 2nd URL, it turned out to be another HTML page that seems to need JavaScript to display any useful content.

Which made me pause and realize: Mirroring the API output isn't even efficient if all data sources used internally by the API are public. I found these data sources that may be part of the protected API:

Is there anything in the API that's not already public on unrestricted servers?

Akarys42 commented 4 weeks ago

That seems overly complex for nothing. Can't someone with Cloudflare access check why the access is being rejected? I also made a worker to post firewall alerts periodically to a channel, but I don't think it has been restored sadly (my poor baby).

mk-pmb commented 3 weeks ago

Anyways, even if the current paranoia persists, please at least fix the message. "Our estimate of your behavior reputation is too low." would at least make sense. "You must be a human to interact with our API" does not, or makes it seem like you don't know what an API is for.

Akarys42 commented 3 weeks ago

Ideally the team needs to find out why you're getting blocked. Are you in China by any chance, or using a VPN or Tor? They have a big IP reputation issue. This firewall shouldn't be hit by anyone who isn't an active nuisance on the infrastructure. As for modifying the message, it requires a very expensive Cloudflare subscription to do so. Can you please screenshot what you see when manually navigating to the endpoint in your browser and/or curl?

mk-pmb commented 3 weeks ago

> As for modifying the message, it requires a very expensive Cloudflare subscription to do so.

Cool, I learned something today. It fits the overall picture.

> Ideally the team needs to find out why you're getting blocked.

Because I'm not "human" enough, obviously. (Hammering this point because as a privacy activist I hate the lie that user agent reputation is about humanness.) Yes, I was using Tor, as Pyrofab already guessed. It's a known problem with Cloudflare that they are quick to insult people as subhuman because they use a lazy-hostile IP reputation mechanism and prefer using bad science for victim blaming rather than fixing their software. All other major CDNs don't have that problem. It fits with their aggressive marketing approach that also is not primarily optimized for factual correctness.

For an API, that humanness-lie is even more ridiculous obviously. Of course for a project like this, there's probably just not enough money to pay a serious CDN, but then people should at least know the implications of their choices. If the issue were still a problem, I'd even try and help you find a solution that works without CloudFlare, e.g. maybe finding ways to host on GitHub pages instead. Which would of course depend on what the actual fear scenario is (question above still unanswered or I missed it), and whether the API provides extra data over what is available from unrestricted sources (question above still unanswered or I missed it). My personal use case currently seems to work sufficiently without the API, so I don't see a need for technical change, just for more awareness.

Akarys42 commented 3 weeks ago

Back when I was in charge, I took the decision to officially drop support for Tor communication, due to the risks provided with bypassing any protection coming from such a network, sorry. It'll be up to the current team to decide to uphold this or not.

The decision to move to Cloudflare was a long and involved process, and I do not see a universe where switching back is a good idea. A large part of the infrastructure even uses Cloudflare-exclusive features to work, which are provided entirely for free, something no other CDN will do.

Pyrofab commented 3 weeks ago

As an occasional Tor user myself, I agree that this state of affairs is rather unfortunate. And depending on Cloudflare's mood, I guess the JS challenge may not even work... However, as the other person said, Cloudflare is convenient for us, and unfortunately the simplest ways to mitigate your issues (disabling the security/changing the message) are paywalled. If someone has a good idea that is easy and cheap to implement I can try it, but my free time on this project is limited so I can't promise anything.