TASVideos / tasvideos

The code for the live TASVideos website
https://tasvideos.org/
GNU General Public License v3.0

Find alternative to reCAPTCHA (Google's spam prevention) #1647

Open YoshiRulz opened 1 year ago

YoshiRulz commented 1 year ago

reCAPTCHA is easily defeated for cheap (and, seeing as the client makes a request to a Google-owned domain, isn't exactly the best idea for user protection).

Some alternatives:

calvarado194 commented 1 year ago

hCaptcha seems to be very popular for solvers on GitHub itself, from a cursory search.

Masterjun3 commented 6 months ago

I don't think we currently need an alternative. All of the linked articles and problems are about reCAPTCHA v2, which needs interaction or challenges to prove you're a human. We use reCAPTCHA v3, which runs without any user interaction.

The "buster" solver is a browser extension. The kind of bots we are worried about don't use a regular browser with extensions installed, because then a normal human could just do it. It's about scaling up. I have yet to see a proof of concept that this captcha is really as bad as argued here.

I will close this issue if no further information or arguments are provided.

YoshiRulz commented 6 months ago

We use reCAPTCHA v3, which runs without any user interaction.

It usually requires no interaction from humans, but if your score is too low, that's when you get a challenge (supposedly; I didn't see it in the official docs).

The "buster" solver is a browser extension. The kind of bots we are worried about don't use a regular browser with extensions installed [...] I have yet to see a proof of concept that this captcha is really as bad as argued here.

Not sure what this is based on. You can easily run "headless" FF or Chrome instances, or cannibalise the extension and run it in Node or something. And a quick web search suggests that yes, v3 can also be broken.

All of this is missing the point: reCAPTCHA is a Google product, and Google is not your friend. If you wanted to use reCAPTCHA v3 properly, clients actually need to load and run JS from Google on most pages. That would be a very bad idea.

Masterjun3 commented 6 months ago

(supposedly; I didn't see it in the official docs)

So where does that information come from?

You can easily run "headless" FF or Chrome instances, or cannibalise the extension and run it in Node or something.

"full automation and scripting are not within the scope of this project due to their potential for misuse. The solver must always be manually started from the extension button."

All of this is missing the point: reCAPTCHA is a Google product, and Google is not your friend. If you wanted to use reCAPTCHA v3 properly, clients actually need to load and run JS from Google on most pages. That would be a very bad idea.

But we don't use it on most pages. We also use YouTube as the place to watch the very content our site is designed around. We also allow embedding YouTube directly in our submissions or forum posts. Many other very popular websites also use reCAPTCHA. I don't understand what this issue is for. If you want complete isolation from Google that's fine. But then you also have to accept it if there is a decision against Google isolation. Marking this reCAPTCHA as a "security" issue of our site is backwards. It'd be a security issue if we didn't have a CAPTCHA.

And looking realistically, what we have currently is the best you're going to get, considering the site circumstances. It's a solid working solution requiring no further maintenance and is future proof.

YoshiRulz commented 6 months ago

Sorry, but I have to stand up for this one, and that involves tearing apart your argument.

[...] if your score is too low, that's when you get a challenge (supposedly; I didn't see it in the official docs).

So where does that information come from?

I think it's in one of the articles I linked, but that's also how v2's invisible mode works. The alternative would be you're just locked out of the form, which would be dumb. The official docs do say "reCAPTCHA v3 will never interrupt your users," so maybe it is that after all.

[re: Buster]

This was always just an example. If you insist on seeing a bypass for v3 that runs outside the browser and is self-hosted, I found this Python library and confirmed the registration page has the kind of reCAPTCHA it requires.

If you wanted to use reCAPTCHA v3 properly, clients actually need to load and run JS from Google on most pages.

But we don't use it on most pages.

That's right, the site doesn't implement reCAPTCHA properly at the moment. It needs to track users by design—that's nominally how it knows what fake traffic looks like. This is why I suggested alternatives based on proof-of-work.
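For readers unfamiliar with the proof-of-work idea mentioned above, here is a minimal hashcash-style sketch in Python. It is an illustration only, not what any of the suggested alternatives actually implements: the function names, the `challenge:nonce` message format, and the 16-bit difficulty are all assumptions made for this example. The server hands out a random challenge, the client burns CPU time finding a nonce, and the server checks the result with a single hash, with no third-party script or tracking involved.

```python
import hashlib
import secrets

# Hypothetical difficulty: number of leading zero bits required in the hash.
DIFFICULTY_BITS = 16


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY_BITS) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force a nonce (about 2**difficulty hashes on average)."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

A real deployment would also need to expire challenges and reject reused ones, and the cost only slows bulk registration down rather than proving that a human is present.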

We also use YouTube as the place to watch the very content our site is designed around.

By proportion of time spent "on" the site, watching encodes may be the biggest, but encodes are hosted on the Internet Archive as well as YouTube. And emulator playback is also an option.

We also allow embedding YouTube directly in our submissions or forum posts.

Trivially blocked client-side.

Many other very popular websites also use reCAPTCHA.

Not relevant.

I don't understand what this issue is for. If you want complete isolation from Google that's fine. But then you also have to accept it if there is a decision against Google isolation.

You should reasonably expect that choosing not to patronise Google would lock you out of their services (and third-party services which are deeply integrated with e.g. Drive). You shouldn't expect it to lock you out of half of the Web. To my knowledge, TASVideos is only integrated with YouTube and reCAPTCHA, and as I've just explained the site can be used without connecting to YouTube.

Marking this reCAPTCHA as a "security" issue of our site is backwards. It'd be a security issue if we didn't have a CAPTCHA.

You should know this issue predated the new label—but I'd also argue that forcing clients to run opaque third-party code is a security issue, as well as a privacy one. If you removed reCAPTCHA without a replacement, you'd potentially have an abuse issue, not a security issue. That might have been lost in translation.

And looking realistically, what we have currently is the best you're going to get, considering the site circumstances. It's a solid working solution requiring no further maintenance and is future proof.

The solution that's not implemented properly is the best you're going to get? Assuming the site never reaches the gratis tier's 1M/mth cap, it's still only as future-proof as the SaaS provider allows. It's Google, so maybe them going out of business isn't a worry, but they do have a habit of killing products, even ones that are popular and profitable. Nothing on the Web is really maintenance-free anyway.

Masterjun3 commented 6 months ago

Okay that post is kind of too long for me to understand. Sure, consider my replies "torn apart". I'll let adelikat handle this one.

vadosnaprimer commented 6 months ago

the site can be used without connecting to YouTube

It can be, but it's exceptionally rare.

CasualPokePlayer commented 6 months ago

The alternative would be you're just locked out of the form, which would be dumb

This is the actual reality for some users who are having trouble even registering on the site. The current implementation simply prevents the user from registering if their score is too low (and there's typically no way to fix that other than trying a different internet connection or maybe a VPN).

At the very least we should not be using some purely invisible service, if it means a human in any possible case has no way to register onto the site.
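The lockout described above comes down to how the server treats a low score. The sketch below, in Python, shows one way to fall back to an interactive challenge instead of rejecting outright. It is not the site's actual code: the `Decision` enum, the `decide` function, and the 0.5 threshold are hypothetical. The `success`, `score`, and `action` keys mirror the fields of a reCAPTCHA v3 verification response, passed in here as an already-parsed dict so that no network call is needed.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"          # score is good enough, let the registration through
    CHALLENGE = "challenge"  # low score: show an interactive fallback
    REJECT = "reject"        # token invalid or issued for a different action


# Hypothetical threshold; a real site would tune this against its own traffic.
SCORE_THRESHOLD = 0.5


def decide(verification: dict, expected_action: str,
           threshold: float = SCORE_THRESHOLD) -> Decision:
    """Map a parsed verification response onto a registration decision."""
    if not verification.get("success", False):
        return Decision.REJECT
    if verification.get("action") != expected_action:
        return Decision.REJECT
    score = verification.get("score")
    if score is None or score < threshold:
        # A human with a bad score still gets a way in.
        return Decision.CHALLENGE
    return Decision.ALLOW
```

With this shape, a user on a flagged connection is asked to do extra work rather than being turned away with no recourse.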

YoshiRulz commented 5 months ago

the site can be used without connecting to YouTube

It can be, but it's exceptionally rare.

3 of 53 responses or ~6% said they don't use YouTube, which I'd say is a not insignificant proportion, but the sample isn't nearly large enough to say anything. Videos on the TASVideos channel regularly get a couple thousand views.
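To put a rough number on how little a sample of 53 can say, here is a small Python sketch that computes a 95% Wilson score interval for the 3-of-53 figure. It assumes the responses behave like a simple random sample of site users, which a self-selected poll almost certainly does not, so treat the result as optimistic. The function name is made up for this example.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin


low, high = wilson_interval(3, 53)
print(f"{low:.1%} to {high:.1%}")
```

Under that assumption the interval runs from roughly 2% to roughly 15%, which is consistent with both "exceptionally rare" and "not insignificant".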


With #1790 merged, the site is now firmly on v2 of reCAPTCHA, so our earlier bikeshedding over how feasibly v3 can be attacked is no longer relevant. What benefit the CAPTCHA provides remains questionable.

adelikat commented 5 months ago

Over the years we have had some pretty massive bot account spam. Years back, on occasion, our captcha would get cracked and we would get 2-3 new accounts a day that were spam. Then we update the captcha and it goes away.

To this day we get the occasional spammer. We know we have a problem here, and I believe it would get worse without a captcha.

Spikestuff commented 5 months ago

I believe it would get worse without a captcha

And considering some are now being slick about it; #1743, #1762 and #1776 were made because of it.

YoshiRulz commented 5 months ago

No-one ever suggested there should be no CAPTCHA on the site.