bakape / shamichan

anonymous realtime imageboard focused on high performance and transparent moderation

2hu (and other anime girl) captcha #662

Closed bakape closed 5 years ago

bakape commented 6 years ago

It's annoying enough to work.

bakape commented 6 years ago

Better just adapt our current captcha lib.

Chiiruno commented 6 years ago

https://github.com/dchest/captcha seems to be number-only, would we have to rewrite it?

bakape commented 6 years ago

That's the plan - cut out the parts that matter and inline in our own module.

bakape commented 5 years ago

To consider:

https://patents.google.com/patent/US20090328163A1/en

@Chiiruno What do you think?

Chiiruno commented 5 years ago

Seems pointless, any anti-captcha bot worth its salt will just get the captcha from a still frame and figure it out from there. Also looks like hell to implement.

Chiiruno commented 5 years ago

Unless it's letter/number by letter/number, but honestly it wouldn't be too hard for a bot to figure that out either.

bakape commented 5 years ago

What if the captcha never fully appears during any frame? Granted that is security by obscurity.

Chiiruno commented 5 years ago

Then the bot would be programmed to splice two or more frames together.
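A minimal sketch of that splicing step, assuming frames arrive as same-sized binary grids (1 = glyph pixel shown in that frame). The names `splice_frames`, `frame_a` and `frame_b` are made up for illustration; a real bot would work on captured video frames.

```python
from typing import List

Frame = List[List[int]]  # 1 = glyph pixel visible in this frame, 0 = background


def splice_frames(frames: List[Frame]) -> Frame:
    """Combine frames with a per-pixel OR, so a pixel shown in any frame survives."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [int(any(frame[y][x] for frame in frames)) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 3x5 glyph, split checkerboard-style so neither frame shows it whole.
full_glyph = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
frame_a = [
    [p if (x + y) % 2 == 0 else 0 for x, p in enumerate(row)]
    for y, row in enumerate(full_glyph)
]
frame_b = [
    [p if (x + y) % 2 == 1 else 0 for x, p in enumerate(row)]
    for y, row in enumerate(full_glyph)
]

recovered = splice_frames([frame_a, frame_b])
```

Here `recovered` equals `full_glyph` even though neither frame contains the whole glyph, which is why hiding the full captcha from every single frame does not help much on its own.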

bakape commented 5 years ago

What other challenge can we use in place of recognizing text characters? 2hu captcha is sounding more and more desirable with time. But it should be something completable in 10s or less. Should probably dig around for more papers.

Chiiruno commented 5 years ago

Not really sure, but even as far as the 2hu captcha goes, there's still the possibility that the bot might be able to "find" danmaku and avoid them, but I'm not sure if that's possible without memory reads. I guess it's possible to get the color of the danmaku and trace them off of that, but who knows if that's time-efficient enough to work. Yeah, I was thinking 10 seconds or less too, I'll have to tweak it as I go along to find what's best.

bakape commented 5 years ago

Maybe some other simpler (and simpler to program) minigame as an alternative? What could work while still being moderately hard for bots and resistant enough against classic captcha solving services (those are probably almost all image-based)?

Chiiruno commented 5 years ago

I hate to admit it, but contextual captcha is a good option, but I don't want google scripts anywhere near meguca if possible. Of course, implementing ourselves would be pointless, since the whole point of google captcha is that it's more or less random each time, so a bot can't build a list of passable images.

Noscript google captcha would be an option, but I wouldn't want any part of it, the whole thing with google captcha leaves a sour taste in my mouth.

It really depends on the threat model, and as far as I can tell, 2hu captcha should be enough, I don't think someone is going to build a fucking memory reading hack for an image board.

Chiiruno commented 5 years ago

Well, the best way to fool a computer is context, as well as human feeling and pattern recognition. I feel that goes beyond the scope of meguca though. Do you want to pioneer some ridiculous captcha that can't be solved until an advanced enough AI develops that can tell these human-specific contexts?

bakape commented 5 years ago

Ideally I would want to build a reusable library and let it loose.

bakape commented 5 years ago

I want to find a paper with a good idea and implement it to some extent.

Chiiruno commented 5 years ago

Of course, 2hu captcha will be a Rust and Rust/WASM library for everyone. I wouldn't be surprised if fucking 4chan implemented it if it got popular enough.

I want to find a paper with a good idea and implement it to some extent.

Try asking on /g/, or even 8/tech/, you'll be able to multiply your search net.

bakape commented 5 years ago

Or alternatively do something really dumb like unicode emoji captcha. Just need a decent rendering and image transformation filter then.

Chiiruno commented 5 years ago

Bots will probably be able to detect emoji with ease, since there's a lot of repeated patterns.

bakape commented 5 years ago

Okay, I'll mull this over some more today. Need to go now.

Chiiruno commented 5 years ago

Take care, see you later~

Chiiruno commented 5 years ago

Ties into #778

bakape commented 5 years ago

But noscript google captcha in a contained iframe is a pretty decent alternative.

Chiiruno commented 5 years ago

Like I said, I can't deny it on its technical merit. I just don't want to be part of it, I don't really want that cancer to infest meguca. Just make sure it's restricted as much as possible if you do it, please. Try asking meguca about this during busier hours, or maybe I will if I'm awake, some gucas might be able to give some good ideas.

Chiiruno commented 5 years ago

Oh, there is one problem with noscript (google) captcha: if you use uMatrix, you have to turn off 'spoof referrer header', or else the captcha will endlessly cycle or error.

bakape commented 5 years ago

Wait, does cropping an image prevent it from being detected on SauceNAO? Let's check.

bakape commented 5 years ago

No, cropping won't work. What is a non-lossy (in terms of depicted objects) way to render an image undetectable by IQDB and SauceNAO?

bakape commented 5 years ago

Let's say you have this set of images: https://gelbooru.com/index.php?page=post&s=list&tags=1girl+cirno+rating%3asafe How do you make them undetectable through IQDB, SauceNAO and other image search engines while still retaining enough information to identify Cirno? We could do cropping, accounting for some false negatives, maybe.

bakape commented 5 years ago

https://github.com/hybridgroup/gocv

https://github.com/nagadomi/lbpcascade_animeface

I'm fucking doing this.

bakape commented 5 years ago

Ideas:

meguca integration:

bakape commented 5 years ago

Work will proceed in https://github.com/bakape/anicha. This issue will refer to integrating anicha with meguca.

Chiiruno commented 5 years ago

be gone for 3 hours work on computer internals a fucking project

Well, okay. At least I can work on the (my) 2hu captcha at a more leisurely pace now. This seems a bit more sane anyway, and a hell of a lot easier to implement.

Chiiruno commented 5 years ago

How do you make them undetectable through IQDB, SauceNAO and other image search engines while still retaining enough information to identify Cirno? We could do cropping, accounting for some false negatives, maybe.

Changing the color slightly probably. Unless their neural net can detect that too.
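Whether a slight color change helps depends on how the search engine fingerprints images, which this thread does not establish. As a hedged illustration only, here is a toy "average hash" (a common perceptual-hashing idea, not a claim about how IQDB or SauceNAO actually work). The image data and function names are hypothetical. It shows that a uniform brightness shift leaves this kind of fingerprint unchanged.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0-255


def average_hash(img: Image) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [int(p > mean) for p in pixels]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where the two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_brightness(img: Image, delta: int) -> Image:
    """Add delta to every pixel; assumes the result stays within 0-255."""
    return [[p + delta for p in row] for row in img]


# Hypothetical 4x4 thumbnail; a real hash would downscale the image first.
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 35, 215],
    [18, 190, 28, 225],
]
tinted = shift_brightness(original, 20)
distance = hamming(average_hash(original), average_hash(tinted))
```

The distance comes out as 0 because both the pixels and the mean move by the same amount. Against a fingerprint like this, a small global color change would not hide the source image; structural changes would be needed.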

Chiiruno commented 5 years ago

3 of the blocks should be the correct answer: 6 blocks are random 2hus, and 3 are, say, Cirno. The blocks should be placed in randomized order. The admin should set the character, since spam prevention is a site-wide issue, and it's possible to find characters that have very few images, so some BO could pick someone like that and bypass a lot of the captcha by leaving fewer possibilities.

Chiiruno commented 5 years ago

Also if there's only like 3 images for a character, it would be possible for all of them to be it, and that would confuse the user.
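A rough sketch of the grid described above, assuming images are already grouped by character name. `build_challenge`, the file names and the "more images than correct tiles" rule are all hypothetical choices for illustration, not anything taken from meguca or anicha.

```python
import random
from typing import Dict, List, Set, Tuple


def build_challenge(
    images_by_character: Dict[str, List[str]],
    target: str,
    rng: random.Random,
    grid_size: int = 9,
    correct_count: int = 3,
) -> Tuple[List[str], Set[int]]:
    """Return (tiles in display order, indexes of the correct tiles)."""
    target_pool = images_by_character[target]
    decoy_pool = [
        img
        for name, imgs in images_by_character.items()
        if name != target
        for img in imgs
    ]
    decoy_count = grid_size - correct_count
    # Assumed rule: refuse targets whose pool is no bigger than the number of
    # correct tiles, since every challenge would then show the same images.
    if len(target_pool) <= correct_count:
        raise ValueError(f"not enough images for {target!r}")
    if len(decoy_pool) < decoy_count:
        raise ValueError("not enough decoy images")

    correct = rng.sample(target_pool, correct_count)
    decoys = rng.sample(decoy_pool, decoy_count)
    tiles = [(img, True) for img in correct] + [(img, False) for img in decoys]
    rng.shuffle(tiles)  # randomize tile order
    answer = {i for i, (_, is_correct) in enumerate(tiles) if is_correct}
    return [img for img, _ in tiles], answer


# Hypothetical image library.
library = {
    "cirno": [f"cirno_{i}.png" for i in range(10)],
    "reimu": [f"reimu_{i}.png" for i in range(10)],
    "patchouli": [f"patchouli_{i}.png" for i in range(10)],
}
tiles, answer = build_challenge(library, "cirno", random.Random(0))
```

The server would keep `answer` and compare it with the tile indexes the user submits. The pool-size check is one way to handle characters with very few images; the right threshold is a tuning question.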

bakape commented 5 years ago

be gone for 3 hours work on computer internals a fucking project

Such is life.

Chiiruno commented 5 years ago

Should probably have an example (different than correct images) image of the correct character in the top like in google captcha, since it's unreasonable to expect everyone to know every possible anime character. Not so much a problem with 2hus, but honestly I'd do it with that too.

bakape commented 5 years ago

That opens up one more botting vector though - NN that detects the same character.

Chiiruno commented 5 years ago

Maybe there's a better way, perhaps information on what kind of sex or hair color or long/short hair or type of dress. There has to be some sort of information available. It might be possible to reuse tags to some extent.

bakape commented 5 years ago

I was thinking of the admin just picking a set of characters that anyone worthy of visiting the site should recognize. For example, everyone knows Cirno, Patchouli and Reimu. You can add megucas and jelly lions to that list. For megu/pol/ it can be just Homura. That way the captcha also serves as a soft reallyfuckingnewfag filter.

Chiiruno commented 5 years ago

Okay, that works. The user has a way of knowing which character to choose, right? Probably a name in the top?

Chiiruno commented 5 years ago

We could possibly do this without even naming the character by having 3 images containing the same character as defined by the server being the correct answer, with all the others being other characters, and possibly more than one of the same possible wrong characters.

Chiiruno commented 5 years ago

Another option to reduce the possibility of a bot matching would be to have the 3x3 grid show all different characters, with only one of them being the correct one.

Of course, this would require a name at the top or something above.
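One trade-off worth checking with quick arithmetic: a single correct tile removes the "find the tiles that look alike" shortcut, but it also changes the odds of a blind guess. The sketch below assumes nine tiles and a bot that knows how many tiles are correct; `blind_guess_probability` is a hypothetical helper.

```python
from math import comb


def blind_guess_probability(tiles: int, correct: int) -> float:
    """Chance of hitting exactly the right subset when picking `correct` tiles at random."""
    return 1 / comb(tiles, correct)


three_of_nine = blind_guess_probability(9, 3)  # 1 in 84
one_of_nine = blind_guess_probability(9, 1)    # 1 in 9
```

Under these assumptions a random guess passes roughly 1.2% of the time with three correct tiles and about 11% with one, so the single-target layout would probably need rate limiting or several rounds to compensate.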

bakape commented 5 years ago

Oh, that's a good point.

Chiiruno commented 5 years ago

I edited the comment, we'll still need some sort of indicator of which character to select, probably name.

bakape commented 5 years ago

name at the top or something above.

That was always the intention. Given a name you have to find the girl in a wall of noise.

Chiiruno commented 5 years ago

This might be helpful in some way. https://github.com/antonpaquin/IsItAnime

bakape commented 5 years ago

Depends on #663