drunohazarb / 4chan-captcha-solver

GNU General Public License v3.0

Automatic captcha collection for training data #17

Open trip54654 opened 7 months ago

trip54654 commented 7 months ago

Could the model be trained better if it had more captcha samples as input? If so, I suggest automatically collecting all captchas, their solutions, and the outcomes.

What do you think?
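For illustration only, here is a minimal sketch of what collecting "captchas, their solutions, and outcomes" could look like as a local log. Nothing here comes from the script in this repo: the `CaptchaSample` record, its field names, and the JSON Lines file layout are all assumptions.

```python
import base64
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class CaptchaSample:
    # All field names are hypothetical; the repo defines no such record.
    image_b64: str   # captcha image bytes, base64-encoded so they fit in JSON
    solution: str    # the text that was submitted as the answer
    accepted: bool   # outcome: whether the submitted answer was accepted


def append_sample(log_path: Path, image: bytes, solution: str, accepted: bool) -> CaptchaSample:
    """Append one sample as a single JSON line to a local file."""
    sample = CaptchaSample(base64.b64encode(image).decode("ascii"), solution, accepted)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(sample)) + "\n")
    return sample


def load_samples(log_path: Path) -> List[CaptchaSample]:
    """Read every non-empty line of the log back into CaptchaSample records."""
    with log_path.open(encoding="utf-8") as f:
        return [CaptchaSample(**json.loads(line)) for line in f if line.strip()]
```

An append-only file keeps collection local; whether and how such a file is shared is a separate decision.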

JonseyJones commented 7 months ago

> What do you think?

Offline collection is already an option. Online, I don't think it's a good idea.

gir489returns commented 7 months ago

We would just run into the same problem the jannies have in reverse. There would be nothing stopping a bad actor from using this to jam it full of crap and make the AI worse/fail. Samples have to be screened and vetted.

trip54654 commented 7 months ago

> Offline already an option

The script in this repo offers no way to do this automatically.

> We would just run into the same problem the jannies have in reverse. There would be nothing stopping a bad actor from using this to jam it full of crap and make the AI worse/fail. Samples have to be screened and vetted.

Good point. I don't even know how training works or what data is useful. With offline collection, you could simply choose which users to trust.
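As a rough sketch of the "choose which users to trust" idea, the filter below keeps only samples from contributors on a hand-maintained allowlist. The `Submission` record, the `contributor` field, and the extra rule of keeping only accepted solutions are assumptions made for illustration, not part of this repo or its training pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Submission:
    # Hypothetical record: one captcha solution shared by a contributor.
    contributor: str  # whoever handed over the offline-collected file
    solution: str     # the text they submitted for the captcha
    accepted: bool    # whether that solution was accepted


def select_training_samples(submissions: Iterable[Submission], trusted: Set[str]) -> List[Submission]:
    """Keep submissions from trusted contributors whose solution was accepted."""
    return [s for s in submissions if s.contributor in trusted and s.accepted]
```

Usage, with made-up names:

```python
subs = [
    Submission("alice", "AB12", True),
    Submission("mallory", "ZZZZ", True),   # not on the allowlist, dropped
    Submission("alice", "XX", False),      # rejected solution, dropped
]
kept = select_training_samples(subs, trusted={"alice"})
```

An allowlist only shifts the vetting problem to deciding who goes on the list, which is the screening concern raised earlier in the thread.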