JonseyJones closed 11 months ago
you vil align ze circle and you vil like it
well if someone is able to manually compile a dataset of captchas+solutions (rip) they could retrain the nn
how many examples would be required to retrain the model? Current model even gets the circle correct some of the time
16k images of the new captcha, it was able to converge since 4k+, surprisingly easy https://captcha.chance.surf/bundle_16kblack/images.zip https://captcha.chance.surf/bundle_16kblack/model.h5
@moffatman Thanks a bunch for training the model!
7826d5ae53da72a6cf64dd2a82a4a3f5aec557b9
it was able to converge since 4k+, surprisingly easy
Was it really? Accuracy on old captchas (still in use) is ~78%, but it feels even lower, like only 3-4 of 10 captchas being solved correctly. I fine-tuned @moffatman's 16k model on a combined dataset: 10k old + 16k new captchas from @moffatman and 3.5k old captchas from @coomdev. It achieves 98.7% (7388/7485) accuracy. model notebook
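For anyone comparing numbers, here is a minimal sketch of how a whole-captcha accuracy figure like the one above could be computed. It assumes accuracy means exact string match per captcha, which the thread does not spell out. `exact_match_accuracy` is a hypothetical helper, not code from the linked notebook.

```python
def exact_match_accuracy(predictions, labels):
    """Fraction of captchas whose full predicted string equals the label.

    Assumption: a captcha counts as solved only if every character matches.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Toy data: one of two captchas is solved exactly.
toy_accuracy = exact_match_accuracy(["AB4K", "XM2P"], ["AB4K", "XM2R"])
print(toy_accuracy)  # 0.5

# The counts quoted in the comment above: 7388 correct out of 7485.
reported = 7388 / 7485
print(f"{reported:.1%}")  # 98.7%
```

Per-character accuracy would come out higher than this, so it is worth stating which one is being reported when comparing models.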
Also, I'd recommend using the latest trained model to sanitise your own dataset: both the old 10k and the new 16k sets (and coomtech's one) contain some misaligned and/or mislabeled captchas.
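A rough sketch of what that sanitising pass could look like: run the model over the labeled set and flag every sample where the prediction disagrees with the stored label, then review those by hand. `flag_suspects` and the `predict` callable are hypothetical names for illustration. A real `predict` would wrap the trained model, and the toy dictionary below only stands in for it.

```python
def flag_suspects(samples, predict):
    """Return samples whose stored label disagrees with the model's guess.

    samples: iterable of (image_id, label) pairs
    predict: hypothetical callable mapping image_id -> predicted string
    """
    suspects = []
    for image_id, label in samples:
        guess = predict(image_id)
        if guess != label:
            # Keep both strings so a human can decide which one is wrong.
            suspects.append((image_id, label, guess))
    return suspects


# Toy stand-in for a trained model: a lookup of made-up predictions.
fake_predictions = {"a.png": "XK4T", "b.png": "M0RP", "c.png": "8HJW"}
samples = [("a.png", "XK4T"), ("b.png", "MORP"), ("c.png", "8HJW")]

suspects = flag_suspects(samples, fake_predictions.get)
print(suspects)  # [('b.png', 'MORP', 'M0RP')]
```

A disagreement only marks a sample as suspect. The model itself is wrong some of the time, so flagged samples should be checked rather than dropped automatically.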
@Yukariin Ah, a hybrid model, really interesting! I didn't realize the simpler captcha still gets served sometimes. I have a lot more data, probably a million+ old captchas, and a growing number of new captchas (50k atm). So I will play around with it. Thanks for all your contributions here!!
I did notice some misaligned ones, since I get most of these from kuroba users, and it doesn't use an optimal alignment method.
It has a big black circle with a random position.