K1rakishou / 4chanCaptchaSolver

No longer functioning after changes to captcha formatting #4

Closed Alighieri99G closed 2 days ago

Alighieri99G commented 1 year ago

Seems like there have been some changes to the way that the captcha displays, now the solver doesn't work properly

KurubaEX commented 1 year ago

Yes, it indeed doesn't work. Does anyone know any workarounds?

Alighieri99G commented 1 year ago

Sorry, this is way outside of my wheelhouse

aicynide commented 1 year ago

@K1rakishou https://git.coom.tech/araragi/JKCS

K1rakishou commented 1 year ago

Yeah, I don't know. I tried to use the new model but I can't make it work (I probably fucked up somewhere and I have no idea where).

If you have any idea then you can take a look at this function - https://github.com/K1rakishou/4chanCaptchaSolver/blob/master/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Solver.kt#L35

In the script they do some weird voodoo shit and there are no equivalent functions in Android TFLite.

  const filtered2 = tf.tensor3d(mono, [image.height, image.width, 1]);
  const prediction = model.predict(filtered2.transpose([1, 0, 2]).expandDims(0));

And also this

const greedyCTCDecode = (yPred: tf.Tensor<tf.Rank>) => tf.tidy(() => yPred.argMax(-1).arraySync());

I tried to do those conversions manually with ChatGPT's help but the results are clearly wrong. (It predicts YY for the currently hardcoded JXAPXW captcha).

Other than that, I have updated the image sliding algorithm and it works. The only things blocking me right now are those two conversion functions (I think).

Maybe the code is correct but I have fucked up when converting the model from h5 format into tflite format (again, did that with the help of ChatGPT).

bronkeye commented 1 year ago

Saw this on /g/, it might be helpful: https://boards.4chan.org/g/thread/95322117#p95390905

aicynide commented 1 year ago

In the script they do some weird voodoo shit

That's a standard HWC to WHC conversion of the input tensor. It then adds a batch dimension, so the final tensor is BWHC.
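As a rough illustration of that conversion outside of TensorFlow.js, here is a minimal sketch that transposes a flat, row-major HWC buffer into WHC order. The helper name `transposeHWCtoWHC` and the tiny 2x3 test image are made up for this example and are not taken from the solver or the original script. Adding the batch dimension does not move any data; it only changes the declared shape from [W, H, C] to [1, W, H, C].

```typescript
// Hypothetical helper (not from the solver): transpose a flat, row-major
// HWC buffer into WHC order, i.e. the equivalent of transpose([1, 0, 2]).
function transposeHWCtoWHC(
  data: Float32Array,
  height: number,
  width: number,
  channels: number
): Float32Array {
  if (data.length !== height * width * channels) {
    throw new Error("buffer length does not match height * width * channels");
  }
  const out = new Float32Array(data.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      for (let c = 0; c < channels; c++) {
        // Source index in HWC layout, destination index in WHC layout.
        const src = (y * width + x) * channels + c;
        const dst = (x * height + y) * channels + c;
        out[dst] = data[src];
      }
    }
  }
  return out;
}

// A 2x3 single-channel image:
//   1 2 3
//   4 5 6
const hwc = Float32Array.from([1, 2, 3, 4, 5, 6]);
const whc = transposeHWCtoWHC(hwc, 2, 3, 1);
// whc holds the 3x2 transposed image: 1 4 2 5 3 6
```

For the captcha model, the same loop would presumably run with height 80, width 300, and one channel, but those values come from the shape mentioned later in this thread, not from the code above.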

And also this

That's the output label decoder. You need it because CTC loss was used during model training. There's a reference implementation in Java: https://www.tensorflow.org/jvm/api_docs/java/org/tensorflow/op/nn/CtcGreedyDecoder
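For reference, greedy CTC decoding in general is small enough to write by hand: take the argmax at each timestep, collapse consecutive repeats, and drop the blank class. The quoted `greedyCTCDecode` only performs the argmax step. The sketch below is an assumption-laden illustration, not the script's or the solver's code: the function name, the two-letter alphabet, and the choice of the last class as the blank are all hypothetical.

```typescript
// Hypothetical greedy CTC decoder over a [timesteps][classes] score matrix.
// Assumption: blankIndex marks the CTC blank class (here the last class).
function greedyCtcDecode(
  scores: number[][],
  alphabet: string,
  blankIndex: number
): string {
  let result = "";
  let previous = -1;
  for (const step of scores) {
    // Argmax over the class axis for this timestep.
    let best = 0;
    for (let i = 1; i < step.length; i++) {
      if (step[i] > step[best]) best = i;
    }
    // Emit a character only when the class changes and is not the blank.
    if (best !== previous && best !== blankIndex) {
      result += alphabet[best];
    }
    previous = best;
  }
  return result;
}

// Per-timestep argmax here is: A, A, blank, A, B, B
const exampleScores = [
  [0.9, 0.05, 0.05],
  [0.8, 0.1, 0.1],
  [0.1, 0.1, 0.8],
  [0.7, 0.2, 0.1],
  [0.1, 0.8, 0.1],
  [0.2, 0.7, 0.1],
];
const decoded = greedyCtcDecode(exampleScores, "AB", 2);
// decoded is "AAB": the blank separates the two A's, so both are kept.
```

The blank index and the character set must match how the model was trained, so both would need to be checked against the training notebook before reusing this.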

aicynide commented 1 year ago

You can also use this method to embed the CTC decoder in your tflite file: https://stackoverflow.com/questions/74762668

K1rakishou commented 1 year ago

I have already tried all of the CTC decoder implementations that I could find and none of them helped.

coomdev commented 1 year ago

voodoo shit

Maybe instead of asking ChatGPT to do the work for you, you could have just asked it how it works.

https://github.com/K1rakishou/4chanCaptchaSolver/blob/master/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Solver.kt#L114

Reshaping isn't a transpose, and the input image is supposed to be transposed. Have you tried visualizing what the result looks like? It's probably scrambled. A proper transpose would look like it was rotated 90 degrees and then mirrored (vertically if the rotation was CCW, else horizontally).

For this input HSXVW

A transpose would look like this image

But reshaping makes it look like this

image

(ignore the labels, it's just chatgpt being retarded)
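The difference between the two operations is easy to see on a tiny array. This is only a sketch with made-up helper names (`reshape`, `transpose`) and a 2x3 image, not code from the solver: reshaping reinterprets the same flat buffer with new dimensions, while transposing actually moves pixel (r, c) to (c, r).

```typescript
// Hypothetical helpers for a tiny grayscale image stored as a flat,
// row-major array.
function reshape(flat: number[], rows: number, cols: number): number[][] {
  // Only reinterprets the buffer: the element order is unchanged.
  const out: number[][] = [];
  for (let r = 0; r < rows; r++) {
    out.push(flat.slice(r * cols, (r + 1) * cols));
  }
  return out;
}

function transpose(image: number[][]): number[][] {
  // Moves pixel (r, c) to (c, r).
  return image[0].map((_, c) => image.map((row) => row[c]));
}

const flat = [1, 2, 3, 4, 5, 6];
const image = reshape(flat, 2, 3);    // [[1, 2, 3], [4, 5, 6]]
const reshaped = reshape(flat, 3, 2); // [[1, 2], [3, 4], [5, 6]]
const transposed = transpose(image);  // [[1, 4], [2, 5], [3, 6]]
```

Both results have the shape 3x2, which is why a shape check alone cannot tell them apart; only the transposed one keeps each original column intact as a row.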

K1rakishou commented 1 year ago

@coomdev Ohh, so it had to be rotated and mirrored just like in the previous version. I see. Yeah, this was not obvious to me at all, even after reading your code, the Jupyter notebook, and ChatGPT's explanations. I just looked at the input of the model in Netron and it said that it's 1x300x80, so to me it was obvious that I didn't need to rotate it in any way. It works now, thanks!

coomdev commented 1 year ago

https://github.com/K1rakishou/4chanCaptchaSolver/blob/0240aefe9a60fd4b86644a28389168f3b5252bbd/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Helpers.kt#L210

@K1rakishou Here, if the input image isn't 300px wide, it shouldn't be drawn in the center of the canvas but stretched to take the available space. (the model was trained that way)

K1rakishou commented 1 year ago

@coomdev I don't get it. If I draw foreground image stretched to 300px then it won't be aligned with the background image anymore. Do you mean that I need to stretch the resulting image after combining bg + fg? If yes, then I'm already doing it here: https://github.com/K1rakishou/4chanCaptchaSolver/blob/0240aefe9a60fd4b86644a28389168f3b5252bbd/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Helpers.kt#L162

The problem right now is that sometimes after combining both images there are some big groups of black pixels left on the sides which are processed by the model and it sometimes sees characters in them. So from my understanding I need to somehow remove them. Here is an example: image

K1rakishou commented 1 year ago

Never mind, I figured it out. The canvas at https://github.com/K1rakishou/4chanCaptchaSolver/blob/0240aefe9a60fd4b86644a28389168f3b5252bbd/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Helpers.kt#L133 was using an incorrect width (300 instead of the width of the smallest of the images). That's why there were garbage pixels drawn on it. Now it works.

coomdev commented 1 year ago

If yes, then I'm already doing it here

Ah, nevermind then, I misinterpreted the code.

The way we preprocessed the images was to make them display exactly as shown on 4chan's captcha: the foreground image is the canvas, the background image is not visible behind, and that is then stretched to 300.
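To make the last step concrete, here is a minimal sketch of stretching a row-major grayscale image to a target width. It assumes nearest-neighbor sampling, which is a guess: the thread does not say which interpolation the original preprocessing used, and the helper name `stretchWidth` is made up for this example.

```typescript
// Hypothetical helper: horizontally stretch a row-major grayscale image
// to targetWidth using nearest-neighbor sampling (an assumption, the
// original preprocessing may interpolate differently).
function stretchWidth(
  pixels: Uint8Array,
  width: number,
  height: number,
  targetWidth: number
): Uint8Array {
  const out = new Uint8Array(targetWidth * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < targetWidth; x++) {
      // Map each destination column back to its nearest source column.
      const srcX = Math.min(width - 1, Math.floor((x * width) / targetWidth));
      out[y * targetWidth + x] = pixels[y * width + srcX];
    }
  }
  return out;
}

// A 2x2 image with rows [10, 20] and [30, 40], stretched to width 4.
const narrow = Uint8Array.from([10, 20, 30, 40]);
const wide = stretchWidth(narrow, 2, 2, 4);
// wide has rows [10, 10, 20, 20] and [30, 30, 40, 40]
```

In the solver's case the call would presumably use a target width of 300 on the combined image, so that no unfilled columns are left at the sides for the model to misread.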

K1rakishou commented 1 year ago

Alright, hopefully this is the last set of bugfixes. Thanks for your help.