Closed Alighieri99G closed 1 month ago
Yes, it indeed doesn't work. Anyone know any workarounds?
Sorry, this is way outside of my wheelhouse
@K1rakishou https://git.coom.tech/araragi/JKCS
Yeah, I don't know. I tried to use the new model but I can't make it work (I probably fucked up somewhere and I have no idea where).
If you have any idea then you can take a look at this function - https://github.com/K1rakishou/4chanCaptchaSolver/blob/master/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Solver.kt#L35
In the script they do some weird voodoo shit and there are no equivalent functions in Android TFLite.
const filtered2 = tf.tensor3d(mono, [image.height, image.width, 1]);
const prediction = model.predict(filtered2.transpose([1, 0, 2]).expandDims(0));
And also this
const greedyCTCDecode = (yPred: tf.Tensor<tf.Rank>) => tf.tidy(() => yPred.argMax(-1).arraySync());
I tried to do those conversions manually with ChatGPT's help but the results are clearly wrong.
(It predicts YY for the currently hardcoded JXAPXW captcha.)
Other than that, I have updated the image sliding algorithm and it works. The only things blocking me right now are those two conversion functions (I think).
Maybe the code is correct but I have fucked up when converting the model from h5 format into tflite format (again, did that with the help of ChatGPT).
Saw this on /g/, it might be helpful: https://boards.4chan.org/g/thread/95322117#p95390905
In the script they do some weird voodoo shit
That's a standard HWC to WHC conversion of the input tensor. It then adds a batch dimension, so the final tensor is BWHC.
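For anyone porting this without `transpose` available, here is a minimal sketch of the HWC to WHC conversion described above, done by hand on a flat row-major buffer. The function name `transposeHwcToWhc` is made up for illustration and does not come from either repository. It assumes the image is stored row by row with channels interleaved.

```typescript
// Hypothetical helper (not from the script or the Android app):
// reorders a row-major HWC buffer into WHC order.
function transposeHwcToWhc(
  src: Float32Array,
  height: number,
  width: number,
  channels: number,
): Float32Array {
  if (src.length !== height * width * channels) {
    throw new Error("buffer size does not match height * width * channels");
  }
  const dst = new Float32Array(src.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      for (let c = 0; c < channels; c++) {
        // HWC index: (y * width + x) * channels + c
        // WHC index: (x * height + y) * channels + c
        dst[(x * height + y) * channels + c] = src[(y * width + x) * channels + c];
      }
    }
  }
  return dst;
}

// 2 rows x 3 columns, 1 channel:
// [[0, 1, 2],
//  [3, 4, 5]]
const hwc = Float32Array.from([0, 1, 2, 3, 4, 5]);
const whc = transposeHwcToWhc(hwc, 2, 3, 1);
console.log(Array.from(whc)); // [0, 3, 1, 4, 2, 5]
```

The batch dimension of size 1 does not change the flat data at all, only the declared shape, so the buffer above can be treated as `[1, width, height, channels]` as is.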
And also this
That's the output label decoder. You need it because CTC loss was used during model training. There's a reference implementation in Java: https://www.tensorflow.org/jvm/api_docs/java/org/tensorflow/op/nn/CtcGreedyDecoder
You can also use this method to embed the CTC decoder in your tflite model: https://stackoverflow.com/questions/74762668
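As a rough illustration of what greedy CTC decoding does, here is a dependency-free sketch. Note that the `greedyCTCDecode` line quoted earlier only takes the argmax per time step; standard greedy CTC decoding additionally collapses repeated labels and drops blanks, and this thread does not show where the script does that. The alphabet and the blank index below are assumptions for the example only and have to be checked against the actual model.

```typescript
// Greedy CTC decoding over a [timeSteps][numClasses] score matrix.
// ASSUMPTION: blankIndex and alphabet are placeholders, not the real model's values.
function greedyCtcDecode(scores: number[][], alphabet: string, blankIndex: number): string {
  let result = "";
  let previous = -1;
  for (const step of scores) {
    // Argmax over classes for this time step.
    let best = 0;
    for (let i = 1; i < step.length; i++) {
      if (step[i] > step[best]) best = i;
    }
    // Collapse consecutive repeats, then drop blanks.
    if (best !== previous && best !== blankIndex) {
      result += alphabet[best];
    }
    previous = best;
  }
  return result;
}

// Per-step argmax here is: A, A, blank, A, B, B
const scores = [
  [0.9, 0.05, 0.05],
  [0.8, 0.1, 0.1],
  [0.1, 0.1, 0.8],
  [0.7, 0.2, 0.1],
  [0.1, 0.8, 0.1],
  [0.2, 0.7, 0.1],
];
console.log(greedyCtcDecode(scores, "AB", 2)); // "AAB"
```

The blank between the second and third "A" is what lets a doubled letter survive the repeat-collapsing step.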
I have already tried all of the CTC decoder implementations that I could find and none of them helped.
voodoo shit
Maybe instead of asking ChatGPT to do the work for you, you could have just asked it how it works.
Reshaping isn't a transpose, the input image is supposed to be transposed. Have you tried visualizing what the result looks like? It's probably scrambled. A proper transpose would look like it was rotated 90 degrees then mirrored (vertically if the rotation was CCW, else horizontally).
For this input
A transpose would look like this
But reshaping makes it look like this
(ignore the labels, it's just chatgpt being retarded)
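The difference between reshaping and transposing can also be checked on a tiny grid. This is only a sketch with made-up helper names (`toGrid`, `transpose`, `rotateCcw`, `flipVertical`); it shows that a reshape keeps the flat order and just changes the row length, while a transpose equals a 90 degree CCW rotation followed by a vertical mirror.

```typescript
// View a flat row-major buffer as a 2D grid with the given number of columns.
// Reinterpreting the same data with a different column count is all a reshape does.
function toGrid(flat: number[], columns: number): number[][] {
  const rows: number[][] = [];
  for (let i = 0; i < flat.length; i += columns) {
    rows.push(flat.slice(i, i + columns));
  }
  return rows;
}

// Transpose: element (row, col) moves to (col, row).
function transpose(grid: number[][]): number[][] {
  return grid[0].map((_, col) => grid.map((row) => row[col]));
}

// Rotate 90 degrees counter-clockwise: the rightmost column becomes the top row.
function rotateCcw(grid: number[][]): number[][] {
  const width = grid[0].length;
  return grid[0].map((_, i) => grid.map((row) => row[width - 1 - i]));
}

// Mirror vertically: reverse the order of the rows.
function flipVertical(grid: number[][]): number[][] {
  return [...grid].reverse();
}

const flat = [1, 2, 3, 4, 5, 6];
const image = toGrid(flat, 3); // 2 rows x 3 columns: [[1, 2, 3], [4, 5, 6]]

console.log(toGrid(flat, 2));                // reshape to 3x2: [[1, 2], [3, 4], [5, 6]]
console.log(transpose(image));               // [[1, 4], [2, 5], [3, 6]]
console.log(flipVertical(rotateCcw(image))); // [[1, 4], [2, 5], [3, 6]]
```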
@coomdev Ohh, so it had to be rotated and mirrored just like in the previous version. I see. Yeah, this was not obvious to me at all even after reading your code, jupyter notebook and chatgpt's explanations. I just looked at the input of the model in netron and it said that it's 1x300x80 so to me it was obvious that I don't need to rotate it in any way. It works now, thanks!
@K1rakishou Here, if the input image isn't 300px wide, it shouldn't be drawn in the center of the canvas but stretched to take the available space. (the model was trained that way)
@coomdev I don't get it. If I draw foreground image stretched to 300px then it won't be aligned with the background image anymore. Do you mean that I need to stretch the resulting image after combining bg + fg? If yes, then I'm already doing it here: https://github.com/K1rakishou/4chanCaptchaSolver/blob/0240aefe9a60fd4b86644a28389168f3b5252bbd/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Helpers.kt#L162
The problem right now is that sometimes after combining both images there are some big groups of black pixels left on the sides which are processed by the model and it sometimes sees characters in them. So from my understanding I need to somehow remove them. Here is an example:
Never mind, I figured it out. This canvas (https://github.com/K1rakishou/4chanCaptchaSolver/blob/0240aefe9a60fd4b86644a28389168f3b5252bbd/app/src/main/java/com/github/k1rakishou/chan4captchasolver/Helpers.kt#L133) was using an incorrect width (300 instead of the width of the smallest of the images). That's why there were garbage pixels drawn on it. Now it works.
If yes, then I'm already doing it here
Ah, nevermind then, I misinterpreted the code.
The way we preprocessed the images was to make them display exactly as shown on 4chan's captcha: the foreground image is the canvas, the background image is not visible behind, and that is then stretched to 300.
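To make the "stretched to 300" step concrete, here is a minimal sketch of a horizontal stretch on a single-channel row-major image. It uses nearest-neighbour sampling, which is an assumption for illustration; the thread does not say which interpolation the training preprocessing or the Android canvas uses, and `stretchToWidth` is a made-up name.

```typescript
// Hypothetical helper: stretch a grayscale row-major image horizontally to targetWidth.
// ASSUMPTION: nearest-neighbour sampling; the real pipeline may interpolate differently.
function stretchToWidth(
  src: Uint8Array,
  width: number,
  height: number,
  targetWidth: number,
): Uint8Array {
  const dst = new Uint8Array(targetWidth * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < targetWidth; x++) {
      // Map each destination column back to its source column.
      const srcX = Math.min(width - 1, Math.floor((x * width) / targetWidth));
      dst[y * targetWidth + x] = src[y * width + srcX];
    }
  }
  return dst;
}

const row = Uint8Array.from([10, 20]); // 2 px wide, 1 px tall
console.log(Array.from(stretchToWidth(row, 2, 1, 4))); // [10, 10, 20, 20]
```

The point is that the canvas is sized to the composited image first and only then scaled, so no padding columns end up in the model input.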
Alright, hopefully this is the last set of bugfixes. Thanks for your help.
Seems like there have been some changes to the way the captcha displays. Now the solver doesn't work properly.