drunohazarb / 4chan-captcha-solver

GNU General Public License v3.0

new captcha #30

Open slabodan opened 1 month ago

slabodan commented 1 month ago

[Screenshot 2024-09-18 at 14 56 29] [Screenshot 2024-09-18 at 14 56 17]

chopped off characters in bg

slabodan commented 1 month ago

100.zip 100 samples

trip54654 commented 1 month ago

but the server says I posted anyway, but the post never shows.

This is old. It happens when you solve the captcha incorrectly, sometimes even when you don't enter anything in the captcha field at all. Probably a black hole for bots. It's pretty annoying, I even wrote a userscript that saves my post so I can just post it again without having to write it again.

Sometimes, when they take advanced measures, every post you try to make in a thread gets this treatment. (Probably useragent based.)

trip54654 commented 1 month ago

By the way, still getting my old 6 character captchas on my home ISP.

slabodan commented 1 month ago

300.zip

Dankan37 commented 1 month ago

Same. It only gives me the strict CAPTCHAs on my proxies, and even if I solve them manually, it still either bans me for being a proxy or tells me I'm rangebanned.

tbh I think they just expanded the rangebanned IPs to sell more passes. I also have this issue with my IPs, but oddly only when posting on /g/. Guess everyone has their share of fun over there.

JonseyJones commented 1 month ago

Can confirm, there is a new captcha. No problem posting with the new captcha on any board.

trip54654 commented 1 month ago

No problem posting with the new captcha on any board.

The solver has huge problems with most of them. But after a while it switches back to the old one. (Experienced with old 4chan cookies and routine dynamic IP change.)

JonseyJones commented 1 month ago

I get a few old captchas but no switch.

smitlvh commented 1 month ago

Can confirm here as well, especially in high-activity shitpost threads like on /v/.

slabodan commented 1 month ago

500.zip

@Yukariin I hope you're still around. Can we do an initial fine-tune with 500?

Yukariin commented 1 month ago

@slabodan try this

slabodan commented 1 month ago

Yeah, so much better, got 6/8 correct. Let's not close this issue for now; I will test tomorrow and see if I need to get a few hundred more, but with this accuracy it should be easy. Thanks.

Though I suspect that in the next few days or weeks they are gonna go live with this one https://github.com/drunohazarb/4chan-captcha-solver/issues/28

Yukariin commented 1 month ago

@gir489returns check if you're actually using the new script. For the amount of new data it has been trained on, it is doing surprisingly well for me so far.

[Screenshot 2024-09-20 at 19 20 13]

Dankan37 commented 1 month ago

Same. I have the newest version and it still behaves like the old one getting spoofed. I guess it needs more time in the oven. [image]

JonseyJones commented 1 month ago

My humble harvest of 100 samples. @Yukariin @drunohazarb 100 samples.zip

moffatman commented 1 month ago

bundle-1726856583.zip

here's 5k+

JonseyJones commented 1 month ago

Thanks a lot! @Yukariin should check it out!

BenderBRod commented 1 month ago

@slabodan try this

This already works absolutely perfectly for me. After about 50 posts, only 1 failed.

Dankan37 commented 1 month ago

@Yukariin I have a question. Maybe this place is not the most suited, but at least I avoid making more posts: how are the weights64 in the main code obtained? Do you just base64-encode the .bin files for the weights, or do you go through something else? I am kind of missing this step. Cheers. Anyway, I technically have a model with 100% success, trained on the 6k captchas sent so far; the only issue is that it's about 45 MB.

slabodan commented 1 month ago

idk works great for me with only 500 captchas

Yukariin commented 1 month ago

@Dankan37 depends on your model architecture. For the current model, you want the prediction variant, without the input labels and CTC layer. The resulting file for the current model is 2.8 MB. Next, convert it to tfjs format using the tfjs converter, something like: tensorflowjs_converter --input_format=keras model_pred_new_captcha.keras tfjs_model. The resulting tfjs_model is a directory that contains the model JSON and a binary file (of the same size). Put the JSON in modelJSON. Encode the binary file as base64, something like: base64 -i group1-shard1of1.bin -o model.txt. Finally, put the contents of the resulting txt into weights64.
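The base64 step can also be done, and sanity-checked, from JavaScript. A minimal sketch, assuming a runtime with global btoa/atob (browsers, or Node 16+); bytesToBase64 and base64ToBytes are hypothetical helper names, not functions from the userscript, and how the script itself decodes weights64 is not shown in this thread. (The base64 -i/-o invocation is the macOS form; GNU coreutils base64 takes the file as a positional argument, writes to stdout, and wraps lines unless given -w 0.)

```javascript
// Encode raw bytes (e.g. the contents of a weights .bin file) as base64.
function bytesToBase64(bytes) {
  let binary = "";
  const chunk = 0x8000; // build the string in chunks to avoid huge argument lists
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunk));
  }
  return btoa(binary);
}

// Decode a base64 string back into the original bytes.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Round trip on a small stand-in for real weight data.
const sample = new Uint8Array([0, 1, 2, 253, 254, 255]);
const encoded = bytesToBase64(sample);
const decoded = base64ToBytes(encoded);
console.log(encoded, decoded.every((b, i) => b === sample[i]));
```

A round trip like this is a cheap way to confirm that the pasted string was not truncated or line-wrapped before it goes into weights64.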

From your mention of model size and accuracy, I can assume that your model is larger than the current one and may be slower. Also, 100% accuracy can indicate that the model is overfitting, so I'd test it on a different set of data that hasn't been seen during training.
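One way to run that check is to hold out part of the labeled samples before training and score exact matches on that part only. A minimal sketch: splitHoldout, exactMatchAccuracy and the predict callback are hypothetical names, and the sample shape ({ image, label }) is an assumption, not the repo's actual data format.

```javascript
// Shuffle a copy of the samples (Fisher-Yates) and split off a held-out test set.
function splitHoldout(samples, testFraction = 0.2, random = Math.random) {
  const shuffled = samples.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const testCount = Math.round(shuffled.length * testFraction);
  return {
    test: shuffled.slice(0, testCount),
    train: shuffled.slice(testCount),
  };
}

// Fraction of samples whose predicted text equals the label exactly.
function exactMatchAccuracy(samples, predict) {
  if (samples.length === 0) return NaN;
  let correct = 0;
  for (const { image, label } of samples) {
    if (predict(image) === label) correct++;
  }
  return correct / samples.length;
}

// Usage with placeholder data and a placeholder predictor.
const samples = Array.from({ length: 10 }, (_, i) => ({ image: `img${i}`, label: `L${i}` }));
const { train, test } = splitHoldout(samples, 0.2);
const predict = (image) => image.replace("img", "L"); // stand-in for the real model
console.log(train.length, test.length, exactMatchAccuracy(test, predict));
```

Train only on train and report accuracy on test; a large gap between the two numbers is the overfitting signal.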

JonseyJones commented 1 month ago

Are you gonna update the script with the @moffatman samples?

Yukariin commented 1 month ago

@JonseyJones yeah, but that bundle still contains some old captchas, so I need to fix my classifier first...

Yukariin commented 1 month ago

I was just about to test the new model and they changed the slider... The autoslider is fucked up, but the solver itself seems to work fine. f5eba491b2526081a17676a5fbeca877e31f302a

slabodan commented 1 month ago

I was just about to test the new model and they changed the slider... The autoslider is fucked up, but the solver itself seems to work fine. f5eba49

What board? Old slider still on /mu/

Dankan37 commented 1 month ago

They changed the captchas again. Now they have a lot more noise, plus that horizontal line. We may need newer samples... [image]

Yukariin commented 1 month ago

What board?

I tried on a few blue ones (v/vg/g/jp), but it disappeared quickly and now I get a mix of old captchas and new ones (with cropped letters). It's not new, I think I saw it on red boards a few months ago and the scripts have the code to solve it, but it sucks miserably.

Yukariin commented 1 month ago

@Dankan37 yep, that's the one I got. It's the new slider they used on /trash/, I think.

slabodan commented 1 month ago

I'm playing around with it, and it looks like this newest slider is being solved with the nuBestPos function:

  async function slideCaptcha(tfgElement, tbgElement, sliderElement) {
    // get data uris for captcha back- and foreground
    const tbgUri = tbgElement.style.backgroundImage.slice(5, -2);
    const tfgUri = tfgElement.style.backgroundImage.slice(5, -2);

    // load foreground (image with holes)
    const igd = await getImageDataFromURI(tfgUri);
    // load background (image that gets slid)
    const sigd = await getImageDataFromURI(tbgUri);
    const slideWidth = sigd.width - igd.width;
    const opqCol = checkOpaquePixels(igd);
    let sliderPos;

    if (opqCol === null)
    {
        console.log("old");
        // get array with pixels of foreground
        // that we compare to background
        const chkArray = getBoundries(igd);
        // slide, compare and get best matching position
        sliderPos = getBestPos(sigd, chkArray, slideWidth);
    }
    else {
        console.log("new");
        sliderPos = nuBestPos(igd, sigd, opqCol, slideWidth);
    }
    // slide in the UI
    sliderElement.value = sliderPos;
    sliderElement.dispatchEvent(new Event('input', { bubbles: true }));
    return 0 - (sliderPos / 2);
  }

I set it to always use the classic slider function and it does more or less okay: got it right 4/4 times.

[Screenshot 2024-09-21 at 16 27 35] [Screenshot 2024-09-21 at 16 24 02] [Screenshot 2024-09-21 at 16 26 12] [Screenshot 2024-09-21 at 16 27 42]

Dankan37 commented 1 month ago

Did some testing: the model can solve the new captchas, but the slider ends up in the wrong place. On a proxy with no cookies, using the code above:

[image] Initial state.

[image] Solution if I press Solve (doesn't autoslide).

[image] Solution if I shift the slider until it's on the exact solution and press Solve.

This, instead, is using the code from the latest GitHub version (.11):

[image]

Again, if I slide into position and press Solve: [image]

So hopefully it's only a matter of fixing the slider.

slabodan commented 1 month ago

Has anyone even seen the old "new" /trash/ slider? I can't get it to show up now.

Yukariin commented 1 month ago

Updated the slider solver. Works much better with new one. Might regress with old ones though. a4dfb109d14d95e4e3825c99e6fea90ccf8eb8c4

Yukariin commented 1 month ago

Solver is still missing sometimes where I have to coax it by removing 1-2 items

Yeah, the noise from the new slider seems to confuse it.

Yukariin commented 1 month ago

And they update in real-time kek...

[Screenshot 2024-09-21 at 17 58 45]

trip54654 commented 1 month ago

The mods are probably watching this, yet they're so retarded and incompetent that they can't even stop known spammers like ACK. They even let a thread archive with his posts undeleted: https://boards.4chan.org/a/thread/271211044 The mods are (I bet Microsoft doesn't like the N word). They will always lose.

JonseyJones commented 1 month ago

@Yukariin Feature suggestion: an option to auto-post after getting the captcha would be cool, placed next to the save-captcha option and off by default, ofc.

For me the script works better than ever.

slabodan commented 1 month ago

I guess they added that grid to the bg so the surrounding pixels can't be compared for the slider? Just playing around with some ChatGPT-generated denoising, maybe that would work.

[Screenshot 2024-09-21 at 18 47 05]

trip54654 commented 1 month ago

Seeing some more of the new captchas, I have to say I find it extremely difficult to solve them manually. Praise Allah for AI.

slabodan commented 1 month ago

Not sure yet if denoising helps. Feels like 50/50, but it's basically the same without it too.

JonseyJones commented 1 month ago

Got the bad slider and the noise and the script poops its pants.

JonseyJones commented 1 month ago

So many bad captcha results from the script. This is a major problem.

Yukariin commented 1 month ago

I think I have fixed the slider. The noise still fucks up the solver occasionally though...

slabodan commented 1 month ago

[Screenshot 2024-09-22 at 11 28 37]

Tried removing the grid from the background and reverting the slider functions to what they were before, and I'm getting OK results. But I have to run a Python server to remove the grid, and idk what would happen if a captcha without a grid showed up.

https://github.com/user-attachments/assets/7eac6411-2216-426e-a0bb-7a3bd28432d7

https://github.com/user-attachments/assets/151e23cf-f8e8-4f99-b909-ad063f530fbf

https://github.com/user-attachments/assets/39d7893c-4201-4aa0-9cf8-b161856b0719

slabodan commented 1 month ago

Weird case that I've had twice so far: the initial guess is incorrect, but when I press Solve it gets it right.

https://github.com/user-attachments/assets/421c0fe1-067b-446d-88f6-5ebf79ebd9ff

JonseyJones commented 1 month ago

I think I have fixed the slider. The noise still fucks up the solver occasionally though...

No update found. 1.4.12

Last update, 15h ago.

a4dfb10 18 hours ago

Yukariin commented 1 month ago

@JonseyJones still testing

Yukariin commented 1 month ago

Okay, I think this one should work fine eeaae13ee8e569bbd969ef290205f7664cb935d0

MIAN1123 commented 1 month ago

[image] I made some changes to the slider scoring value; it's working fine.

Dankan37 commented 1 month ago

[image] Just got this on a fresh proxy.

Second try, same results: [image]

slabodan commented 1 month ago

Though I suspect that in the next few days or weeks they are gonna go live with this one #28

🙏