QIN2DIM / hcaptcha-challenger

🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
https://docs.captchax.top/
GNU General Public License v3.0
1.45k stars, 258 forks

[Question] The project is failing 90% of the time because of new challenges #965

Open tarekwiz opened 7 months ago

tarekwiz commented 7 months ago

Can you explain how to train new models? I can't find it documented anywhere. A small tutorial with examples would help a lot and would let many more developers contribute to your project.

tarekwiz commented 7 months ago

Never mind, I see the workflow that collects captcha files, and I also found this project, https://github.com/beiyuouo/hcaptcha-model-factory, for automatic model creation. Why is it not part of the workflow? Maybe we could use GPT to classify which classes are "yes" or "bad", @beiyuouo. What do you guys think?

QIN2DIM commented 7 months ago

You can perform the zero-shot image binary classification task by modifying the modelhub instance variable. You don't need to train the model yourself. CLIP can already handle all image classification tasks.
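To make the zero-shot idea concrete, here is a minimal sketch of binary classification from image-text similarity scores. It assumes the similarities have already been computed by a CLIP-style model; the function name, the labels, and the scores are illustrative and are not taken from hcaptcha-challenger.

```python
import math


def zero_shot_is_positive(similarities: dict[str, float], positive_labels: set[str]) -> bool:
    """Return True when the softmax mass on the positive labels exceeds 0.5.

    `similarities` maps each candidate label to an image-text similarity
    score (toy values here; a CLIP-style model would supply them).
    """
    # Subtract the maximum score for numerical stability before exponentiating.
    top = max(similarities.values())
    exps = {label: math.exp(score - top) for label, score in similarities.items()}
    total = sum(exps.values())
    positive_mass = sum(v for label, v in exps.items() if label in positive_labels) / total
    return positive_mass > 0.5


# Toy scores for the prompt "Click on the animals that don't walk".
scores = {"fish": 4.1, "dolphin": 2.0, "dog": 1.2, "horse": 0.7}
result = zero_shot_is_positive(scores, positive_labels={"fish", "dolphin"})
print(result)
```

No training step is involved: changing the task only means changing the candidate labels and which of them count as positive.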

tarekwiz commented 7 months ago

I've tried it, but it doesn't work well. Can you show me an example using it with "Click on the animals that don't walk"? Edit: maybe I'm using it incorrectly, which is why I asked for an example.

tarekwiz commented 7 months ago

I found why. There's an error in your code: components/common.py L55 should be "ket in label", not "label in ket".
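The direction of the membership test matters when the lookup key is a short phrase and the label is the full challenge prompt. A minimal sketch of the difference, assuming that setup; the table, the prompt, and the function names are made up for illustration and are not the actual contents of components/common.py:

```python
# Hypothetical lookup table: short key -> candidate labels.
candidates_by_key = {
    "animals that don't walk": ["fish", "dog"],
}

label = "click on the animals that don't walk"


def match_wrong(label: str) -> list[str]:
    # Buggy direction: asks whether the long prompt occurs inside the short key.
    return [key for key in candidates_by_key if label in key]


def match_fixed(label: str) -> list[str]:
    # Fixed direction: asks whether the short key occurs inside the long prompt.
    return [key for key in candidates_by_key if key in label]


print(match_wrong(label))
print(match_fixed(label))
```

With the buggy direction the lookup only succeeds when the prompt is identical to (or shorter than) the key, so longer prompts silently fall through.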

tarekwiz commented 7 months ago

@QIN2DIM I opened a pull request. Please merge

QIN2DIM commented 7 months ago

I have not yet provided a good reference case for CLIP.

For the CLIP prompt, I offer two trigger options: clip_candidates and datalake. At the moment I prefer datalake.

I plan to use clip_candidates to handle nested types of image binary classification tasks, i.e., challenges where the prompt is invariant but the positive sample can be constantly updated.

QIN2DIM commented 7 months ago

There are still some issues with clip_candidates, and I will subsequently change its data structure, which currently struggles to cope with the complex demands of prompt orchestration. I'd like it to be able to divide and conquer, and to further reduce inference pressure.

QIN2DIM commented 7 months ago

The default is the ResNet-50 model, which is behind the times in terms of inference performance.

I will write a script later to determine if current hardware resources can run a better performing CLIP model.

QIN2DIM commented 7 months ago

https://github.com/QIN2DIM/hcaptcha-challenger/blob/d5fa70a2c972293310e762761e9a168015f1185c/hcaptcha_challenger/onnx/modelhub.py#L295-L308

tarekwiz commented 7 months ago

I have managed to get clip_candidates to work, but I dislike that the first element of the array is implicitly the correct one. Nothing indicates that, and I had to read the code to figure it out. I think we should work on a CI that converts collected information into CLIP candidates automatically so we don't have to keep track of them manually.
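One way to remove that ambiguity is to name the roles explicitly. The sketch below converts a list that follows the first-element-is-positive convention into a structure with named fields. `CandidateSet` and `from_positional` are hypothetical names, not part of the library, and the labels are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSet:
    """Hypothetical container that names the positive and negative prompts."""

    positive: str
    negatives: tuple[str, ...]


def from_positional(candidates: list[str]) -> CandidateSet:
    """Convert a [positive, negative, ...] list into named fields."""
    if len(candidates) < 2:
        raise ValueError("need one positive and at least one negative candidate")
    return CandidateSet(positive=candidates[0], negatives=tuple(candidates[1:]))


cs = from_positional(["fish", "dog", "horse"])
print(cs.positive, cs.negatives)
```

A structure like this would also be easier for an automated job to generate, since each collected label lands in a named field instead of a position.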

QIN2DIM commented 7 months ago

> I think we should work on a CI that converts collected information into CLIP candidates automatically so we don't have to keep track of them manually.

You're right. That's what I want to do.

> I have managed to get clip_candidates to work but I dislike how the first element of the array is the correct one. Nothing indicates that and I had to read the code to identify it.

I have provided a datalake orchestration method that lets you specify the positive and negative labels for a specific challenge prompt.

The current data structure for clip_candidates is failing miserably, and I'm already planning to replace it.

QIN2DIM commented 7 months ago

https://github.com/QIN2DIM/hcaptcha-challenger/blob/d5fa70a2c972293310e762761e9a168015f1185c/src/objects.yaml#L6

https://github.com/QIN2DIM/hcaptcha-challenger/blob/d5fa70a2c972293310e762761e9a168015f1185c/src/objects.yaml#L736

tarekwiz commented 7 months ago

@QIN2DIM The PR has been approved, but the update still hasn't been published on PyPI.

QIN2DIM commented 7 months ago

Updated

Actually, I rewrote clip_candidates about two weeks ago and also solved the current issue. However, that feature needs more testing, and I have been too busy recently.

harusurv commented 7 months ago

It's failing atm in every challenge :(