QIN2DIM / hcaptcha-challenger

🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
https://docs.captchax.top/
GNU General Public License v3.0

examples of `appears_only_once` #844

Closed QIN2DIM closed 10 months ago

QIN2DIM commented 10 months ago
> I'm sorry, do you have an example of how this works, such as the code demos from your guide?

The solution I used before came out of my conversations with GPT, and I have tried using LLM infrastructure in various contexts.

In other words, I hope that in the future I can find a way to solve every captcha in the world with LLMs.

At the moment I'm already very pleased that GPT can reach this level in single-text mode, but the accuracy it delivers is actually not high enough.

So I used @beiyuouo's YOLOv8 segmentation model plus a similarity-matrix computation to solve this task.
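
To illustrate the "similarity matrix" idea: a toy sketch, not the project's actual implementation. The function name and the feature vectors are invented; in practice the rows would be embeddings of the cropped objects.

```python
import numpy as np

def find_unique_index(features: np.ndarray) -> int:
    """Return the row index of the feature vector least similar to the rest."""
    # Normalize rows so that dot products become cosine similarities
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Pairwise cosine similarity matrix
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    # The object that "appears only once" has the lowest total similarity
    return int(sim.sum(axis=1).argmin())

# Three near-identical vectors and one outlier
feats = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]])
print(find_unique_index(feats))  # 3 — the outlier
```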

> I'm sorry, do you have an example of how this works, such as the code demos from your guide?

Originally posted by @PieceOfGood in https://github.com/QIN2DIM/hcaptcha-challenger/issues/806#issuecomment-1770256888

QIN2DIM commented 10 months ago

@PieceOfGood


I've merged an example that I hope you'll find useful:

https://github.com/QIN2DIM/hcaptcha-challenger/blob/main/examples/demo_find_unique_object.py

PieceOfGood commented 10 months ago

Wow! Excellent example! Thanks for this experience! But as you may have noticed, my first request was about a slightly different type of task: Please click on the thumbnail of something that can be eaten.

And “find a unique object” should, in theory, fit here too, but something went wrong (attached image: i01).

QIN2DIM commented 10 months ago

I thought you had read the source code. In this case, we use other models to detect it. 😅


PieceOfGood commented 10 months ago

I'm not sure you understand me correctly. Perhaps I'm not describing my problem accurately enough.

Perhaps everything is intuitive to you personally; if it weren't, you most likely would not be the author of this tool. But that does not mean everything is as clear to everyone else as it is to you, or that everyone will understand what you are guided by if they read your source code.

I came to you with a very specific question. Last time you provided a link, which I take as a guide to action. I looked at the intentions of the algorithm described in the link and simplified it to processing one single image:

```python
import os
import sys
from pathlib import Path
from typing import Callable

import cv2

import hcaptcha_challenger as solver
from hcaptcha_challenger.components.cv_toolkit.appears_only_once import (
    limited_radius,
    annotate_objects,
    find_unique_object,
    find_unique_color,
)
from hcaptcha_challenger.onnx.modelhub import ModelHub
from hcaptcha_challenger.onnx.yolo import YOLOv8Seg

solver.install(upgrade=True)

# Initialize the model index
modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()


def draw_unique_object(results, image_path: str, trident: Callable):
    def search():
        # Hough circle detection as a fallback scheme
        img, circles = annotate_objects(image_path)

        # Prioritize the results of the seg-model cutting;
        # if there are no results, fall back to Hough's circles
        if results:
            # circle: [center_x, center_y, r]
            circles = [
                [int(result[1][0]), int(result[1][1]), limited_radius(img)] for result in results
            ]

        # Find the circle that `appears only once`
        if circles:
            if result := trident(img, circles):
                x, y, _ = result
                # Return the `Position` Dict[str, int]
                return img, {"x": int(x), "y": int(y)}

        # The trident() method returned untrusted results
        return img, {}

    # If the position is reliable, mark the center of the unique circle
    image, position = search()

    # If you are writing a Playwright or Selenium program,
    # you should click the canvas at this coordinate.
    if position:
        combined_img = cv2.circle(
            image, (position["x"], position["y"]), limited_radius(image), (255, 0, 0), 2
        )
        return combined_img
    return image


def execute(image_path: Path, trident: Callable, output_dir: Path | None = None) -> None:
    # Load the model (automatic download, 51 MB)
    model_name = "appears_only_once_2309_yolov8s-seg.onnx"
    classes = modelhub.ashes_of_war.get(model_name)
    session = modelhub.match_net(model_name)
    yoloseg = YOLOv8Seg.from_pluggable_model(session, classes)

    # Find all the circles in the picture
    results = yoloseg(image_path)

    # Find the unique circle and mark its center
    combined_img = draw_unique_object(results, str(image_path), trident)

    # Draw a bounding box and mask region for all circles
    combined_img = yoloseg.draw_masks(combined_img, mask_alpha=0.1)

    # Create the output directory, including one passed in by the caller
    if not output_dir:
        output_dir = Path(__file__).parent / "output_dir"
    output_dir.mkdir(parents=True, exist_ok=True)

    # Preserve the picture with the traces of drawing
    output_path = output_dir.joinpath(image_path.name)
    cv2.imwrite(str(output_path), combined_img)

    if sys.platform == "win32":
        os.startfile(output_dir)
    print(f">> View at {output_dir}")


def demo():
    image_path = Path(__file__).parent / "i01.png"
    execute(image_path, find_unique_object)


# pip install -U hcaptcha_challenger
if __name__ == "__main__":
    demo()
```

And this would really be a guide for me if it solved my problem. I would take the trouble to track all imports in all namespaces in PyCharm via Ctrl+Click and google all other specifics. But it doesn't work.

How do I know what exactly is not working, and at what stage? This is not my code example; for me it is still an incomprehensible concept with no designated entry point. And why, for example, in your last answer did you think it better to send a picture demonstrating the result of solving my problem rather than a link to the code that does exactly that?

If by “other models” you mean the specific name indicated in that line, then its origin is not at all obvious to me, for example.

Is the entire list available somewhere? Why don't you consider defining the names as constants, say as class fields or in a separate file? A declaration of the form const.appears_only_once would provide more clarity than a string whose existence and format only you know.

I assume you are making this tool for the general public, and not just for yourself and those initiated into the intricacies of working with neural networks, right? If so, then please make allowance for the fact that your undocumented intentions may not be obvious to those who see in your tool a possible solution to their problem but do not know whether they can simply pick it up, or whether they must first wait for Thursday to come with the moon in the constellation Sagittarius.

I apologize for this message, because I know how unpleasant criticism is, but I still hope to reach the finish line along this path and, no less important, leave it for those who follow.

QIN2DIM commented 10 months ago

@PieceOfGood Hi

I've had so much on my todo list lately that I haven't had much time to work on this project.

The issues you mention are very relevant; improving project stability is something I've been wanting to do for a while now, and it would also genuinely lower the barrier to using the program.

But I was never sure of the audience that would use the tool, so I drew up a number of design processes that never formally made it to the development stage.

↓ This is a bit more complicated to explain, so let's start at the beginning.🦉

QIN2DIM commented 10 months ago

It's going to take a bit of time to explain how this system works, because so far I haven't authored a formal official document. 🥹

Let me answer your question first

Long story short, the most critical system processes of AgentT.execute() are as follows:

1. Handler

AgentT parses the packet and extracts two key fields: request_type and requester_question.

AgentT then transforms the entire JSON packet into a dataclass for use by subsequent processes.

AgentT passes the transformed object into an asyncio.Queue so that the child threads can communicate with the main thread.
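
The three steps above can be sketched as a minimal toy. The class and field names here are illustrative only, not the project's real API:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Captcha:
    # Hypothetical subset of the hCaptcha payload fields
    request_type: str
    requester_question: dict

queue: asyncio.Queue = asyncio.Queue()

def handle_packet(packet: dict) -> Captcha:
    """Parse the intercepted JSON packet into a dataclass and hand it
    to the consumer side via the queue."""
    captcha = Captcha(
        request_type=packet["request_type"],
        requester_question=packet["requester_question"],
    )
    queue.put_nowait(captcha)
    return captcha

packet = {
    "request_type": "image_label_binary",
    "requester_question": {"en": "Please click each image containing a duck"},
}
c = handle_packet(packet)
print(c.request_type)  # image_label_binary
```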


2. request_type and prompt

After AgentT gets the prompt, it encodes, cleans, and slices it to get the label, i.e. AgentT._label. This should be a noun phrase, but it doesn't matter if it isn't.
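
Roughly, the prompt-to-label step looks like this. This is a hedged sketch with invented cleaning rules, not the project's exact logic:

```python
import re

def split_prompt_message(prompt: str) -> str:
    """Illustrative label extraction: strip the instruction boilerplate
    so that only the target noun phrase remains."""
    prompt = prompt.strip().lower().rstrip(".")
    # Drop common instruction prefixes (the real rule set is larger)
    prompt = re.sub(
        r"^please click (on )?(each|the) (image|thumbnail) (containing|of) ", "", prompt
    )
    return prompt

print(split_prompt_message("Please click on the thumbnail of something that can be eaten"))
# "something that can be eaten"
```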

After that, AgentT performs task triage based on request_type; which solution to use, and which model to invoke for which type of task, is defined in advance in the decision flow below.

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/hcaptcha_challenger/agents/playwright/control.py#L734-L762

In this case, self.status.CHALLENGE_BACKCALL is a return signal indicating that the current challenge could not be solved; the external code needs to catch this signal and try to refresh the challenge. This is the code shown in the demo:

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/examples/demo_normal_playwright.py#L38-L48

Otherwise, AgentT executes the challenge and waits for hCaptcha to return a response:

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/hcaptcha_challenger/agents/playwright/control.py#L766-L767

There are generally only two statuses here, i.e. success or failure, but we translate “failure” into “retry”: the external program needs to keep refreshing the challenge.
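
The retry contract between the agent and the external program can be sketched like this. The agent object below is a toy stand-in, not the real AgentT or Playwright API:

```python
def solve_with_retry(agent, max_attempts: int = 8) -> bool:
    """Keep refreshing the challenge until the agent reports success
    or we run out of attempts."""
    for _ in range(max_attempts):
        status = agent.execute()
        if status == "success":
            return True
        # "failure" (or a back-call signal) both mean: refresh and retry
        agent.refresh_challenge()
    return False

class FlakyAgent:
    """Toy agent that fails twice before succeeding."""
    def __init__(self):
        self.calls = 0
    def execute(self):
        self.calls += 1
        return "success" if self.calls >= 3 else "failure"
    def refresh_challenge(self):
        pass

print(solve_with_retry(FlakyAgent()))  # True
```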

QIN2DIM commented 10 months ago

Having written this, I can start by answering your question briefly.

Namely, it is up to the program to decide which model to use for which task. The models I train are very data-specific, and they can generally only handle the tasks I want them to handle.

However, as the training data is iterated on and accumulated, the models slowly develop the ability to recognize across domains. For example, for recognizing animal heads, I didn't include a head of the rabbit label in the dataset, but some models recognize it anyway; a model may think it's the head of the hamster, but that is still valid for our task.

In other words, unless it's a specific type of task, I myself don't know which model would be most effective in solving the challenge at hand.

In your case, for example, I only trained one model for cutting out circles from the background, i.e. "appears_only_once_2309_yolov8s-seg.onnx", so I hardcoded it.

But for the task something that can be eaten, we use a model selected by the following strategy:

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/hcaptcha_challenger/agents/playwright/control.py#L747-L762

The program takes the strategy `await self._keypoint_default_challenge(frame_challenge)` at L753:

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/hcaptcha_challenger/agents/playwright/control.py#L521-L533

Finally, it uses the lookup_objects() strategy to filter the models. Its iterator is returned by a function called lookup_ash_of_war (L532):

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/hcaptcha_challenger/onnx/modelhub.py#L412-L447

This is a rather redundant piece of code, but it works. It is a very simple process of matching models against the model names I assigned.
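
The name-matching idea reduces to something like the following toy. The registry contents and alias lists here are invented for illustration; the real mapping lives in objects.yaml:

```python
# Hypothetical registry: model file name -> prompt phrases it handles
registry = {
    "can_be_eaten_2312_yolov8s.onnx": ["can be eaten"],
    "appears_only_once_2309_yolov8s-seg.onnx": ["appears only once"],
}

def lookup_models(label: str):
    """Yield every registered ONNX model whose alias matches the label."""
    label = label.strip().lower()
    for model_name, aliases in registry.items():
        if any(alias in label for alias in aliases):
            yield model_name

print(list(lookup_models("something that can be eaten")))
# ['can_be_eaten_2312_yolov8s.onnx']
```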

QIN2DIM commented 10 months ago

You can find all the available models in the objects.yaml file:

https://github.com/QIN2DIM/hcaptcha-challenger/blob/dea62e44a6539474b18a11cfb8f79afff1cd1148/src/objects.yaml#L29-L30

In your case, the program ultimately chooses the model can_be_eaten_2312_yolov8s.onnx for the challenge, based on the request_type as well as the prompt.

In other words, although the challenge something that can be eaten has only one correct object in the image, which is consistent with the meaning of appears only once, we chose to handle it with a different set of solutions.

I started with appears only once, which is AgentT's solution for this type of challenge: detect the “circles” first, then find the only unique circle. Obviously, this does not mean that AgentT will use this solution for other types of challenges.

For the model can_be_eaten_2312_yolov8s.onnx, it is straightforward to know what in the picture can be eaten.

Because you picked an image that doesn't fit the appears only once scheme and executed it, you got “unexpected results”.

QIN2DIM commented 10 months ago

If you have any more questions you'd like to ask, go ahead and leave a comment.