Closed QIN2DIM closed 10 months ago
@PieceOfGood
I've merged an example that I hope you'll find useful
https://github.com/QIN2DIM/hcaptcha-challenger/blob/main/examples/demo_find_unique_object.py
Wow!
Excellent example! Thanks for putting it together!
But as you may have noticed, my first request was with a slightly different type of task.
> Please click on the thumbnail of something that can be eaten
And "find a unique object" should, in theory, also fit here, but something went wrong:
I thought you had read the source code. In this case, we used other models to detect it.😅
I'm not sure you understand me correctly. Perhaps I'm not describing my problem accurately enough.
Perhaps everything is intuitive to you personally, and if it were otherwise, you would most likely not be the author of this tool. But that does not mean everything is as clear to everyone else as it is to you, or that everyone will understand what you are guided by if they read your source code.
I came to you with a very specific question. Last time you provided a link, which I take as a guide to action. I looked at the intentions of the algorithm described in the link and simplified it to processing one single image:
```python
import os
import sys
from pathlib import Path
from typing import Callable

import cv2

import hcaptcha_challenger as solver
from hcaptcha_challenger.components.cv_toolkit.appears_only_once import (
    limited_radius,
    annotate_objects,
    find_unique_object,
    find_unique_color,
)
from hcaptcha_challenger.onnx.modelhub import ModelHub
from hcaptcha_challenger.onnx.yolo import YOLOv8Seg

solver.install(upgrade=True)

# Initialize model index
modelhub = ModelHub.from_github_repo()
modelhub.parse_objects()


def draw_unique_object(results, image_path: str, trident: Callable):
    def search():
        # Hough circle detection as a bottoming scheme
        img, circles = annotate_objects(image_path)
        # Prioritize the results of seg model cutting
        # IF NOT results - Use Hough's results
        if results:
            # circle: [center_x, center_y, r]
            circles = [
                [int(result[1][0]), int(result[1][1]), limited_radius(img)] for result in results
            ]
        # Find the circle field of the `appears only once`
        if circles:
            if result := trident(img, circles):
                x, y, _ = result
                # Return the `Position` DictType[str, int]
                return img, {"x": int(x), "y": int(y)}
        # `trident() method` returned untrusted results
        return img, {}

    # IF position is reliable - Mark the center of a unique circle
    image, position = search()
    # print(position)

    # If you are writing a playwright or selenium program,
    # you should click canvas according to this coordinate.
    if position:
        combined_img = cv2.circle(
            image, (position["x"], position["y"]), limited_radius(image), (255, 0, 0), 2
        )
        return combined_img
    return image


def execute(image_path: Path, trident: Callable, output_dir: Path | None = None) -> None:
    # Load model (automatic download, 51MB)
    model_name = "appears_only_once_2309_yolov8s-seg.onnx"
    classes = modelhub.ashes_of_war.get(model_name)
    session = modelhub.match_net(model_name)
    yoloseg = YOLOv8Seg.from_pluggable_model(session, classes)

    # Find all the circles in the picture
    results = yoloseg(image_path)

    # Find the unique circle and draw the center of the circle
    combined_img = draw_unique_object(results, str(image_path), trident)

    # Draw a bounding box and mask region for all circles
    combined_img = yoloseg.draw_masks(combined_img, mask_alpha=0.1)

    if not output_dir:
        if not (output_dir := Path(__file__).parent / "output_dir").exists():
            output_dir.mkdir(parents=True)

    # Preserve pictures with traces of drawing
    output_path = output_dir.joinpath(image_path.name)
    cv2.imwrite(str(output_path), combined_img)

    if "win32" in sys.platform:
        os.startfile(output_dir)
    print(f">> View at {output_dir}")


def demo():
    image_path = Path(__file__).parent / "i01.png"
    execute(image_path, find_unique_object)


# pip install -U hcaptcha_challenger
if __name__ == "__main__":
    demo()
```
And this really would be a guide for me if it solved my problem. I would take the trouble to trace all the imports across namespaces in PyCharm via Ctrl+Click and google every other specific. But it doesn't work.
How do I know what exactly is not working and at what stage? This is not my code example. For me, it is still like an incomprehensible concept with an undesignated entry point. And why, for example, in the last answer, did you think it was better to send a picture demonstrating the result of solving my problem, and not a link to the code that does exactly this?
If by "other models" you mean the specific name indicated in this line, then its source is not at all obvious to me.
Is the entire list available somewhere? Why not define these names as constants, like class fields, or in a separate file? A declaration in the form `const.appears_only_once` would provide more clarity than a string whose existence and format only you know.
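For example, a hypothetical constants module could look like the sketch below. None of these names exist in the library today; it is only an illustration of the layout I am suggesting:

```python
# Hypothetical module: hcaptcha_challenger/constants.py (does not exist yet).
class ModelName:
    """One discoverable registry of ONNX model filenames."""

    APPEARS_ONLY_ONCE = "appears_only_once_2309_yolov8s-seg.onnx"
    CAN_BE_EATEN = "can_be_eaten_2312_yolov8s.onnx"


def resolve(name: str) -> str:
    """Fail fast if a caller passes a model name that is not registered."""
    registered = {v for k, v in vars(ModelName).items() if not k.startswith("_")}
    if name not in registered:
        raise KeyError(f"unknown model: {name}")
    return name
```

With this, a typo in a model name fails loudly at `resolve()` time instead of silently returning `None` from a dict lookup.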
I assume you are making this tool for the general public, and not just for yourself and those initiated into the intricacies of working with neural networks, right? If so, then please make allowance for the fact that your undocumented intentions may not be obvious to those who see in your tool a possible solution to their problem, but do not know whether they can simply pick it up, or whether they must first wait for Thursday to come with the moon in the constellation Sagittarius.
I apologize for this message, because I know how unpleasant criticism is, but I still hope to reach the finish line along this path and, no less importantly, leave it marked for those who follow.
@PieceOfGood Hi
My todo list has been so long lately that I haven't had much time to work on this project.
The issues you mention are very relevant, and improving project stability is something I've been wanting to do for a while now. It would also really lower the barrier to using the program.
But I was never sure of the audience that would use the tool, so I drew up a number of design processes that never formally made it to the development stage.
↓ This is a bit more complicated to explain, so let's start at the beginning.🦉
It's going to take a bit of time to explain how this system works, because so far I haven't authored a formal official document.🥹
Let me answer your question first
Long story short, the most critical system processes of `AgentT.execute()` are as follows:

1. AgentT parses the packet and resolves two key fields: `request_type` and `requester_question`.
2. AgentT then transforms the entire json packet into a dataclass for use by subsequent processes.
3. AgentT passes the transformed object into an `asyncio.Queue` so that the child threads can communicate with the main thread.
`request_type` specifies the type of challenge. `requester_question.en` specifies the prompt word for the challenge; here we call it the `prompt`. After AgentT gets the prompt, it encodes, cleans, and slices it to get the `label`, i.e. `AgentT._label`. This should be a noun phrase, but it doesn't matter if it isn't.
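For intuition, a hypothetical version of that encode/clean/slice step might look like this (the real cleaning logic inside AgentT differs in detail):

```python
import re


def prompt_to_label(prompt: str) -> str:
    """Toy prompt cleaner: lower-case, strip punctuation, and slice off
    the instruction boilerplate so only the phrase of interest remains."""
    text = prompt.lower().strip()
    text = re.sub(r"[^\w\s]", "", text)
    # Drop the common instruction prefix if present (illustrative only)
    prefix = "please click on the thumbnail of "
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.replace(" ", "_")


label = prompt_to_label("Please click on the thumbnail of something that can be eaten")
print(label)  # -> something_that_can_be_eaten
```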
After that, AgentT performs task triage based on `request_type`; the choice of which solution to use, and which model to invoke to solve which type of task, is defined in advance in the decision flow below.
In this case, `self.status.CHALLENGE_BACKCALL` is a return signal indicating that the current challenge could not be solved; the external code needs to catch this signal and try to refresh the challenge. This is the code shown in the demo:
Otherwise, AgentT executes the challenge and waits for hCaptcha to return a response:
There are generally only two statuses here, success or failure, but we translate "failure" into "retry": the external program needs to continue refreshing the challenge.
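So the driver around AgentT can be sketched as a simple refresh loop. The status names and function shape below are illustrative, not the library's real API:

```python
from enum import Enum


class Status(str, Enum):
    # Illustrative signals; AgentT exposes similar ones on self.status
    CHALLENGE_SUCCESS = "success"
    CHALLENGE_RETRY = "retry"
    CHALLENGE_BACKCALL = "backcall"


def solve_with_refresh(execute, refresh, max_attempts: int = 5) -> bool:
    """Keep refreshing the challenge until success or attempts run out."""
    for _ in range(max_attempts):
        if execute() == Status.CHALLENGE_SUCCESS:
            return True
        # Both "backcall" (unsolvable task type) and "retry" (hCaptcha
        # rejected the answer) are handled the same way: refresh and retry.
        refresh()
    return False
```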
Having written this, I can start by answering your question briefly.
Namely, it is up to the program to decide which model to use to solve which task. The models I train with data are very specific, and they can generally only handle the tasks I want them to handle.
However, as the training data is iterated and accumulated, the models can slowly develop the ability to recognize across domains. For example, for recognizing the head of an animal, I didn't include the `head of the rabbit` label in the dataset, but some models recognize it anyway, and while a model may think it's the `head of the hamster`, that is still valid for our task.
In other words, unless it's a specific type of task, I myself don't know which model would be most effective in solving the challenge at hand.
In your case, for example, I only trained one model for cutting circles out of the background, i.e. `appears_only_once_2309_yolov8s-seg.onnx`, so I hardcoded it.
But for the task `something that can be eaten`, we use a model filtered by the following strategy:
The program goes to the strategy `await self._keypoint_default_challenge(frame_challenge)` at L753. Finally, it uses the `lookup_objects()` strategy to filter the model; its iterator is returned by a function called `lookup_ash_of_war` at L532.
This is a rather redundant piece of code, but it works. It's a very simple process of matching models based on the model name I wrote.
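Conceptually the matching is as simple as this hypothetical helper (the real `lookup_ash_of_war` differs in detail):

```python
def lookup_models(label: str, model_names: list[str]) -> list[str]:
    """Toy model filter: return model names whose leading "focus" part
    (the filename minus version and architecture suffix) appears in the
    cleaned challenge label."""
    matched = []
    for name in model_names:
        # e.g. "can_be_eaten_2312_yolov8s.onnx" -> focus "can_be_eaten"
        focus = name.rsplit("_", 2)[0]
        if focus and focus in label:
            matched.append(name)
    return matched


names = ["can_be_eaten_2312_yolov8s.onnx", "appears_only_once_2309_yolov8s-seg.onnx"]
print(lookup_models("something_that_can_be_eaten", names))
# -> ['can_be_eaten_2312_yolov8s.onnx']
```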
You can find all the available models in the objects.yaml file.
In your case, the program ultimately chooses the model `can_be_eaten_2312_yolov8s.onnx` for the challenge, based on the `request_type` as well as the `prompt`.
In other words, in your case, although the challenge `something can be eaten` has only one correct-answer object in the image, which is consistent with the meaning expressed by `appears only once`, we chose to handle it with a different set of solutions.
I started with `appears only once`, which is AgentT's solution for this type of challenge, i.e., detecting the "circles" first, and then detecting the "only circle". Obviously, this does not mean that AgentT will use this solution for other types of challenges.
Because for the model `can_be_eaten_2312_yolov8s.onnx`, it's straightforward to know what in the picture can be eaten.
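In that scheme the answer falls straight out of the detector: pick the best-scoring box and click its centre. A purely illustrative sketch, not the library's actual post-processing:

```python
def pick_click_point(detections):
    """detections: list of (label, confidence, (x1, y1, x2, y2)) tuples,
    as a generic object detector would emit. Returns the centre of the
    highest-confidence box as a click coordinate, or None if empty."""
    if not detections:
        return None
    label, conf, (x1, y1, x2, y2) = max(detections, key=lambda d: d[1])
    return {"x": (x1 + x2) // 2, "y": (y1 + y2) // 2}


boxes = [("apple", 0.91, (10, 10, 30, 50)), ("rock", 0.22, (0, 0, 5, 5))]
print(pick_click_point(boxes))  # -> {'x': 20, 'y': 30}
```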
Because you picked an image that doesn't fit this scheme and executed it, you got "unexpected results".
If you have any more questions you'd like to ask, go ahead and leave a comment.
I'm sorry, but do you have an example of how this works, such as the code demos from your guide?
Originally posted by @PieceOfGood in https://github.com/QIN2DIM/hcaptcha-challenger/issues/806#issuecomment-1770256888