QIN2DIM / hcaptcha-challenger

🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
https://docs.captchax.top/
GNU General Public License v3.0
1.47k stars 260 forks

[Question] Error during prolonged execution #894

Closed 12189108 closed 8 months ago

12189108 commented 8 months ago

Brief description

I want to call the 'bytedance' method from 'demo_undetected_playwright.py' through Flask. I imported the file and made the first call, which worked fine. However, after a few hours (with the program running in the background), calls to the 'bytedance' method through Flask started failing with the error shown below. How can I resolve this?

Related logger

File "/root/hcaptcha/hcaptcha_solver.py", line 27, in hit_challenge
    result = await agent()
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/agents/playwright/control.py", line 678, in __call__
    return await self.execute(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/agents/playwright/control.py", line 746, in execute
    await self._binary_challenge_clip(frame_challenge)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/agents/playwright/control.py", line 652, in _binary_challenge_clip
    results = tool(model, image=Image.open(self._img_paths[i + pth * 9]))
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/components/zero_shot_image_classifier.py", line 137, in __call__
    predictions = detector(image, candidate_labels=self.candidate_labels)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/onnx/clip.py", line 411, in __call__
    image_features = self.encode_image(images)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/onnx/clip.py", line 375, in encode_image
    input_name = self.visual_session.get_inputs()[0].name
AttributeError: 'NoneType' object has no attribute 'get_inputs'
QIN2DIM commented 8 months ago

Which version of the PyPI package are you using?

12189108 commented 8 months ago

Which version of the PyPI package are you using?

0.9.2.post1

QIN2DIM commented 8 months ago

emmm try to upgrade

QIN2DIM commented 8 months ago

I'd like to see how your interface calls hcap.

Did you start a new browser at all?

Or rather, I would like to know how AgentT was instantiated before result = await agent(). Is it exactly the same as the demo? I ask because you mentioned "import file"...

QIN2DIM commented 8 months ago

Do you keep a runtime.log? I'd like to see what was running when the error was reported.

https://github.com/QIN2DIM/hcaptcha-challenger/blob/638baff110de4ff4d5330d3f11ce2c5a188f552a/hcaptcha_challenger/agents/playwright/control.py#L631-L641

Also, are you launching it with a command like uvicorn, or just running Flask directly?

12189108 commented 8 months ago

api.py

import hashlib
import os
import asyncio
import uuid
import shutil
from flask import Flask, jsonify, request, logging as flog
from flask_limiter.util import get_remote_address
import hcaptcha_solver

app = Flask(__name__)

def get_ipaddr():
    if request.access_route:
        print(request.access_route[0])
        return request.access_route[0]
    else:
        return request.remote_addr or '127.0.0.1'

def generate_uuid():
    unique_identifier = str(uuid.uuid4())
    hashed_string = hashlib.sha256(unique_identifier.encode()).hexdigest()
    return hashed_string

@app.errorhandler(429)
def rate_limit_exceeded(e):
    print(get_remote_address())
    return jsonify(msg="Too many request"), 429

@app.route("/", methods=["GET"])
def index():
    return jsonify(status_code=200, ip=get_ipaddr())

@app.route("/api/solve", methods=["POST"])
def solve_captcha():
    require_data = ["host", "site_key"]
    data = request.get_json(force=True, silent=True)
    dir_path=generate_uuid()
    resp=asyncio.run(hcaptcha_solver.bytedance(data["host"], data["site_key"], dir_path))
    shutil.rmtree(dir_path)
    shutil.rmtree("tmp_dir")
    return resp

app.run(host="0.0.0.0", port=8081)

hcaptcha_solver.py

from pathlib import Path

import traceback
from loguru import logger
from playwright.async_api import BrowserContext as ASyncContext, async_playwright
import hcaptcha_challenger as solver
from hcaptcha_challenger.agents import AgentT, Malenia

# Init local-side of the ModelHub
solver.install(upgrade=True)

# Save dataset to current working directory
tmp_dir = Path(__file__).parent.joinpath("tmp_dir")

@logger.catch
async def hit_challenge(context: ASyncContext, host, sitekey, user_data_dir, times: int = 8):
    await context.route('**/*', lambda route, request: route_continuation(route, request, host, sitekey))
    page = context.pages[0]
    agent = AgentT.from_page(page=page, tmp_dir=tmp_dir,self_supervised=True)
    await page.goto(f"https://{host}")

    await agent.handle_checkbox()

    for pth in range(1, times):
        result = await agent()
        print(f">> {pth} - Challenge Result: {result}")
        match result:
            case agent.status.CHALLENGE_BACKCALL:
                await page.wait_for_timeout(500)
                fl = page.frame_locator(agent.HOOK_CHALLENGE)
                await fl.locator("//div[@class='refresh button']").click()
            case agent.status.CHALLENGE_SUCCESS:
                rqdata = agent.cr.__dict__
                await context.close()
                return rqdata["generated_pass_UUID"]

async def route_continuation(route, request, host, sitekey):
    if request.url == f"https://{host}/":
        print("start to solve")
        await route.fulfill(status=200,
                            body="""

<!DOCTYPE html>
<html lang="en">
<head>
<title>hCAPTCHA Demo</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, user-scalable=yes">
<script src="https://js.hcaptcha.com/1/api.js" type="text/javascript" async defer></script>
</head>
<body>
<br><br>
<div class="sample-form">
<form id="hcaptcha-demo-form" method="POST">
<div id="hcaptcha-demo" class="h-captcha" data-sitekey="%%%%%%%%%%%" data-callback="onSuccess" data-expired-callback="onExpire"></div>
<script>
                      // success callback
                      var onSuccess = function(response) {
                        var errorDivs = document.getElementsByClassName("hcaptcha-error");
                        if (errorDivs.length) {
                          errorDivs[0].className = "";
                        }
                        var errorMsgs = document.getElementsByClassName("hcaptcha-error-message");
                        if (errorMsgs.length) {
                          errorMsgs[0].parentNode.removeChild(errorMsgs[0]);
                        }

                        var logEl = document.querySelector(".hcaptcha-success");
                        logEl.innerHTML = "Challenge succeeded!"
                      };

                      var onExpire = function(response) {
                        var logEl = document.querySelector(".hcaptcha-success");
                        logEl.innerHTML = "Token has expired."
                      };
                </script>

<div class="hcaptcha-success smsg" aria-live="polite"></div>
</body>
<script type="text/javascript">
    // beacon example
    function addEventHandler(object,szEvent,cbCallback){
        if(!!object.addEventListener){ // for modern browsers or IE9+
            return object.addEventListener(szEvent,cbCallback);
        }
        if(!!object.attachEvent){ // for IE <=8
            return object.attachEvent(szEvent,cbCallback);
        }
    };
    // Ex: triggers pageview beacon
    addEventHandler(window,'load',function(){b();});
    // Ex: triggers event beacon without pageview
    addEventHandler(window,'load',function(){b({"vt": "e", "ec": "test_cat", "ea": "test_action"});});
  </script>
</html>
            """.replace("%%%%%%%%%%%", sitekey))
    else:
        await route.continue_()

async def bytedance(host, sitekey, user_data_dirs):
    print(user_data_dirs)
    # playwright install firefox --with-deps
    try:
        async with async_playwright() as p:
            context = await p.firefox.launch_persistent_context(
                user_data_dir=Path(__file__).parent.joinpath(user_data_dirs),
                headless=True,
                locale="en-US"
            )
            await Malenia.apply_stealth(context)
            token = await hit_challenge(context, host, sitekey, Path(__file__).parent.joinpath(user_data_dirs))
            return token
    except Exception as e:
        traceback.print_exc()

log file

Installing models/objects.yaml: 100%|██████████| 2.95k/2.95k [00:00<00:00, 3.97MB/s]
 * Serving Flask app 'api'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8081
 * Running on http://*.*.*.*:8081
Press CTRL+C to quit
0e930108edfc205ce3fd42f7d1453611b8c7797d48dec74d4d863a102ccc83fe
start to solve
2023-10-28 14:25:43 | DEBUG - match model - {'resnet': 'furniture2310.onnx', 'prompt': 'Please click each image containing furniture'}
>> 1 - Challenge Result: success
127.0.0.1 - - [28/Oct/2023 14:25:50] "POST /api/solve HTTP/1.0" 200 -
bbf9389ed379302efb6d32cea46a994a13d73fb8637abe121c5a710cd1f83718
start to solve
2023-10-28 23:32:16 | DEBUG - unsupervised - {'type': 'binary', 'candidate_labels': ['This is a photo of the the largest animal.', 'This is a photo that has nothing to do with the largest animal.'], 'prompt': 'Please click on each image containing the largest animal.', 'timit': '0.250s'}
2023-10-28 23:32:17 | ERROR - An error has been caught in function 'bytedance',
process 'MainProcess' (545613), thread 'Thread-4 (process_request_thread)' (140103913567808): - {}
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 390, in handle
    super().handle()
  File "/usr/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 362, in run_wsgi
    execute(self.server.app)
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 323, in execute
    application_iter = app(environ, start_response)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1478, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/root/hcaptcha/api.py", line 84, in solver_captcha
    resp=asyncio.run(hcaptcha_solver.bytedance(data["host"], data["site_key"], dir_path))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/root/hcaptcha/hcaptcha_solver.py", line 118, in bytedance
    token = await hit_challenge(context, host, sitekey, Path(__file__).parent.joinpath(user_data_dirs))
  File "/root/hcaptcha/hcaptcha_solver.py", line 27, in hit_challenge
    result = await agent()
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/agents/playwright/control.py", line 678, in __call__
    return await self.execute(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/agents/playwright/control.py", line 746, in execute
    await self._binary_challenge_clip(frame_challenge)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/agents/playwright/control.py", line 652, in _binary_challenge_clip
    results = tool(model, image=Image.open(self._img_paths[i + pth * 9]))
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/components/zero_shot_image_classifier.py", line 137, in __call__
    predictions = detector(image, candidate_labels=self.candidate_labels)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/onnx/clip.py", line 411, in __call__
    image_features = self.encode_image(images)
  File "/usr/local/lib/python3.10/dist-packages/hcaptcha_challenger/onnx/clip.py", line 375, in encode_image
    input_name = self.visual_session.get_inputs()[0].name
AttributeError: 'NoneType' object has no attribute 'get_inputs'
[2023-10-28 23:32:17,981] ERROR in app: Exception on /api/solve [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 870, in full_dispatch_request
    return self.finalize_request(rv)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 889, in finalize_request
    response = self.make_response(rv)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1161, in make_response
    raise TypeError(
TypeError: The view function for 'solve_captcha' did not return a valid response. The function either returned None or ended without a return statement.
127.0.0.1 - - [28/Oct/2023 23:32:17] "POST /api/solve HTTP/1.0" 500 -
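
The closing TypeError in this log follows from the first error: once the exception is caught and logged, 'bytedance' returns None, and Flask rejects None as a view return value. A minimal sketch of one way to guard against that, assuming a hypothetical helper `build_solve_response` and a JSON response shape that are not part of the original api.py:

```python
from typing import Optional, Tuple


def build_solve_response(token: Optional[str]) -> Tuple[dict, int]:
    """Map the solver result to a (JSON-able body, HTTP status) pair.

    Hypothetical helper: the field names below are an assumption,
    not something the original api.py defines.
    """
    if not token:
        # The solver swallowed an exception (or used up its attempts)
        # and handed back None, so report a failure instead of None.
        return {"success": False, "msg": "challenge was not solved"}, 500
    return {"success": True, "token": token}, 200
```

In `solve_captcha`, the last line could then become `return build_solve_response(resp)`; Flask 1.1 and later accept a `(dict, status)` tuple and serialize the dict as JSON.
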
12189108 commented 8 months ago

emmm try to upgrade

I will try it in the morning. Thank you

QIN2DIM commented 8 months ago

This is a very useful case. I can offer you a solution with better performance.

12189108 commented 8 months ago

After upgrading, there haven't been any issues so far, except for 'The remote network does not exist or the local cache has expired.' - {}. However, it returned to normal after I called solver.install(upgrade=True). Can you please provide more details about 'a solution with better performance'?
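
The recovery described here (a failed call, then `solver.install(upgrade=True)`, then normal operation) could be wrapped in a small retry helper. This is only a sketch: `solve_with_refresh` and its parameters are hypothetical names, and it assumes that any failed attempt is worth one refresh before retrying.

```python
from typing import Callable, Optional


def solve_with_refresh(
    solve: Callable[[], Optional[str]],
    refresh: Callable[[], None],
    retries: int = 1,
) -> Optional[str]:
    """Call solve(); if it fails, refresh local resources and try again.

    `solve` returns a token or None; `refresh` is whatever re-syncs the
    local model cache (hypothetical wiring, see the note below).
    """
    for attempt in range(retries + 1):
        try:
            token = solve()
        except Exception:
            # Treat an exception the same as an unsolved challenge.
            token = None
        if token:
            return token
        if attempt < retries:
            refresh()
    return None
```

With the names from this thread, the call might look like `solve_with_refresh(lambda: asyncio.run(hcaptcha_solver.bytedance(host, site_key, dir_path)), lambda: solver.install(upgrade=True))`.
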

QIN2DIM commented 8 months ago

I am planning to split the browser driver and computer vision modules.

With this split you can replace proxies and change browser fingerprints more efficiently, while the CV part can communicate with other services via gRPC / HTTP.

This saves the time spent repeatedly loading and releasing the model. The CV service can process hundreds of images per second (this depends on the performance of your device, and it's even better if you have a GPU).

And you can create as many browser-driven instances as you want based on network requests.

Of course, any transition has a cost: it may require a more powerful device.

This is because the backend service does not actively free models once they have been loaded, and keeping so many models loaded at once consumes a lot of memory.
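
The trade-off described in this comment, loading each model once and never releasing it, can be sketched as a small in-process cache. `ModelCache` and the `loader` callable are hypothetical names for illustration and are not part of hcaptcha-challenger; in practice the loader might wrap something like `onnxruntime.InferenceSession(path)`.

```python
import threading
from typing import Any, Callable, Dict


class ModelCache:
    """Load each model at most once and keep it in memory for later requests."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader              # hypothetical: maps a model name to a loaded model
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()      # requests may arrive on several threads

    def get(self, name: str) -> Any:
        with self._lock:
            if name not in self._models:
                # First request for this model: pay the load cost once.
                self._models[name] = self._loader(name)
            # Nothing is ever evicted, so memory grows with the number
            # of distinct models requested.
            return self._models[name]
```

Because nothing is evicted, repeated requests skip the load step entirely, at the price of holding every requested model in memory.
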

QIN2DIM commented 8 months ago

At the same time, a backend service can fetch objects.yaml and PyPI version changes on a scheduled timer.
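
A scheduled refresh of this kind could look roughly like the following. It is a sketch under assumptions: `PeriodicTask` is a hypothetical helper built on the standard library, and the refresh interval is left to the caller.

```python
import threading
from typing import Callable


class PeriodicTask:
    """Run `task` every `interval` seconds on a daemon thread until stopped."""

    def __init__(self, task: Callable[[], None], interval: float) -> None:
        self._task = task
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval):
            try:
                self._task()
            except Exception:
                # A failed refresh should not kill the timer; retry on the next tick.
                pass

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

For the case in this thread, the task might be `lambda: solver.install(upgrade=True)`, started once when the service boots.
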

12189108 commented 8 months ago

I look forward to your next version. Since the update, the previous issues have not reappeared; the only change I made was adding the 'clip=True' parameter. I also added methods that let me call 'solver.install(upgrade=True)' through Flask, which avoids the 'The remote network does not exist or the local cache has expired' error.