gbiz123 / tiktok-captcha-solver

Selenium and Playwright client for SadCaptcha TikTok Captcha Solver API

Check captcha status #7

Closed serozhenka closed 2 months ago

serozhenka commented 2 months ago

Not sure when things changed, but as far as I can see, _check_captcha_status no longer works properly.

As I concluded when debugging, TikTok no longer adds the text "Verification complete" anywhere on the page, so the xpath=//*[contains(text(), 'Verification complete')] selector yields no results. I solved it locally by intercepting all incoming responses (the ones visible in the network tab) and looking for a successful verification response. Attaching the part of the async Playwright code that automates this:

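import json

from playwright.async_api import Response


async def response_callback(response: Response) -> None:
    # Ignore responses whose body is unavailable or is not JSON.
    try:
        content = await response.text()
        content = json.loads(content)
    except Exception:
        return

    # JSON that is not an object (a list, a string) has no .get(); skip it.
    if not isinstance(content, dict):
        return

    vf_complete = (
        content.get("message") == "Verification complete"
        and content.get("code") == 200
        and content.get("data") is None
        and content.get("msg_sub_code") == "success"
    )

    if vf_complete:
        print("COMPLETE", content)


# Inside an async function, with page and sadcaptcha already set up:
page.on("response", response_callback)
await sadcaptcha.solve_captcha_if_present()
page.remove_listener("response", response_callback)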
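The callback above only prints when the success response is seen. A minimal sketch of how the same idea could return a result to the caller is below. The names is_verification_complete and solve_and_confirm are hypothetical and not part of the library, and page is only assumed to behave like an async Playwright Page (on / remove_listener, with async handlers being awaited).

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


def is_verification_complete(payload: Any) -> bool:
    """True if a decoded JSON body matches the success response described above."""
    return (
        isinstance(payload, dict)
        and payload.get("message") == "Verification complete"
        and payload.get("code") == 200
        and payload.get("data") is None
        and payload.get("msg_sub_code") == "success"
    )


async def solve_and_confirm(
    page: Any,
    solve: Callable[[], Awaitable[Any]],
    timeout: float = 10.0,
) -> bool:
    """Run solve() while listening on page; True if the success response was seen.

    Hypothetical helper: `page` is assumed to expose on() and remove_listener()
    like an async Playwright Page.
    """
    verified = asyncio.get_running_loop().create_future()

    async def on_response(response: Any) -> None:
        try:
            payload = json.loads(await response.text())
        except Exception:
            return  # body unavailable or not JSON
        if is_verification_complete(payload) and not verified.done():
            verified.set_result(True)

    page.on("response", on_response)
    try:
        await solve()
        # The response may arrive slightly after solve() returns, so wait for it.
        return await asyncio.wait_for(verified, timeout)
    except asyncio.TimeoutError:
        return False
    finally:
        page.remove_listener("response", on_response)
```

With a real page this would be called as await solve_and_confirm(page, sadcaptcha.solve_captcha_if_present). The listener is removed in a finally block so it does not leak if solving raises.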

The fix will really be useful for us for two main reasons:

We wish to resolve this quickly, so we can use the library with the proper verification logic. I am not married to the proposed approach and am not claiming it's perfect, so if you have any other ideas or suggestions, feel free to let me know. Also, let me know if attaching a fully working, self-contained demo script would be helpful.

Thanks!

gbiz123 commented 2 months ago

I was not able to reproduce the issue; it is possible that TikTok is rolling out this change gradually. I have pushed an update that should make the verification check more robust by relying on the success/failure CSS classes applied after solving.

You can install the latest version using pip install --upgrade tiktok-captcha-solver

Now using success and failure selectors:

    async def _check_captcha_success(self) -> bool:
        success_selector = "css=.captcha_verify_message-pass"
        failure_selector = "css=.captcha_verify_message-fail"
        success_xpath = "xpath=//*[contains(text(), 'Verification complete')]"
        for _ in range(40):
            if await self.page.locator(failure_selector).all():
                logging.debug("Captcha not solved - failure selector present")
                return False
            if await self.page.locator(success_selector).all():
                logging.debug("Captcha solved - success selector present")
                return True
            if await self.page.locator(success_xpath).all():
                logging.debug("Captcha solved - success xpath present")
                return True
            await asyncio.sleep(0.5)
        logging.debug("Captcha not solved")
        return False
serozhenka commented 2 months ago

Unfortunately, it still didn't solve the issue for me.

Maybe it works in some cases, but most of the time TikTok shows the success/failure message for only 0.1-0.2 s, so the check can be asleep in await asyncio.sleep(0.5) and miss the moment when the condition holds. I am using Chromium Version 127.0.6533.17 (Developer Build) (arm64), and for me, after a successful verification TikTok simply clears the div with id #tiktok-verify-ele, as shown in the image. Can you please add this to the conditions checked in the _check_captcha_success method?
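A rough sketch of the requested check is below. It is not the library's code: wait_for_captcha_cleared is a hypothetical helper, and page is only assumed to behave like an async Playwright Page, where page.locator(selector).count() resolves to the number of matching elements. It also assumes the container stays empty once cleared and that the captcha was already rendered before the call; an empty container could equally mean the captcha never appeared.

```python
import asyncio
import time
from typing import Any


async def wait_for_captcha_cleared(
    page: Any,
    container: str = "#tiktok-verify-ele",
    timeout: float = 20.0,
    interval: float = 0.25,
) -> bool:
    """Return True once `container` has no child elements, False on timeout.

    Hypothetical helper; `page` is assumed to be Playwright-like.
    """
    # Direct children of the container; zero matches means it was cleared.
    children = page.locator(f"{container} > *")
    deadline = time.monotonic() + timeout
    while True:
        if await children.count() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)
```

Polling is less risky here than for the success message, because a cleared container is a state that persists rather than one that flashes by for a fraction of a second.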

Additional note: the playwright-stealth package you suggest using for better scraping is outdated and no longer maintained, so it doesn't work properly.

image

gbiz123 commented 2 months ago

It seems the best route would be to use a wait instead of polling, because you are right: polling could miss the selector. I can implement this today.
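To illustrate why fixed-interval polling can miss a short-lived state, here is a toy asyncio model. It is not Playwright code and does not show how expect() behaves; TransientBanner, poll_for_banner, and wait_for_banner are made-up names, and the timings are chosen only to mirror the 0.5 s sleep and the roughly 0.15 s visibility window discussed above.

```python
import asyncio
from typing import Tuple


class TransientBanner:
    """Toy stand-in for a success message that is visible only briefly."""

    def __init__(self) -> None:
        self.visible = False
        self.shown = asyncio.Event()  # set once; stays set after the banner hides

    async def run(self, delay: float, duration: float) -> None:
        await asyncio.sleep(delay)
        self.visible = True
        self.shown.set()
        await asyncio.sleep(duration)
        self.visible = False


async def poll_for_banner(banner: TransientBanner, interval: float, attempts: int) -> bool:
    """Sample the current state at fixed intervals; a brief state can fall between samples."""
    for _ in range(attempts):
        if banner.visible:
            return True
        await asyncio.sleep(interval)
    return False


async def wait_for_banner(banner: TransientBanner, timeout: float) -> bool:
    """Wait on the event itself, which records that the banner was shown at all."""
    try:
        await asyncio.wait_for(banner.shown.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> Tuple[bool, bool]:
    banner = TransientBanner()
    task = asyncio.create_task(banner.run(delay=0.2, duration=0.15))
    polled, waited = await asyncio.gather(
        poll_for_banner(banner, interval=0.5, attempts=3),
        wait_for_banner(banner, timeout=2.0),
    )
    await task
    return polled, waited
```

With these timings the poller samples at roughly 0 s, 0.5 s, and 1.0 s, while the banner is visible only from about 0.2 s to 0.35 s, so asyncio.run(demo()) should report that polling missed it and the event-based wait did not.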

Reopening this issue as my patch did not solve it. In the meantime, I invite you to make a pull request to contribute your patch.

gbiz123 commented 2 months ago

@serozhenka

I just patched AsyncPlaywrightSolver and PlaywrightSolver to use expect() instead of polling at 0.5s intervals. Please get the latest version with pip install --upgrade tiktok-captcha-solver and let me know how it works for you. If we still have issues, it could be due to TikTok showing different things in different geolocations, which we will need to collaborate more on.

serozhenka commented 2 months ago

Thanks! Looks like it won't fix the issue, but I agree it's a much cleaner way to do this check than polling. I will test later on 🇺🇸 geo and let you know if the results differ from 🇺🇦.

gbiz123 commented 2 months ago

Here's what you can do:

Open Chrome DevTools, press Ctrl+Shift+P to open the command menu, and type Disable JavaScript. Don't press Enter or click it yet. Then solve the captcha. Immediately after solving it, run the Disable JavaScript command. The captcha will freeze in the success state, and you will be able to see exactly which classes are applied when a success occurs. Do the same for failure.

When you identify these classes, I can add them to the expect() code.

serozhenka commented 2 months ago

@gbiz123 just created a PR.

I did what you proposed, and by disabling JavaScript I was able to capture an element with the class captcha_verify_message-pass, but it disappears really quickly, and neither polling nor Playwright assertions can catch it in time.

For me, the check I have introduced seems to solve the issue perfectly.

serozhenka commented 2 months ago

Closing this as it was resolved in #8