Xewdy444 / Playwright-reCAPTCHA

A Python library for solving reCAPTCHA v2 and v3 with Playwright
https://pypi.org/project/playwright-recaptcha/
MIT License

SyncSolver gets stuck in an infinite loop when the audio challenge button remains disabled. #10

Closed by Hyperz 1 year ago

Hyperz commented 1 year ago

I'm using the SyncSolver in headless mode (Firefox). When the audio challenge is disabled with the message "Your computer or network may be sending automated queries. To protect our users, we can't process your request right now.", the solver gets stuck in an infinite loop while waiting for the audio challenge button, because the button never becomes enabled. I'm using v0.0.7.

Xewdy444 commented 1 year ago

So upon clicking the "I'm not a robot" checkbox, it gives you that error?

Hyperz commented 1 year ago

I think it tries to click the checkbox, but that triggers the image challenge so it then tries to solve the audio challenge, which in my case was disabled due to solving too many captchas.

Edit: perhaps worth noting is that I'm also setting cookies from regular Firefox in Playwright, so that I'm logged in to my Google account when solving the challenges, to try to reduce the number of times it locks me out of the audio challenges. Maybe that slightly changes the behavior of reCAPTCHA in such instances?

Xewdy444 commented 1 year ago

Instead of checking for the audio challenge button being enabled, I changed it to only check if it is visible. This way, when the _get_audio_url() method is called, it will check for the rate limit message and throw a RecaptchaRateLimitError as expected. Try this out and see if this fixes the issue.
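A minimal sketch of the rate limit check described above, for illustration only. It assumes the text of the challenge frame has already been read into a string. `RateLimitDetected` and `raise_if_rate_limited` are hypothetical names standing in for the library's `RecaptchaRateLimitError` and whatever `_get_audio_url()` does internally, and matching on the message quoted in this issue is an assumption rather than the library's actual detection logic.

```python
# Text of the notice quoted in this issue. Treating it as the marker for a
# rate limit is an assumption made for this sketch.
RATE_LIMIT_MESSAGE = "Your computer or network may be sending automated queries"


class RateLimitDetected(Exception):
    """Hypothetical stand-in for the library's RecaptchaRateLimitError."""


def raise_if_rate_limited(challenge_text: str) -> None:
    """Raise if the challenge frame text contains the rate limit notice."""
    if RATE_LIMIT_MESSAGE.casefold() in challenge_text.casefold():
        raise RateLimitDetected("reCAPTCHA rate limit reached; try again later")
```

Failing fast like this lets the caller back off or switch networks instead of waiting for a button that will never become enabled.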

Hyperz commented 1 year ago

Just tested it with is_visible(). It's still getting stuck in an infinite loop there in headless mode. Even with the audio challenge being available this time. I tried to check out what's going on by turning headless mode off and it didn't get stuck in that loop but it still got stuck somewhere looking like this. The odd thing is everything was working fine for about 3 days until I started running into these issues out of nowhere today. I'm gonna do some more testing when I get the time.

Hyperz commented 1 year ago

Little bit of an update. For some reason I'm unable to reproduce the infinite loop bug, possibly because I'm not being rate limited right now. So right now it clicks the checkbox just fine and waits for the spinning animation to finish and the ✔️ to appear. But then Playwright throws an error:

  File "C:\Users\djete\PycharmProjects\pwcaptchatest\captchasolving.py", line 69, in solve_recaptcha_v2
    solution = solver.solve_recaptcha()
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\playwright_recaptcha\recaptchav2\sync_solver.py", line 281, in solve_recaptcha
    url = self._get_audio_url(recaptcha_frame)
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\playwright_recaptcha\recaptchav2\sync_solver.py", line 125, in _get_audio_url
    audio_challenge_button.click(force=True)
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\sync_api\_generated.py", line 15360, in click
    self._sync(
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_sync_base.py", line 104, in _sync
    return task.result()
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_locator.py", line 146, in click
    return await self._frame.click(self._selector, strict=True, **params)
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_frame.py", line 489, in click
    await self._channel.send("click", locals_to_params(locals()))
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_connection.py", line 419, in wrap_api_call
    return await cb()
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Element is outside of the viewport
=========================== logs ===========================
waiting for get_by_role("button", name="Get an audio challenge")
  locator resolved to <button value="" disabled id="recaptcha-audio-button" ti…></button>
attempting click action
  waiting for element to be visible, enabled and stable
    forcing action
  element is visible, enabled and stable
  scrolling into view if needed
  done scrolling
============================================================

Some line numbers are off since I added a bunch of screenshot code for headless debugging. But since it got the green checkmark, shouldn't it have gotten the token without needing the audio challenge? I tried changing back to audio_challenge_button.is_enabled() in the loop and tried without headless mode, and I still get the same error.

Xewdy444 commented 1 year ago

Yea, I forgot about the reason I made that is_enabled() instead of is_visible(). If the reCAPTCHA box gets checked immediately upon clicking it (without having to solve any kind of challenge), then it will return the g-recaptcha-response token immediately. If the loop is set to continue only once the button is visible, then this breaks that functionality.

Xewdy444 commented 1 year ago

I believe I found the issue. The locator used for identifying that the audio challenge was visible was not correct. For some reason though, it still worked for me. I updated the branch for this issue with the fix, so try it out and see if it fixes the issue you were having.

Hyperz commented 1 year ago

Just tested with the changes. It works sometimes, but sometimes I still run into the error mentioned in my last post. However, I think I know why now. The recent captchas it's been giving me have been the easy JS ones where you just have to click the checkbox and wait for the green checkmark. In one instance, it broke out of the loop while the spinner was still active (right before the green checkmark appears) because audio_challenge_button.is_enabled() returned True, despite the fact that the button wasn't visible and it was in the middle of solving the JS challenge. If I have time tomorrow I'm gonna test with this change to the loop:

counter = 0

while True:
    # If we're still going after 60+ seconds it's never happening.
    # Raise an error so the caller can deal with it.
    if counter >= 60:
        raise RecaptchaSolveError("Loop timeout.")

    # Wait first in case we get the easy JS challenge.
    # Takes around ~0.5-1.0 seconds for the green checkmark to appear.
    self._page.wait_for_timeout(1000)

    # Check this first in case it's the JS challenge.
    if recaptcha_checkbox.is_checked():
        if self.token is None:
            raise RecaptchaSolveError("Missing token.")

        return self.token

    # We probably got an image challenge instead?
    if audio_challenge_button.is_enabled():
        break

    counter += 1

Hyperz commented 1 year ago

So far so good with your changes and the above mentioned change to the loop. Haven't been rate limited yet though.
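The loop above counts iterations, so its real duration also depends on how long each check takes. As a hedged alternative, the same idea can be written against a wall-clock deadline. This is only a sketch: `poll_until` and `PollTimeout` are hypothetical names, not part of playwright-recaptcha, and the `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PollTimeout(Exception):
    """Hypothetical error raised when the condition is never met in time."""


def poll_until(
    check: Callable[[], Optional[T]],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() until it returns a non-None value or the deadline passes."""
    deadline = clock() + timeout

    while True:
        result = check()

        if result is not None:
            return result

        # Give up once the deadline is reached so the caller can handle it.
        if clock() >= deadline:
            raise PollTimeout(f"Condition not met within {timeout} seconds.")

        sleep(interval)
```

In the solver, `check` could return the token once the checkbox is checked, or a marker value once the audio challenge button is usable, and `None` while the spinner is still active.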

Xewdy444 commented 1 year ago

I don't understand how audio_challenge_button.is_enabled() would return True if the image challenge frame wasn't even visible yet.

Hyperz commented 1 year ago

Just looked at the Playwright docs and I was under the impression that an element had to be visible as well to be considered enabled. But apparently that's not the case. I guess checking both is_enabled() and is_visible() in that conditional would fix the issue as well.
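A small sketch of the combined check, assuming only that the button object exposes `is_visible()` and `is_enabled()` as Playwright locators do. `ButtonLike`, `FakeButton` and `is_clickable` are hypothetical names used for illustration; the fake exists so the logic can be tried without a browser.

```python
from dataclasses import dataclass
from typing import Protocol


class ButtonLike(Protocol):
    """The two locator methods this check relies on."""

    def is_visible(self) -> bool: ...

    def is_enabled(self) -> bool: ...


@dataclass
class FakeButton:
    """Hypothetical stand-in for a Playwright locator, used for illustration."""

    visible: bool
    enabled: bool

    def is_visible(self) -> bool:
        return self.visible

    def is_enabled(self) -> bool:
        return self.enabled


def is_clickable(button: ButtonLike) -> bool:
    """Treat the button as usable only if it is both visible and enabled."""
    # Visibility is checked first so that is_enabled() is skipped entirely
    # when the button is not shown.
    return button.is_visible() and button.is_enabled()
```

The case reported above corresponds to `FakeButton(visible=False, enabled=True)`, for which `is_clickable` returns `False`, so the loop would keep waiting rather than break out early.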

Xewdy444 commented 1 year ago

Alright, I added the check for the audio challenge button being visible to that initial loop.

Hyperz commented 1 year ago

It stopped giving me the easy JS challenge today, which has revealed another corner-case issue. As mentioned in the OP, I pull cookies from my actual Firefox install in order to be signed in to Google and get less rate-limiting. Normally, when you're not getting the JS challenge, reCAPTCHA pops up the image challenge after you click the checkbox, where you can then click the audio challenge button and solve that. But it seems that (at least when you're logged in to Google) after solving some audio challenges it defaults to the audio challenge. That means that once you click the checkbox you immediately get the audio challenge page instead of first seeing the image challenge page.

This causes a timeout when checking if the audio challenge button is visible/enabled in that loop because that button isn't there in that situation. If it helps, I'm using browser-cookie3==0.17.0 to grab/set the cookies from my main Firefox profile. Example code:

from typing import List, Dict, Any

import browser_cookie3
from playwright.sync_api import sync_playwright
from playwright_recaptcha import recaptchav2

def get_firefox_cookies() -> List[Dict[str, Any]]:
    cookie_jar = browser_cookie3.firefox()
    cookies = [
        {
            'name': cookie.name,
            'value': cookie.value,
            'domain': cookie.domain,
            'path': cookie.path,
            'secure': bool(cookie.secure),
        }
        for cookie in cookie_jar
    ]

    return cookies

def solve_recaptcha_v2(url: str) -> None:
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=False)
        context = browser.new_context(
            locale='en-US,en;q=0.5',
            ignore_https_errors=True,
        )
        context.clear_cookies()
        context.add_cookies(get_firefox_cookies())
        page = context.new_page()
        page.set_default_navigation_timeout(60_000)
        page.set_default_timeout(10_000)

        try:
            page.goto(url, wait_until='networkidle')

            with recaptchav2.SyncSolver(page) as solver:
                print(solver.solve_recaptcha())
        finally:
            page.close()
            context.close()
            browser.close()

Xewdy444 commented 1 year ago

That is odd. I added the check for the audio challenge being visible immediately after clicking the reCAPTCHA checkbox, so this issue should be solved.
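The cases discussed in this thread can be summarized as a small decision function. This is an illustrative sketch, not the library's implementation: `ChallengeState` and `classify_state` are hypothetical names, and the four boolean inputs are assumed to have been read from the page beforehand.

```python
from enum import Enum, auto


class ChallengeState(Enum):
    """Hypothetical states the widget can be in after the checkbox click."""

    SOLVED = auto()           # checkbox is checked, token should be available
    AUDIO_CHALLENGE = auto()  # audio challenge shown directly, no image step
    IMAGE_CHALLENGE = auto()  # image challenge shown, audio button is usable
    PENDING = auto()          # spinner still active or nothing usable yet


def classify_state(
    *,
    checkbox_checked: bool,
    audio_challenge_visible: bool,
    audio_button_visible: bool,
    audio_button_enabled: bool,
) -> ChallengeState:
    """Decide what the solver should do next from the observed page state."""
    if checkbox_checked:
        return ChallengeState.SOLVED

    # Handle the case where reCAPTCHA skips the image challenge entirely.
    if audio_challenge_visible:
        return ChallengeState.AUDIO_CHALLENGE

    # Require both conditions, since enabled does not imply visible.
    if audio_button_visible and audio_button_enabled:
        return ChallengeState.IMAGE_CHALLENGE

    return ChallengeState.PENDING
```

Checking for the audio challenge before looking at the audio button avoids querying a button that does not exist when the audio page is shown directly.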

Hyperz commented 1 year ago

Thanks, I'll test it.

Hyperz commented 1 year ago

So far no issues with over 65 captchas solved. Haven't gotten rate-limited again yet though.

Xewdy444 commented 1 year ago

Any issues yet?

Hyperz commented 1 year ago

Nope. Well over 100 captchas solved now with no issues. But again, no rate-limiting either. Might be best to close the issue for now.