Closed Hyperz closed 1 year ago
So upon clicking the "I'm not a robot" checkbox, it gives you that error?
I think it tries to click the checkbox, but that triggers the image challenge so it then tries to solve the audio challenge, which in my case was disabled due to solving too many captchas.
Edit: perhaps worth noting is that I'm also setting cookies from regular Firefox in playwright such that I'm logged in to my Google account when solving the challenges to try and reduce the amount of times it locks me out of the audio challenges. Maybe that slightly changes the behavior of recaptcha in such instances?
Instead of checking for the audio challenge button being enabled, I changed it to only check if it is visible. This way, when the _get_audio_url() method is called, it will check for the rate limit message and throw a RecaptchaRateLimitError as expected. Try this out and see if this fixes the issue.
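For anyone following along, here is a rough sketch of that kind of rate limit check. It is not the library's actual code: RateLimitError is a local stand-in for RecaptchaRateLimitError, raise_if_rate_limited is a hypothetical helper, and the message prefix is copied from the text quoted in the original report.

```python
# Prefix of the message reCAPTCHA shows when the audio challenge is blocked.
RATE_LIMIT_TEXT = "Your computer or network may be sending automated queries"


class RateLimitError(Exception):
    """Local stand-in for the library's RecaptchaRateLimitError (assumption)."""


def raise_if_rate_limited(challenge_text: str) -> None:
    """Raise instead of waiting on a button that will never become enabled."""
    if RATE_LIMIT_TEXT in challenge_text:
        raise RateLimitError("Audio challenge is rate limited.")


# Plain strings stand in for the challenge frame's text (placeholders).
raise_if_rate_limited("some other challenge text")  # no error raised
try:
    raise_if_rate_limited(RATE_LIMIT_TEXT + ". To protect our users, ...")
except RateLimitError as exc:
    print(f"rate limited: {exc}")
```

With Playwright, the text would have to be read from whichever frame shows the message, for example via a locator's inner_text(). Which frame that is would need checking against the real page.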
Just tested it with is_visible(). It's still getting stuck in an infinite loop there in headless mode. Even with the audio challenge being available this time. I tried to check out what's going on by turning headless mode off and it didn't get stuck in that loop but it still got stuck somewhere looking like this. The odd thing is everything was working fine for about 3 days until I started running into these issues out of nowhere today. I'm gonna do some more testing when I get the time.
Little bit of an update. For some reason I'm unable to reproduce the infinite loop bug. Possibly because I'm not being rate limited right now. So right now it's clicking the checkbox just fine, then waits for the spinning animation to finish and the ✔️ to appear. But then Playwright throws an error:
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\captchasolving.py", line 69, in solve_recaptcha_v2
    solution = solver.solve_recaptcha()
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\playwright_recaptcha\recaptchav2\sync_solver.py", line 281, in solve_recaptcha
    url = self._get_audio_url(recaptcha_frame)
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\playwright_recaptcha\recaptchav2\sync_solver.py", line 125, in _get_audio_url
    audio_challenge_button.click(force=True)
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\sync_api\_generated.py", line 15360, in click
    self._sync(
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_sync_base.py", line 104, in _sync
    return task.result()
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_locator.py", line 146, in click
    return await self._frame.click(self._selector, strict=True, **params)
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_frame.py", line 489, in click
    await self._channel.send("click", locals_to_params(locals()))
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_connection.py", line 419, in wrap_api_call
    return await cb()
  File "C:\Users\djete\PycharmProjects\pwcaptchatest\venv\lib\site-packages\playwright\_impl\_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Element is outside of the viewport
=========================== logs ===========================
waiting for get_by_role("button", name="Get an audio challenge")
locator resolved to <button value="" disabled id="recaptcha-audio-button" ti…></button>
attempting click action
waiting for element to be visible, enabled and stable
forcing action
element is visible, enabled and stable
scrolling into view if needed
done scrolling
============================================================
Some line numbers are off since I added a bunch of screenshot code for headless debugging. But since it got the green checkmark, shouldn't it have gotten the token without needing the audio challenge? I tried changing back to audio_challenge_button.is_enabled() in the loop and ran it without headless mode, and I still get the same error.
Yeah, I forgot the reason I used is_enabled() instead of is_visible(). If the reCAPTCHA box gets checked immediately upon clicking it (without having to solve any kind of challenge), then it will return the g-recaptcha-response token immediately. If the loop only continues once the button is visible, then this breaks that functionality.
I believe I found the issue. The locator used for identifying that the audio challenge was visible was not correct. For some reason though, it still worked for me. I updated the branch for this issue with the fix, so try it out and see if it fixes the issue you were having.
Just tested with the changes. It works sometimes, but sometimes I still run into the error mentioned in my last post. However, I think I know why now. The recent captchas it's been giving me have been the easy JS ones where you just have to click the checkbox and wait for the green checkmark. In one instance, it broke out of the loop when the spinner was still active (right before the green checkmark appears) because audio_challenge_button.is_enabled() returned True, despite the fact that it wasn't visible and it was in the middle of solving the JS challenge. If I have time tomorrow I'm gonna test with this change to the loop:
counter = 0
while True:
    # If we're still going after 60+ seconds it's never happening.
    # Raise an error so the caller can deal with it.
    if counter >= 60:
        raise RecaptchaSolveError("Loop timeout.")
    # Wait first in case we get the easy JS challenge.
    # Takes around ~0.5-1.0 seconds for the green checkmark to appear.
    self._page.wait_for_timeout(1000)
    # Check this first in case it's the JS challenge.
    if recaptcha_checkbox.is_checked():
        if self.token is None:
            raise RecaptchaSolveError("Missing token.")
        return self.token
    # We probably got an image challenge instead?
    if audio_challenge_button.is_enabled():
        break
    counter += 1
So far so good with your changes and the above mentioned change to the loop. Haven't been rate limited yet though.
I don't understand how audio_challenge_button.is_enabled() would return True if the image challenge frame wasn't even visible yet.
Just looked at the Playwright docs. I was under the impression that an element had to be visible as well to be considered enabled, but apparently that's not the case. I guess checking both is_enabled() and is_visible() in that conditional would fix the issue as well.
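To make the combined check concrete, here is a minimal sketch. FakeButton is a hypothetical stand-in for a Playwright Locator so the snippet runs without a browser, and audio_button_ready is an illustrative helper, not something from the library.

```python
from dataclasses import dataclass


@dataclass
class FakeButton:
    """Hypothetical stand-in for a Playwright Locator, for illustration only."""

    visible: bool
    enabled: bool

    def is_visible(self) -> bool:
        return self.visible

    def is_enabled(self) -> bool:
        return self.enabled


def audio_button_ready(button) -> bool:
    """Return True only when the button is both visible and enabled."""
    # is_enabled() alone can be True for a button that is not shown,
    # so visibility is checked as well.
    return button.is_visible() and button.is_enabled()


# Hidden but enabled (e.g. while the JS challenge spinner is still running).
hidden_ready = audio_button_ready(FakeButton(visible=False, enabled=True))
# Shown and enabled (the image challenge is actually on screen).
shown_ready = audio_button_ready(FakeButton(visible=True, enabled=True))
print(hidden_ready, shown_ready)
```

A real Locator exposes the same is_visible() and is_enabled() methods, so the helper should accept one unchanged.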
Alright, I added the check for the audio challenge button being visible to that initial loop.
It stopped giving me the easy JS challenge today, which has revealed another corner-case issue. As mentioned in the OP, I pull cookies from my actual Firefox install in order to be signed into Google and get less rate-limiting. Normally, when you're not getting the JS challenge, recaptcha pops up the image challenge after you click the checkbox, where you can then click the audio challenge button and solve that. But it seems that (at least when you're logged in to Google) after solving some audio challenges it defaults to the audio challenge. Meaning that once you click the checkbox you immediately get the audio challenge page instead of first seeing the image challenge page:
This causes a timeout when checking if the audio challenge button is visible/enabled in that loop because that button isn't there in that situation. If it helps, I'm using browser-cookie3==0.17.0 to grab/set the cookies from my main Firefox profile. Example code:
from typing import List, Dict, Any

import browser_cookie3
from playwright.sync_api import sync_playwright
from playwright_recaptcha import recaptchav2


def get_firefox_cookies() -> List[Dict[str, Any]]:
    cookie_jar = browser_cookie3.firefox()
    cookies = [
        {
            'name': cookie.name,
            'value': cookie.value,
            'domain': cookie.domain,
            'path': cookie.path,
            'secure': bool(cookie.secure),
        }
        for cookie in cookie_jar
    ]
    return cookies


def solve_recaptcha_v2(url: str) -> None:
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=False)
        context = browser.new_context(
            locale='en-US,en;q=0.5',
            ignore_https_errors=True,
        )
        context.clear_cookies()
        context.add_cookies(get_firefox_cookies())
        page = context.new_page()
        page.set_default_navigation_timeout(60_000)
        page.set_default_timeout(10_000)
        try:
            page.goto(url, wait_until='networkidle')
            with recaptchav2.SyncSolver(page) as solver:
                print(solver.solve_recaptcha())
        except Exception:
            raise
        finally:
            page.close()
            context.close()
            browser.close()
That is odd. I added the check for the audio challenge being visible immediately after clicking the reCAPTCHA checkbox, so this issue should be solved.
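As a rough illustration of handling all three outcomes after the checkbox click (token straight away, image challenge, or audio challenge immediately), here is a sketch of a bounded poll. It is not the code in the branch: wait_for_first and the probe names are hypothetical, and the probes are plain callables so the snippet runs without a browser.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_first(
    probes: Dict[str, Callable[[], bool]],
    timeout: float = 10.0,
    interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the probes in order and return the name of the first true one.

    Returns None if no probe became true before the timeout (in seconds).
    """
    deadline = time.monotonic() + timeout
    while True:
        for name, probe in probes.items():
            if probe():
                return name
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


# Demo with fake probes: the audio challenge shows up on the third poll.
polls = {"count": 0}


def audio_challenge_shown() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


outcome = wait_for_first(
    {
        "checked": lambda: False,  # checkbox never gets ticked here
        "image_challenge": lambda: False,  # image challenge never appears
        "audio_challenge": audio_challenge_shown,
    },
    interval=0.01,
)
print(outcome)
```

With the sync Playwright API it is probably better to pass something like sleep=lambda s: page.wait_for_timeout(s * 1000) rather than rely on time.sleep, and to build the probes from the real locators' is_checked() and is_visible() methods.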
Thanks, I'll test it.
So far no issues with over 65 captchas solved. Haven't gotten rate-limited again yet though.
Any issues yet?
Nope. Well over 100 captchas solved now with no issues. But again, no rate-limiting either. Might be best to close the issue for now.
I'm using the SyncSolver in headless mode (Firefox). When the audio challenge is disabled due to "Your computer or network may be sending automated queries. To protect our users, we can't process your request right now." it gets stuck in an infinite loop here because the button never becomes enabled. I'm using v0.0.7.