Xewdy444 / Playwright-reCAPTCHA

A Python library for solving reCAPTCHA v2 and v3 with Playwright
https://pypi.org/project/playwright-recaptcha/
MIT License
275 stars, 38 forks

Recaptcha Enterprise #66

Closed Lyfhael closed 11 months ago

Lyfhael commented 11 months ago

Hello,

Thank you for your help last time on https://github.com/Xewdy444/Playwright-reCAPTCHA/issues/65. I only just saw it.

I have an issue with reCAPTCHA Enterprise: the captcha fails to get solved, and I don't know whether I'm doing something wrong or whether that's just how it is.

I read https://github.com/Xewdy444/Playwright-reCAPTCHA/issues/61 and tried to do the same, but no luck. (Note that "https://www.myfitnesspal.com/user/search" redirects to "https://www.myfitnesspal.com/account/login". I use it so that after I fill in my credentials, it redirects me directly to the page I want.)

The cookie loading is only there so that I don't get the modal asking me to accept cookies, because for some reason I had issues clicking the yes button.

Here is a screenshot of the failed captcha: fX6B4dO

Also note that it used to work at first (without stealth_sync, which I added later to see if it would make a difference), but I don't know whether it worked because I did the solving with your library properly or because reCAPTCHA just decided to let me pass no matter what.

Here is the script (it fails at page.wait_for_selector('[href="/profile/myusernamehere"]', timeout=7000)):

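# Imports inferred from the names used below
import json
import time

from playwright.sync_api import sync_playwright
from playwright_recaptcha import recaptchav3
from playwright_stealth import stealth_sync
from pyvirtualdisplay import Display


def get_html_results(email):
    # Start a virtual display so the browser can run on a machine without a screen
    display = Display(visible=0, size=(800, 600))
    display.start()
    # This URL redirects to /account/login and returns to the search page after login
    url = "https://www.myfitnesspal.com/user/search"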
    with sync_playwright() as p:
        browser = p.firefox.launch()

        context = browser.new_context(locale='en-US')
        with open("myfitnesspal_cookies.json", "r") as f:
            cookies = json.loads(f.read())
            context.add_cookies(cookies)

        page = context.new_page()
        stealth_sync(page)
        page.route(
            "https://www.myfitnesspal.com/account/login",
            lambda route: route.abort()
            if route.request.method == "POST"
            else route.continue_(),
        )
        page.goto(url)

        page.wait_for_selector("input[id='email']", timeout=5000).fill("email@email.com")
        page.fill('input[id="password"]', 'mypassword')
        page.select_option('select[aria-label="language-selector"]', "en")
        with recaptchav3.SyncSolver(page) as solver:
            page.click('.MuiButtonBase-root.MuiButton-root.MuiButton-contained.MuiButton-containedPrimary.MuiButton-sizeMedium')
            token = solver.solve_recaptcha()
        try:
            page.wait_for_selector('[href="/profile/myusernamehere"]', timeout=7000)
        except Exception as e:
            page.screenshot(path='screenshot0.png')
            display.stop()
            raise e
        # Logged in: the search page should now be loaded, so fill in the email to look up

        page.screenshot(path='screenshot0.png')
        page.wait_for_selector("[id='username_or_email']", timeout=5000).fill(email)
        page.screenshot(path='screenshot1.png')
        page.wait_for_selector("input[type='submit'][name='commit'][data-disable-with]", timeout=5000)
        page.evaluate("document.querySelector('form[method=\"post\"]').submit()")
        time.sleep(2)
        page.screenshot(path='screenshot1.png')
        html = page.content()
        browser.close()
        display.stop()
        return html

Thank you!

Xewdy444 commented 11 months ago

It gave me the reCAPTCHA failure message even when using my own browser (LibreWolf). Then it gave a different error that said "We're experiencing technical difficulties. Please try again later." This could be a result of them blocking sign-in attempts that have a low reCAPTCHA v3 score, but even my first login attempt failed. They may be enforcing rules that are too strict, or they may not be implementing reCAPTCHA v3 properly. With this in mind, I don't think the issue lies with the Python library, but rather with the website itself.
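To illustrate what "blocking sign-in attempts that have a low reCAPTCHA v3 score" can mean on the server side, here is a minimal sketch. It assumes a response shaped like the standard reCAPTCHA v3 verification result (`success`, `score`, `action`); the function name, the `"login"` action, and the thresholds are hypothetical, and nothing here is known about how MyFitnessPal or reCAPTCHA Enterprise actually evaluates the score.

```python
def passes_score_check(verify_response, expected_action, min_score=0.5):
    """Return True if a v3-style verification result clears the threshold.

    verify_response is assumed to be a dict with "success", "score"
    (0.0 to 1.0) and "action" keys. Hypothetical helper for illustration.
    """
    if not verify_response.get("success", False):
        return False
    # Reject tokens that were issued for a different action
    if verify_response.get("action") != expected_action:
        return False
    return verify_response.get("score", 0.0) >= min_score


# An automated browser may receive a low score like this one (made-up value)
sample = {"success": True, "score": 0.3, "action": "login"}
strict = passes_score_check(sample, "login", min_score=0.7)
lenient = passes_score_check(sample, "login", min_score=0.2)
```

With a strict threshold, the same token that a lenient site would accept is rejected, which would look like a captcha failure to the person signing in.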

Xewdy444 commented 11 months ago

Another thing to note is that the reCAPTCHA v3 solver is really only needed if you are making sign-in requests outside the browser, using something like the requests library. The reason for this is that the browser will solve reCAPTCHA v3 on its own, so no manual interaction is required. You should be able to log in using Playwright as you would if there was no reCAPTCHA v3 challenge present.
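A minimal sketch of that plain login flow, with no solver and no request interception. The selectors and the login URL are copied from the script in the first comment; the function name and its parameters are hypothetical. `page` is expected to be a Playwright sync `Page`, and whether the site accepts the resulting score is a separate question.

```python
LOGIN_URL = "https://www.myfitnesspal.com/account/login"
# Submit button selector taken from the original script
SUBMIT_BUTTON = (
    ".MuiButtonBase-root.MuiButton-root.MuiButton-contained"
    ".MuiButton-containedPrimary.MuiButton-sizeMedium"
)


def log_in(page, email, password, success_selector, timeout_ms=7000):
    """Fill in the login form and wait for an element that proves success.

    No reCAPTCHA handling here: the page's own scripts are left to request
    and submit the v3 token, as they would for a human visitor.
    """
    page.goto(LOGIN_URL)
    page.fill("input[id='email']", email)
    page.fill("input[id='password']", password)
    page.click(SUBMIT_BUTTON)
    # Raises a timeout error if the logged-in element never shows up
    page.wait_for_selector(success_selector, timeout=timeout_ms)
```

It would be called as `log_in(page, "email@email.com", "mypassword", '[href="/profile/myusernamehere"]')` on a page created inside `sync_playwright()`.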

Lyfhael commented 11 months ago

I see. Hmm, I'll check whether Selenium manages to get a higher score, and if not, I guess I'll have to store session cookies periodically for later use instead of logging in every time.

It just randomly worked, but it will randomly fail later, I suppose.
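A minimal sketch of the "store session cookies for later use" idea, using Playwright's storage state (`context.storage_state(path=...)` to save, `browser.new_context(storage_state=...)` to restore). The helper names, the file path, and the six-hour lifetime are assumptions made for illustration; how long a MyFitnessPal session actually stays valid is not known here.

```python
import os
import time


def state_is_fresh(path, max_age_seconds):
    """Return True if a saved storage-state file exists and is recent enough."""
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)
    return age < max_age_seconds


def new_context_with_session(browser, path, max_age_seconds=6 * 3600):
    """Create a context, restoring the saved session when it is still fresh.

    Returns (context, restored). When restored is False the caller still
    has to log in and then call save_session().
    """
    if state_is_fresh(path, max_age_seconds):
        return browser.new_context(locale="en-US", storage_state=path), True
    return browser.new_context(locale="en-US"), False


def save_session(context, path):
    """Write the context's cookies and local storage to a JSON file."""
    context.storage_state(path=path)
```

The idea is to log in only when `restored` is `False`, call `save_session(context, "session_state.json")` right after a successful login, and reuse the file on later runs so that the reCAPTCHA-protected login form is hit as rarely as possible.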

Xewdy444 commented 11 months ago

You could try using botright: https://github.com/Vinyzu/Botright. It should work fine with this library, although it can only be used asynchronously.

Lyfhael commented 11 months ago

> You could try using botright: https://github.com/Vinyzu/Botright. It should work fine with this library, although it can only be used asynchronously.

I went for the cookie option after all, but it got me curious, so I tried botright to scrape a website protected by hCaptcha. Without any disrespect intended, it works 10% of the time and is much slower :/ (mainly due to the AI giving wrong answers to the captcha).

But it's interesting, and I'll keep it in mind if I ever get issues with reCAPTCHA in the future. For the website protected by hCaptcha, I'll try capsolver.

Xewdy444 commented 11 months ago

Yeah, I just saw that it had some stealth capabilities, so I thought it might be good for your use case. hCaptcha has been pretty difficult to solve with AI recently because of their new challenge types that require you to click on certain objects in an image.