daijro / camoufox

🦊 Anti-detect browser
https://camoufox.com
Mozilla Public License 2.0
175 stars, 18 forks

Detected by datadome in headless mode #26

Closed by aniketpradhann 1 month ago

aniketpradhann commented 1 month ago

It's getting detected by the Datadome CAPTCHA in headless mode.


import asyncio
from camoufox.async_api import AsyncCamoufox
async def main():
    async with AsyncCamoufox(headless=True,
                             humanize=True,
                             block_images=False,
                             # proxy={},
                             block_webrtc=True) as browser:
        context = await browser.new_context()
        page = await context.new_page()
        # Logged for debugging, e.g. to compare headless and headful runs
        user_agent = await page.evaluate("navigator.userAgent")
        print(user_agent)
        # Wait for the interstitial response, which decides whether a CAPTCHA is served
        async with page.expect_response('https://geo.captcha-delivery.com/interstitial/') as response_info:
            await page.goto('https://geo.captcha-delivery.com/interstitial/?initialCid=AHrlqAAAAAMAAHiYXWB1H3wAZ19SyA==&hash=14D062F60A4BDE8CE8647DFC720349&cid=zRY6jGoAt4X66kpIInPBRqvNGPci6M4vSc70JnH2aIRRfeKisDRTvlG7DLnVau1fA9yYn6YE0mk1YY0KIXMalv9jcJGSyDiAO57tuCoW9SZEycETrL3JfQGtdcRBJRx1&referer=https://datadome.co/&s=44330&b=1157165&dm=cd', wait_until="commit")
        response = await response_info.value
        response_body = await response.json()
        print(response_body)
        await context.close()
    # Leaving the async with block closes the browser
    print("Closed Browser")
asyncio.run(main())

It works fine in headful mode, returning 'cookie': 'datadome=zRY6jGo...;' and 'view': 'redirect', but in headless mode it always returns 'view': 'captcha'.
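To make the headless/headful comparison explicit, here is a minimal sketch that summarizes the parsed JSON body of the interstitial response. It assumes the body is a dict with the 'view' and (on success) 'cookie' keys quoted above, and it treats 'redirect' as a pass; the names InterstitialResult and summarize_interstitial are hypothetical and not part of Camoufox or Datadome.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class InterstitialResult:
    view: Optional[str]             # e.g. 'redirect' or 'captcha'
    passed: bool                    # assumption: 'redirect' means no CAPTCHA was served
    datadome_cookie: Optional[str]  # value of the datadome cookie, if one was issued


def summarize_interstitial(body: Dict[str, Any]) -> InterstitialResult:
    """Hypothetical helper: classify a parsed interstitial response body."""
    view = body.get("view")
    cookie_value: Optional[str] = None
    cookie = body.get("cookie")
    if isinstance(cookie, str):
        # Only the first "name=value" pair matters; cookie attributes follow after ';'
        name, sep, value = cookie.split(";", 1)[0].strip().partition("=")
        if sep and name == "datadome":
            cookie_value = value
    return InterstitialResult(view=view, passed=(view == "redirect"), datadome_cookie=cookie_value)


# Usage with the two outcomes described in this issue (cookie value shortened)
print(summarize_interstitial({"cookie": "datadome=abc123; Path=/", "view": "redirect"}))
print(summarize_interstitial({"view": "captcha"}))
```

In the scripts in this thread, the helper would be called on response_body right after response.json().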

daijro commented 1 month ago

Hello, thank you for reporting this.

Do you have a test site? I'm having trouble reproducing this locally with other high-security Datadome test sites. I'd love to look into this.

aniketpradhann commented 1 month ago

I'm using https://datadome.co/ for testing. The issue is only reproducible in headless mode. The response from https://geo.captcha-delivery.com/interstitial/ decides whether a CAPTCHA is shown: in headless mode it always returns a CAPTCHA, but it works flawlessly in headful mode.

import asyncio
from browserforge.fingerprints import Screen
from camoufox.async_api import AsyncCamoufox
async def main():
    async with AsyncCamoufox(headless=True,
                             humanize=True,
                             block_images=False,
                             screen=Screen(max_width=1920, max_height=1080),
                             block_webrtc=True) as browser:
        context = await browser.new_context()
        page = await context.new_page()
        # Loading the site triggers a request to the interstitial endpoint
        async with page.expect_response('https://geo.captcha-delivery.com/interstitial/') as response_info:
            await page.goto('https://datadome.co/', wait_until="commit")
        response = await response_info.value
        response_body = await response.json()
        # Check 'view' here: 'captcha' in headless mode, 'redirect' in headful mode
        print(response_body)
        # Give the page time to render, then save a screenshot for inspection
        await asyncio.sleep(10)
        await page.screenshot(path='datadome.png', full_page=True)
        await context.close()
asyncio.run(main())

I'm able to reproduce the issue with the code above. Look for 'view': '...' in the response body; in headless mode it always returns 'captcha'.

aniketpradhann commented 1 month ago

For Datadome, a temporary workaround is to run the browser in headful mode inside an Xvfb virtual display.

daijro commented 1 month ago

I've added a minimal implementation of Xvfb (similar to PyVirtualDisplay) into the latest update that temporarily works around the issue.
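The idea behind the Xvfb workaround is to start a virtual X server, point DISPLAY at it, and launch the browser in headful mode, so Firefox never enters its headless code path. Below is a minimal sketch of that idea using only the standard library. It is not Camoufox's actual implementation; it assumes a Linux host with the Xvfb binary on PATH, and the display number :99, the half-second startup wait, and the helper names xvfb_command and virtual_display are arbitrary or hypothetical.

```python
import os
import shutil
import subprocess
import time
from contextlib import contextmanager
from typing import Iterator, List


def xvfb_command(display: int = 99, width: int = 1920, height: int = 1080, depth: int = 24) -> List[str]:
    """Build the argv for a virtual X server on :<display> with a single screen."""
    return [
        "Xvfb",
        f":{display}",
        "-screen", "0", f"{width}x{height}x{depth}",
        "-nolisten", "tcp",  # local clients only, no TCP listener
    ]


@contextmanager
def virtual_display(display: int = 99) -> Iterator[str]:
    """Start Xvfb, export DISPLAY for child processes, and stop Xvfb on exit."""
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb is not installed or not on PATH")
    proc = subprocess.Popen(xvfb_command(display),
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    old_display = os.environ.get("DISPLAY")
    try:
        time.sleep(0.5)  # crude wait for the server to come up (assumption, not a readiness check)
        if proc.poll() is not None:
            raise RuntimeError("Xvfb exited early (is the display number already in use?)")
        os.environ["DISPLAY"] = f":{display}"
        yield os.environ["DISPLAY"]
    finally:
        proc.terminate()
        proc.wait()
        # Restore whatever DISPLAY was before, if anything
        if old_display is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = old_display
```

In use, a Camoufox session launched with headless=False would be opened inside a with virtual_display(): block, so the browser window renders to the virtual screen instead of a real one.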

daijro commented 1 month ago

I've been able to reproduce the issue. However, Datadome no longer flags it reliably and often lets it pass while I'm trying to debug. I could be wrong, but I believe the issue is related to how Firefox handles the viewport in headless mode. I'll keep watching this and look out for similar headless detection on other sites.

aniketpradhann commented 1 month ago

Use https://bounty-nodejs.datashield.co/ for testing.

daijro commented 1 month ago

> https://bounty-nodejs.datashield.co/ use this for testing

Thanks.

From my local testing, the leak does not appear to come from detection of an automation library; it is in Firefox's headless mode itself. Datadome still flags unmodified Firefox launched in headless mode:

firefox --headless https://bounty-nodejs.datashield.co/

I've also discovered that enabling the privacy.resistFingerprinting user preference bypasses the detection of headless Firefox, which narrows the possible leaks down to the protections listed at https://wiki.mozilla.org/Security/Fingerprinting. I'll go through these by trial and error to find the leak.
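One way to repeat this experiment with stock Firefox is to put the preference in a throwaway profile's user.js, which Firefox reads at startup, and launch with --headless and --profile. The sketch below assumes a firefox binary on PATH; the helper names write_user_js and headless_firefox_cmd are hypothetical, and only the privacy.resistFingerprinting preference and the test URL come from this thread.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union

PrefValue = Union[bool, int, str]


def write_user_js(profile_dir: Path, prefs: Dict[str, PrefValue]) -> Path:
    """Write prefs as user_pref(...) lines; json.dumps yields valid JS literals (true, 3, "x")."""
    lines = [f"user_pref({json.dumps(name)}, {json.dumps(value)});" for name, value in prefs.items()]
    path = profile_dir / "user.js"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def headless_firefox_cmd(profile_dir: Path, url: str) -> List[str]:
    """argv for unmodified Firefox in headless mode using the given profile directory."""
    return ["firefox", "--headless", "--profile", str(profile_dir), url]


# Usage: prepare a throwaway profile and print the command to run by hand.
# Flip the pref to False (or drop it) to compare flagged and unflagged runs.
profile = Path(tempfile.mkdtemp(prefix="ff-rfp-"))
write_user_js(profile, {"privacy.resistFingerprinting": True})
print(" ".join(headless_firefox_cmd(profile, "https://bounty-nodejs.datashield.co/")))
```

Printing the command rather than spawning Firefox keeps the sketch side-effect free; headless Firefox opened on a URL keeps running until it is closed.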

daijro commented 1 month ago

Leak has been fixed in beta.11.

The issue was that Firefox sets the pointer media feature to none by default in headless mode:

> window.matchMedia('(pointer: fine)').matches;
This returns true on headful browsers and false on headless.

> window.matchMedia('(pointer: none)').matches;
This returns false on headful browsers and true on headless.
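To check whether a given browser build still exposes this signal, both media queries can be evaluated in a page and compared against the headless signature described above. The sketch below uses Playwright's sync API with its bundled Firefox; the same page.evaluate call should also work on the Playwright Browser that Camoufox returns. POINTER_PROBE, looks_like_headless_pointer and probe are hypothetical names, and the signature check encodes only the values stated in this comment.

```python
from typing import Dict

# JavaScript evaluated in the page; returns both media-query results
POINTER_PROBE = """() => ({
    fine: window.matchMedia('(pointer: fine)').matches,
    none: window.matchMedia('(pointer: none)').matches,
})"""


def looks_like_headless_pointer(result: Dict[str, bool]) -> bool:
    """True if the page sees the headless signature described above:
    (pointer: fine) is false and (pointer: none) is true."""
    return (not result["fine"]) and result["none"]


def probe(headless: bool) -> Dict[str, bool]:
    """Launch Firefox through Playwright and evaluate the probe on a blank page."""
    # Imported lazily so the checker above is usable without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=headless)
        page = browser.new_page()
        page.goto("about:blank")
        result = page.evaluate(POINTER_PROBE)
        browser.close()
    return result
```

Running looks_like_headless_pointer(probe(headless=True)) before and after a fix shows whether the pointer leak is still present in that build.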