daijro / camoufox

🦊 Undetected web scraping browser
Mozilla Public License 2.0

Unable to access iframe #3

Closed: desoforgit closed this issue 4 days ago

desoforgit commented 3 weeks ago

Reproducible code

import asyncio
import json

from playwright.async_api import Playwright, async_playwright, expect

CONFIG = {
'window.outerHeight': 1056,
'window.outerWidth': 1920,
'window.innerHeight': 1008,
'window.innerWidth': 1920,
'window.history.length': 4,
'navigator.userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0',
'navigator.appCodeName': 'Mozilla',
'navigator.appName': 'Netscape',
'navigator.appVersion': '5.0 (Windows)',
'navigator.oscpu': 'Windows NT 10.0; Win64; x64',
'navigator.language': 'en-US',
'navigator.languages': ['en-US'],
'navigator.platform': 'Win32',
'navigator.hardwareConcurrency': 12,
'navigator.product': 'Gecko',
'navigator.productSub': '20030107',
'navigator.maxTouchPoints': 10,
}

async def run(playwright: Playwright) -> None:
    browser = await playwright.firefox.launch(headless=False,
        executable_path='path/to/camoufox/launch',
        args=['--config', json.dumps(CONFIG)],
        firefox_user_prefs={'media.peerconnection.enabled': False,})
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto("https://www.w3schools.com/html/html_iframe.asp", wait_until="domcontentloaded")
    await page.frame_locator("iframe[title=\"W3Schools HTML Tutorial\"]").get_by_role("heading", name="HTML Tutorial").click()

    # ---------------------
    await context.close()
    await browser.close()

async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)

asyncio.run(main())

The code works with normal Firefox/Chromium, but with Camoufox it throws an error:

Call log: waiting for frame_locator("iframe[title=\"W3Schools HTML Tutorial\"]").get_by_role("heading", name="HTML Tutorial")

daijro commented 3 weeks ago

Root issue

In the original version of Juggler, an internal tree of all frames is kept as they are created/destroyed on the page. Playwright's FrameLocator uses this frame tree to find iframes when frame_locator is called.

However, when Juggler collects frames for its internal FrameTree, it creates an execution context within each frame to send input events. Some WAFs (web application firewalls) are able to detect this. I had no choice but to strip it from Juggler, which is why .click() fails.

Solution

For now, you should be able to evaluate JavaScript on the page to access iframes as you would in a normal browser.
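As a rough illustration of that workaround, the sketch below reaches into the iframe with plain DOM calls through page.evaluate instead of frame_locator. The helper name click_in_iframe and the selectors are hypothetical, not part of Camoufox or Playwright. It assumes the iframe is same-origin (otherwise contentDocument is null) and has finished loading, and note that element.click() fires a synthetic click rather than real input events.

```python
from typing import Any

# JavaScript evaluated in the page: locate the iframe, reach into its
# document, and click the first element matching the inner selector.
# Assumption: the iframe is same-origin; for cross-origin frames
# contentDocument is null and the function returns false.
CLICK_IN_IFRAME_JS = """
([iframeSelector, innerSelector]) => {
    const iframe = document.querySelector(iframeSelector);
    if (!iframe || !iframe.contentDocument) return false;
    const target = iframe.contentDocument.querySelector(innerSelector);
    if (!target) return false;
    target.click();  // synthetic click, not a trusted input event
    return true;
}
"""


async def click_in_iframe(page: Any, iframe_selector: str, inner_selector: str) -> bool:
    """Hypothetical helper: click an element inside a same-origin iframe.

    `page` is expected to be a Playwright Page; both selectors are passed
    to the in-page function as a single list argument.
    Returns True if the element was found and clicked, False otherwise.
    """
    result = await page.evaluate(CLICK_IN_IFRAME_JS, [iframe_selector, inner_selector])
    return bool(result)
```

In the reproduction script above, the frame_locator line could then be swapped for something like `await click_in_iframe(page, 'iframe[title="W3Schools HTML Tutorial"]', 'h1')`, assuming the "HTML Tutorial" heading is an h1 inside that frame.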

I will be working on a fix to restore FrameLocator functionality through an alternative approach to creating the frame tree.

daijro commented 4 days ago

Fixed in v130.0-beta.5