Open brauliobo opened 2 months ago
It is not possible to fix it properly without modifying the Chromium source.
If anyone could start a project for a custom Chromium, that would be great.
It is possible to replace Puppeteer with a WebSocket connection through a browser extension (loaded with --load-extension) and then control the browser with chrome.scripting.executeScript calls. I've tested and verified this in a PoC.
I wonder, though, whether it is enough to use WebDriver instead of CDP to communicate with Puppeteer.
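A minimal sketch of the extension side of that idea, for anyone who wants to try it. It assumes a Manifest V3 background service worker with the "scripting" permission plus host permissions, and a local WebSocket server. The URL ws://localhost:8080, the message shape { id, tabId, action, args }, and the ACTIONS table are all made up for illustration and are not taken from the PoC mentioned above.

```javascript
// Hypothetical action table: MV3 only lets executeScript inject real functions,
// so commands are looked up by name instead of being sent as code strings.
const ACTIONS = {
  getTitle: () => document.title,
  click: (selector) => {
    const el = document.querySelector(selector);
    if (!el) return false;
    el.click();
    return true;
  },
};

// Runs one command in the given tab and returns a reply object.
// `chromeApi` is passed in so the function can be exercised outside a browser.
async function handleCommand(chromeApi, message) {
  const { id, tabId, action, args = [] } = message;
  const func = ACTIONS[action];
  if (!func) return { id, error: `unknown action: ${action}` };
  try {
    const [injection] = await chromeApi.scripting.executeScript({
      target: { tabId },
      func,
      args,
    });
    return { id, result: injection.result };
  } catch (err) {
    return { id, error: String(err) };
  }
}

// Wiring inside the background service worker (the URL is an assumption).
function connect(chromeApi, url = 'ws://localhost:8080') {
  const socket = new WebSocket(url);
  socket.addEventListener('message', async (event) => {
    const reply = await handleCommand(chromeApi, JSON.parse(event.data));
    socket.send(JSON.stringify(reply));
  });
  return socket;
}
```

Inside the extension this would be started with connect(chrome). The controlling process on the other end of the socket is not shown.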
Interesting idea. Before Puppeteer, I used extensions. Puppeteer is cool, but extensions provide additional useful APIs.
You can also call CDP commands from the extension's background context. It is not detectable.
I made a patch for this issue; it disables Runtime.enable, which causes this leak.
You can check it out here: https://github.com/rebrowser/rebrowser-patches
^ This way we lose a lot of funcs of puppeteer. We need a chromium patch... if anyone could start a project
@vladtreny I see no loss in functions of puppeteer after the patch. If you could find any, please let me know via issues section, I will be happy to address it.
console.log does not work, click does not work, and other CDP functions do not work.
Runtime.consoleAPICalled events: do you even get them here?
page.on('console', async message => {
Also, can you show how you click?
page.on('console') relies on Runtime.consoleAPICalled, so it won't work, that's true.
page.click(selector, clickOptions) works fine with my patch.
How does it detect the element to click? Can I select it correctly inside 10 random iframes?
Does it click inside a closed shadow root?
@vladtreny it feels like I have to defend myself for something... I'm not trying to sell you anything.
You can try my solution, find any non-working stuff in patched version, and open a new issue for this. I will be glad to assist. Thanks.
It seems less than optimal that one has to patch a browser to prevent this non-standard stack field from leaking side-effects.
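For readers who haven't seen the leak in question: as far as I understand it, a page can hang a getter on an Error's stack property and log the error. When a CDP client has enabled the Runtime domain, the browser serializes the logged object for Runtime.consoleAPICalled and the getter fires. A rough sketch of that check follows. The function name and the injectable consoleLike parameter are mine, added so the logic can be run outside a browser, and real detection scripts likely differ in detail.

```javascript
// Returns true if logging an Error caused its `stack` property to be read.
// `consoleLike` defaults to the real console; it is a parameter only so the
// check can be exercised with a stand-in (an assumption of this sketch).
function stackGetterWasRead(consoleLike = console) {
  let accessed = false;
  const probe = new Error('probe');
  Object.defineProperty(probe, 'stack', {
    configurable: true,
    get() {
      accessed = true;
      return '';
    },
  });
  consoleLike.debug(probe);
  return accessed;
}
```

The result only means something inside a browser page. Other runtimes' consoles may read stack themselves when printing an Error, and an open DevTools window would presumably trigger it too.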
Has anyone else here also tried just not using Chrome? A preliminary test from our end showed that the only leaked behavior from Firefox was window.navigator.webdriver, and I'm not sure what the state of the art is to patch that, but IIRC, it's not as simple as setting it to false.
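On why setting it to false is not enough: in browsers, webdriver is normally an accessor on Navigator.prototype rather than a property of the navigator object itself, so a naive override leaves a trace. Here is a small sketch of how a page might notice that. The function name is hypothetical and this is only one of several possible checks.

```javascript
// Reports whether `nav` carries its own `webdriver` property, which a stock
// browser's navigator should not (there it lives on the prototype as a getter).
function webdriverLooksOverridden(nav) {
  const own = Object.getOwnPropertyDescriptor(nav, 'webdriver');
  if (own) return true; // defined directly on the instance: likely patched
  const proto = Object.getPrototypeOf(nav);
  const inherited = proto && Object.getOwnPropertyDescriptor(proto, 'webdriver');
  // A data property (no getter) on the prototype is also unusual.
  return Boolean(inherited && typeof inherited.get !== 'function');
}
```

In a page this would be called as webdriverLooksOverridden(navigator).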
I should add that most detection strategies explicitly target Google Chrome, and there seem to be far fewer explicit detection strategies for Firefox. It's also Tor Browser's browser of choice for a reason.
@andrewmcwatters since this post yesterday - https://hacks.mozilla.org/2024/08/puppeteer-support-for-firefox/ - I guess detection strategies will evolve quite quickly to target Firefox, too.
Yeah, I'm wanting to diversify away from just automating with Google Chrome and stealth measures, since it's a bit of a risk factor at this point.
I'm not sure what WebDriver BiDi's equivalent of Page.addScriptToEvaluateOnNewDocument is, though, or if it has one. It might be necessary to build an equivalent event from existing standard ones.
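As far as I can tell, the closest thing in the WebDriver BiDi spec is the script.addPreloadScript command, which takes a functionDeclaration string and runs it in new documents before page scripts. Below is a minimal sketch of building that command message. The helper name and the id counter are mine, and actually sending the message over the BiDi WebSocket is left out.

```javascript
let nextCommandId = 0;

// Builds a BiDi command that registers `fn` as a preload script.
// `contexts` (optional) limits it to specific top-level browsing contexts.
function buildAddPreloadScript(fn, contexts) {
  const params = { functionDeclaration: fn.toString() };
  if (contexts && contexts.length > 0) params.contexts = contexts;
  return { id: ++nextCommandId, method: 'script.addPreloadScript', params };
}

const command = buildAddPreloadScript(() => {
  window.__preloaded = true; // placeholder body for illustration
});
console.log(JSON.stringify(command));
```

The function is sent as source text, so it cannot close over variables from the controlling process.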
andrewmcwatters@Andrews-iMac redacted % node --test
▶ tests
✔ https://arh.antoinevastel.com/bots/ (5887.661417ms)
✔ https://arh.antoinevastel.com/bots/areyouheadless (2896.569986ms)
✔ BotD (1423.917009ms)
✖ Fingerprint Pro Bot Detection (3997.748462ms)
AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected
+ 'You are a bot'
- 'You are not a bot'
^
at TestContext.<anonymous> (redacted)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Test.run (node:internal/test_runner/test:857:9)
at async Suite.processPendingSubtests (node:internal/test_runner/test:565:7) {
generatedMessage: true,
code: 'ERR_ASSERTION',
actual: 'You are a bot',
expected: 'You are not a bot',
operator: 'strictEqual'
}
✖ BrowserScan (1719.802858ms)
AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected
+ 'Robot'
- 'Normal'
at TestContext.<anonymous> (redacted)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Test.run (node:internal/test_runner/test:857:9)
at async Suite.processPendingSubtests (node:internal/test_runner/test:565:7) {
generatedMessage: true,
code: 'ERR_ASSERTION',
actual: 'Robot',
expected: 'Normal',
operator: 'strictEqual'
}
▶ tests (15927.932604ms)
ℹ tests 5
ℹ suites 1
ℹ pass 3
ℹ fail 2
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 16215.926107
Need to patch chromium. Nothing to do. All these hacks are detectable.
Also, the new protection in cloudflare via shadow root, is hard to bypass. Possible, but not ideal.
Seems to me that this or some other leak is affecting recaptcha bypass since yesterday. Getting challenge 90%+ of the time on v2 and low score on v3. Can anyone confirm?
Bummer.
It's possible. See Selenium-Driverless. I'm also currently working on an open-source (Playwright-based) solution.
I know it... just rechecked if they added something new. What kind of value does it bring? Turns off runtime? This way we lose a lot of features. Including ability to bypass this new cloudflare shadow root.
Cloudflare reads these threads, but anyway :)
You keep talking about losing some features, but you never provide any specific code that stops working when Runtime is off. If you could, that would be really useful for the community.
For example, universally finding an element at runtime, or clicking inside a closed shadow root.
It's also not desirable to use Selenium but have to work with a non-Selenium API.
@vladtreny do you have any example of code that breaks after disabling the Runtime.enable command? @andrewmcwatters could you please clarify?
Many features break when Runtime is not enabled. But as I said, I'm currently working on a Playwright solution, which fixes every issue.
I'm talking specifically about Selenium-Driverless, not so much about your patches disabling the Runtime.enable command. The README.md of https://github.com/kaliiiiiiiiii/Selenium-Driverless reads:
Note: This project is moving away from the selenium syntax
I'm not going to try and speak for everyone, but I think a lot of us are looking for drop-in solutions like yours.
My business uses Selenium, though. We don't use the other automation frameworks, in part because they're explicitly not designed for anything other than testing. You can use them for other purposes, but it leads to hacking around their APIs.
Puppeteer stealth is now easily detected; check out https://deviceandbrowserinfo.com/learning_zone/articles/detecting-headless-chrome-puppeteer-2024