berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License

[Bug] Stealth being detected by Chrome DevTools Protocol (CDP) #899

Open brauliobo opened 2 months ago

brauliobo commented 2 months ago

Puppeteer stealth is now easily detected; check out https://deviceandbrowserinfo.com/learning_zone/articles/detecting-headless-chrome-puppeteer-2024
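
For readers who have not opened the link: as far as I understand it, the check exploits a side effect of CDP's Runtime domain. When Runtime is enabled, objects passed to console methods get serialized for the protocol client, and that serialization reads an Error's `stack` property. Below is a minimal sketch of the idea, not the article's exact code. `detectStackAccess` and the two fake consoles are hypothetical names used only for illustration.

```javascript
// Sketch of the "stack getter" check. `consoleLike` is injected so the logic
// can be exercised outside a browser; on a real page you would pass `console`.
function detectStackAccess(consoleLike) {
  let accessed = false;
  const err = new Error('probe');
  // Replace `stack` with a getter that records any read.
  Object.defineProperty(err, 'stack', {
    configurable: true,
    get() {
      accessed = true;
      return '';
    },
  });
  // If something serializes the logged object (as a CDP client with the
  // Runtime domain enabled is reported to do), the getter fires.
  consoleLike.debug(err);
  return accessed;
}

// Hypothetical stand-ins for the two situations:
const plainConsole = { debug() {} }; // nobody inspects the argument
const inspectingConsole = { debug(value) { void value.stack; } }; // simulates serialization

console.log(detectStackAccess(plainConsole)); // false
console.log(detectStackAccess(inspectingConsole)); // true
```

Note that Node.js formats logged errors by reading `stack` itself, so passing the real `console` there would likely report `true` regardless; the check is only meaningful inside a browser page.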

vladtreny commented 2 months ago

It is not possible to fix it properly without modifying the Chromium source.

If anyone could start a project for a custom Chromium build, that would be great.

brauliobo commented 2 months ago

It is possible to replace Puppeteer with a WebSocket connection through a browser extension (loaded with --load-extension) and then control the browser with chrome.scripting.executeScript calls. I've tested and verified this in a PoC.
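
A rough sketch of what such an extension background script might look like. This is not the PoC mentioned above: the command names, the message shape, and the server URL are all assumptions made for illustration. Only `chrome.scripting.executeScript` (which needs the "scripting" permission plus host permissions) is a real API here.

```javascript
// Functions injected into the page. They must be self-contained because
// chrome.scripting.executeScript serializes them before running them.
const pageActions = {
  click: (selector) => {
    const el = document.querySelector(selector);
    if (!el) return false;
    el.click(); // script-dispatched click, not real input
    return true;
  },
  text: (selector) => {
    const el = document.querySelector(selector);
    return el ? el.textContent : null;
  },
};

// Runs one command. `chromeApi` is passed in so the function can be tested
// with a stub; inside the extension you would pass the global `chrome`.
async function runCommand(chromeApi, { tabId, action, args = [] }) {
  const func = pageActions[action];
  if (!func) throw new Error(`unknown action: ${action}`);
  const results = await chromeApi.scripting.executeScript({
    target: { tabId },
    func,
    args,
  });
  // executeScript resolves to one InjectionResult per injected frame.
  return results[0]?.result;
}

// Wires the dispatcher to a WebSocket. The URL is a placeholder assumption.
function connect(chromeApi, url = 'ws://127.0.0.1:8080') {
  const ws = new WebSocket(url);
  ws.onmessage = async (event) => {
    const command = JSON.parse(event.data);
    try {
      const result = await runCommand(chromeApi, command);
      ws.send(JSON.stringify({ id: command.id, result }));
    } catch (error) {
      ws.send(JSON.stringify({ id: command.id, error: String(error) }));
    }
  };
  return ws;
}
```

The controlling process would then send messages such as `{"id": 1, "tabId": 123, "action": "click", "args": ["#submit"]}` over the socket.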

I wonder, though, whether it would be enough for Puppeteer to use WebDriver instead of CDP to communicate with the browser.

vladtreny commented 2 months ago

Interesting idea. Before Puppeteer, I used extensions. Puppeteer is cool, but extensions provide additional useful APIs.

You can also call CDP commands from the extension's background script. It is not detectable.
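
For reference, extensions reach CDP through the `chrome.debugger` API (which needs the "debugger" permission). A minimal sketch follows. `sendCdpCommand` is a hypothetical helper name, and whether this route really avoids detection is the claim above, not something the sketch verifies.

```javascript
// Attach to a tab, run a single CDP command, and detach again.
// `chromeApi` is injected for testability; pass the global `chrome` in practice.
async function sendCdpCommand(chromeApi, tabId, method, params = {}) {
  const target = { tabId };
  await chromeApi.debugger.attach(target, '1.3'); // required protocol version
  try {
    return await chromeApi.debugger.sendCommand(target, method, params);
  } finally {
    // Always detach, even if the command failed.
    await chromeApi.debugger.detach(target);
  }
}
```

Example call from the background script: `await sendCdpCommand(chrome, tabId, 'Page.reload', { ignoreCache: true })`.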

nwebson commented 1 month ago

I made a patch for this issue; it disables the Runtime.enable command that causes this leak. You can check it out here: https://github.com/rebrowser/rebrowser-patches

vladtreny commented 1 month ago

^ This way we lose a lot of funcs of puppeteer. We need a chromium patch... if anyone could start a project

nwebson commented 1 month ago

> ^ This way we lose a lot of funcs of puppeteer. We need a chromium patch... if anyone could start a project

@vladtreny I see no loss in functions of puppeteer after the patch. If you could find any, please let me know via issues section, I will be happy to address it.

vladtreny commented 1 month ago

console.log does not work, click does not work, and other CDP functions do not work.

nwebson commented 1 month ago
vladtreny commented 1 month ago

Do you even get here? page.on('console', async message => {

Also, can you show how you click?

nwebson commented 1 month ago

page.on('console') relies on Runtime.consoleAPICalled, so it won't work, that's true.

page.click(selector, clickOptions) works fine with my patch.

vladtreny commented 1 month ago

How does it detect the element to click? Can I select it correctly inside 10 random iframes?

Does it click inside a closed shadow root?

nwebson commented 1 month ago

@vladtreny it feels like I have to defend myself for something... I'm not trying to sell you anything.

You can try my solution, and if you find anything that doesn't work in the patched version, open a new issue for it. I will be glad to assist. Thanks.

andrewmcwatters commented 1 month ago

It seems less than optimal that one has to patch a browser to prevent this non-standard stack field from leaking side-effects.

Has anyone else here also tried just not using Chrome? A preliminary test from our end showed that the only leaked behavior from Firefox was window.navigator.webdriver, and I'm not sure what the state of the art is to patch that, but IIRC, it's not as simple as setting it to false.
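
To illustrate why it is not as simple as setting it to false: browsers expose `navigator.webdriver` as a getter-only accessor on the prototype, so plain assignment does nothing, and replacing the getter leaves a trace. The sketch below uses a stand-in object instead of a real `navigator`. All names in it are hypothetical, and the bound function is only a trick to make the stand-in getter stringify like a built-in.

```javascript
// Build a stand-in for Navigator.prototype and an instance of it.
function makeFakeNavigator() {
  const proto = {};
  Object.defineProperty(proto, 'webdriver', {
    configurable: true,
    enumerable: true,
    // Bound functions stringify as "[native code]", like real built-ins.
    get: function () { return true; }.bind(null),
  });
  return { proto, nav: Object.create(proto) };
}

// Plain assignment: there is no setter, so this throws in strict mode
// (and is silently ignored in sloppy mode).
function tryAssign(nav) {
  'use strict';
  try {
    nav.webdriver = false;
  } catch (error) {
    return error instanceof TypeError ? 'TypeError' : 'other error';
  }
  return 'assigned';
}

// Overriding the getter on the prototype does change the value...
function patchGetter(proto) {
  Object.defineProperty(proto, 'webdriver', {
    configurable: true,
    enumerable: true,
    get() { return false; },
  });
}

// ...but the replacement is ordinary JS, so a page can notice that the
// getter's source text no longer looks native.
function looksNative(proto) {
  const getter = Object.getOwnPropertyDescriptor(proto, 'webdriver').get;
  return Function.prototype.toString.call(getter).includes('[native code]');
}

const { proto, nav } = makeFakeNavigator();
console.log(tryAssign(nav), nav.webdriver); // TypeError true
console.log(looksNative(proto)); // true
patchGetter(proto);
console.log(nav.webdriver, looksNative(proto)); // false false
```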

andrewmcwatters commented 1 month ago

I should add that most detection strategies explicitly target Google Chrome, and there seem to be far fewer explicit detection strategies for Firefox. It's also Tor Browser's browser of choice for a reason.

nwebson commented 1 month ago

@andrewmcwatters since this post yesterday - https://hacks.mozilla.org/2024/08/puppeteer-support-for-firefox/ - I guess detection strategies will evolve quite quickly to target Firefox, too.

andrewmcwatters commented 1 month ago

Yeah, I want to diversify away from just automating with Google Chrome and stealth measures, since it's a bit of a risk factor at this point.

I'm not sure what WebDriver BiDi's equivalent of Page.addScriptToEvaluateOnNewDocument is, though, or if it has one. It might be necessary to build an equivalent event from existing standard ones.
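
For what it's worth, the WebDriver BiDi specification defines a `script.addPreloadScript` command, which looks like the closest counterpart. A sketch of the message a client would send over the BiDi WebSocket follows. `buildAddPreloadScript` is a hypothetical helper; only the `method` name and the `functionDeclaration`, `contexts`, and `sandbox` parameters come from the spec.

```javascript
let nextCommandId = 0;

// Builds a BiDi command object for script.addPreloadScript.
// `fn` is serialized to source text; it runs in new documents before page scripts.
function buildAddPreloadScript(fn, { contexts, sandbox } = {}) {
  const params = { functionDeclaration: fn.toString() };
  if (contexts) params.contexts = contexts; // optional: limit to browsing contexts
  if (sandbox) params.sandbox = sandbox; // optional: run in a named sandbox realm
  nextCommandId += 1;
  return { id: nextCommandId, method: 'script.addPreloadScript', params };
}

// Example: mark every new document (the property name is made up).
const message = buildAddPreloadScript(() => {
  window.__preloaded = true;
});
console.log(JSON.stringify(message));
```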

andrewmcwatters commented 1 month ago
andrewmcwatters@Andrews-iMac redacted % node --test
▶ tests
  ✔ https://arh.antoinevastel.com/bots/ (5887.661417ms)
  ✔ https://arh.antoinevastel.com/bots/areyouheadless (2896.569986ms)
  ✔ BotD (1423.917009ms)
  ✖ Fingerprint Pro Bot Detection (3997.748462ms)
    AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
    + actual - expected

    + 'You are a bot'
    - 'You are not a bot'
               ^
        at TestContext.<anonymous> (redacted)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async Test.run (node:internal/test_runner/test:857:9)
        at async Suite.processPendingSubtests (node:internal/test_runner/test:565:7) {
      generatedMessage: true,
      code: 'ERR_ASSERTION',
      actual: 'You are a bot',
      expected: 'You are not a bot',
      operator: 'strictEqual'
    }

  ✖ BrowserScan (1719.802858ms)
    AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
    + actual - expected

    + 'Robot'
    - 'Normal'
        at TestContext.<anonymous> (redacted)
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async Test.run (node:internal/test_runner/test:857:9)
        at async Suite.processPendingSubtests (node:internal/test_runner/test:565:7) {
      generatedMessage: true,
      code: 'ERR_ASSERTION',
      actual: 'Robot',
      expected: 'Normal',
      operator: 'strictEqual'
    }

▶ tests (15927.932604ms)
ℹ tests 5
ℹ suites 1
ℹ pass 3
ℹ fail 2
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 16215.926107
vladtreny commented 1 month ago

Need to patch chromium. Nothing to do. All these hacks are detectable.

Also, the new protection in cloudflare via shadow root, is hard to bypass. Possible, but not ideal.

ottodriver commented 1 month ago

It seems to me that this or some other leak has been affecting reCAPTCHA bypass since yesterday. I'm getting a challenge 90%+ of the time on v2 and a low score on v3. Can anyone confirm?

andrewmcwatters commented 1 month ago

> Need to patch chromium. Nothing to do. All these hacks are detectable.
>
> Also, the new protection in cloudflare via shadow root, is hard to bypass. Possible, but not ideal.

Bummer.

Vinyzu commented 1 month ago

> Need to patch chromium. Nothing to do.
>
> All these hacks are detectable.
>
> Also, the new protection in cloudflare via shadow root, is hard to bypass. Possible, but not ideal.

It's possible. See selenium-driverless. I'm also currently working on an open-source (Playwright-based) solution.

vladtreny commented 1 month ago

> It's possible. See selenium-driverless. I'm also currently working on an open-source (Playwright-based) solution.

I know it... just rechecked if they added something new. What kind of value does it bring? Turns off runtime? This way we lose a lot of features. Including ability to bypass this new cloudflare shadow root.

Cloudflare reads these threads, but anyway :)

nwebson commented 1 month ago

> > It's possible. See selenium-driverless. I'm also currently working on an open-source (Playwright-based) solution.
>
> I know it... just rechecked if they added something new. What kind of value does it bring? Turns off runtime? This way we lose a lot of features. Including ability to bypass this new cloudflare shadow root.
>
> Cloudflare reads these threads, but anyway :)

You keep talking about losing features, but you never provide any specific code that stops working when Runtime is off. If you could, that would be really useful for the community.

vladtreny commented 1 month ago

For example, universally finding an element at runtime, or clicking inside a closed shadow root.
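
To make the closed shadow root case concrete, here is one way it might be approached without evaluating JavaScript in the page, using only CDP's DOM and Input domains. This is a sketch under assumptions, not how any project mentioned in this thread works. `findNode`, `hasAttribute`, and `clickById` are hypothetical names; `send` stands for any function that issues a CDP command, and scrolling into view and out-of-process iframes are not handled.

```javascript
// Depth-first search through a DOM.getDocument tree, descending into
// children, shadow roots (open or closed), and iframe content documents.
function findNode(node, predicate) {
  if (predicate(node)) return node;
  const next = [
    ...(node.children ?? []),
    ...(node.shadowRoots ?? []),
    ...(node.contentDocument ? [node.contentDocument] : []),
  ];
  for (const child of next) {
    const hit = findNode(child, predicate);
    if (hit) return hit;
  }
  return null;
}

// CDP reports attributes as a flat [name, value, name, value, ...] array.
function hasAttribute(node, name, value) {
  const attrs = node.attributes ?? [];
  for (let i = 0; i < attrs.length; i += 2) {
    if (attrs[i] === name && attrs[i + 1] === value) return true;
  }
  return false;
}

// `send(method, params)` resolves with the command result, for example a
// wrapper around a CDP session's send method.
async function clickById(send, id) {
  // pierce: true asks CDP to include shadow roots and iframe documents.
  const { root } = await send('DOM.getDocument', { depth: -1, pierce: true });
  const node = findNode(root, (n) => hasAttribute(n, 'id', id));
  if (!node) return false;
  const { model } = await send('DOM.getBoxModel', { nodeId: node.nodeId });
  // The content quad is 4 points (8 numbers); average two opposite corners.
  const [x1, y1, , , x3, y3] = model.content;
  const x = (x1 + x3) / 2;
  const y = (y1 + y3) / 2;
  for (const type of ['mousePressed', 'mouseReleased']) {
    await send('Input.dispatchMouseEvent', { type, x, y, button: 'left', clickCount: 1 });
  }
  return true;
}
```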

andrewmcwatters commented 1 month ago

> > > It's possible. See selenium-driverless. I'm also currently working on an open-source (Playwright-based) solution.
>
> > I know it... just rechecked if they added something new. What kind of value does it bring? Turns off runtime? This way we lose a lot of features. Including ability to bypass this new cloudflare shadow root. Cloudflare reads these threads, but anyway :)
>
> You keep talking about losing features, but you never provide any specific code that stops working when Runtime is off. If you could, that would be really useful for the community.

It's also not desirable to adopt a Selenium-based tool only to end up using a non-Selenium API.

nwebson commented 1 month ago

@vladtreny do you have any example of code that breaks after disabling Runtime.enable command? @andrewmcwatters could you please clarify?

Vinyzu commented 1 month ago

Many features break when Runtime is not enabled. But as I said, I'm currently working on a Playwright solution, which fixes every issue.

andrewmcwatters commented 1 month ago

> @vladtreny do you have any example of code that breaks after disabling Runtime.enable command? @andrewmcwatters could you please clarify?

I'm talking specifically about Selenium-Driverless, not so much about your patches disabling the Runtime.enable command. The README.md of https://github.com/kaliiiiiiiiii/Selenium-Driverless reads:

> Note: This project is moving away from the selenium syntax

I'm not going to try and speak for everyone, but I think a lot of us are looking for drop-in solutions like yours.

My business uses Selenium, though. We don't use the other automation frameworks, in part because they're explicitly not designed for anything other than testing. You can use them for other purposes, but it leads to hacking around their APIs.