berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License

Cloudflare detecting Puppeteer #841

Status: Open (opened by joeledwardson 9 months ago)

joeledwardson commented 9 months ago

I have not queried or clicked anything with Puppeteer. Simply connecting to the browser seems to be enough for Cloudflare to block access to a site.

I used the simplest possible Puppeteer example, with a real Chrome install (headful, not headless) and no automation scripts.

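import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

// Register the stealth plugin before launching the browser
puppeteer.use(StealthPlugin())

;(async () => {
  console.log('launching...')
  // Launch the locally installed Chrome in headful mode
  const browser = await puppeteer.launch({
    executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe',
    headless: false,
    defaultViewport: null // use the real window size instead of an emulated viewport
  })
  console.log('launched')
  try {
    const page = await browser.newPage()
    await page.goto('https://nowsecure.nl')
    console.log('waiting for 1 min...')
    // Keep the browser open so the Cloudflare check can be observed and clicked manually
    await new Promise((r) => setTimeout(r, 60_000))
  } finally {
    // Close the browser even if navigation throws
    console.log('closing...')
    await browser.close()
  }
})()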

I tried the same thing without Puppeteer: clicking the Cloudflare verification checkbox lets me through to the website. So I suspect Cloudflare is somehow able to detect Puppeteer.

The video below shows me clicking the checkbox manually, yet Cloudflare still refuses access:

https://github.com/berstend/puppeteer-extra/assets/25906558/973501b3-25e5-40a4-98ad-888315930b4b

I have also reproduced this on Android: I forwarded Chrome's DevTools port via ADB, connected Puppeteer to the debugging port, and got the same result.

For mobile, I used the following:

import { Browser, connect } from 'puppeteer-core'

let browser: Browser | null = null

const timer = (ms: number) => new Promise<null>((res) => setTimeout(() => res(null), ms))

export async function puppeteerConnect({
  port,
  queryTimeoutMs
}: {
  port: string
  queryTimeoutMs: number
}): Promise<Browser> {
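  // Chrome's DevTools HTTP endpoint reports the browser-level WebSocket URL at /json/version
  const debuggerUrl = 'http://127.0.0.1:' + port + '/json/version'

  const fetcher = async () => {
    const result = await fetch(debuggerUrl)
    return await result.text()
  }

  // `timer` resolves to null, so a null result means the request timed out
  const result = await Promise.race([timer(queryTimeoutMs), fetcher()])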
  if (result === null) {
    throw new Error('get debugger URL timed out')
  }

  const data = JSON.parse(result) as { webSocketDebuggerUrl?: unknown }

  const wsUrl = data?.webSocketDebuggerUrl
  if (typeof wsUrl !== 'string') {
    throw new Error('get debugger url from response failed, `wsUrl` is not string')
  }

  // Use the WebSocket URL to attach Puppeteer to the already-running browser
  const connected = await Promise.race([
    connect({
      browserWSEndpoint: wsUrl,
      defaultViewport: null
    }),
    timer(queryTimeoutMs)
  ])
  if (connected === null) {
    throw new Error('puppeteer connect timed out')
  }
  return connected
}

async function retryConnect() {
  let lastErr: unknown = null
  let i = 0
  while (i < 20) {
    console.log('connection attempt #', i)
    try {
      return await puppeteerConnect({ port: '9000', queryTimeoutMs: 500 })
    } catch (err) {
      lastErr = err
    }
    await new Promise((r) => setTimeout(r, 1000))
    i += 1
  }
  throw lastErr
}

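;(async () => {
  console.log('connecting...')
  const _browser = await retryConnect()
  console.log('connected!')
  browser = _browser
  // Use the tab that is already open on the device instead of opening a new one
  const pages = await browser.pages()
  const firstPage = pages[0]
  if (!firstPage) {
    throw new Error('NO PAGE')
  }
  await firstPage.goto('https://nowsecure.nl')

  // Keep the session alive for a minute to observe the Cloudflare check
  await new Promise((r) => setTimeout(r, 60_000))
})()
  .catch((err) => console.error(err))
  .finally(() => {
    // Detach from the device's Chrome without closing it
    console.log('browser disconnecting')
    browser?.disconnect()
    console.log('should be done?')
  })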
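The two Promise.race calls in puppeteerConnect leave the losing timer running, and retryConnect hand-rolls its loop. One way to factor both out is sketched below with two hypothetical helpers, withTimeout and retry (they are not part of Puppeteer): a timeout wrapper that always clears its timer, plus a generic fixed-delay retry.

```typescript
// Hypothetical helper (not a Puppeteer API): resolves or rejects with `promise`,
// or rejects with a timeout error if `ms` milliseconds pass first.
// The timer is cleared as soon as `promise` settles.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const handle = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms)
    promise.then(
      (value) => {
        clearTimeout(handle)
        resolve(value)
      },
      (err) => {
        clearTimeout(handle)
        reject(err)
      }
    )
  })
}

// Hypothetical helper: calls `attempt` up to `maxAttempts` times, waiting
// `delayMs` between failed attempts, and rethrows the last error.
async function retry<T>(
  attempt: (i: number) => Promise<T>,
  maxAttempts: number,
  delayMs: number
): Promise<T> {
  let lastErr: unknown = new Error('retry: maxAttempts must be at least 1')
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt(i)
    } catch (err) {
      lastErr = err
    }
    // No pointless wait after the final attempt
    if (i < maxAttempts - 1) {
      await new Promise<void>((r) => setTimeout(() => r(), delayMs))
    }
  }
  throw lastErr
}
```

Under these assumptions, the connect step could become withTimeout(connect({ browserWSEndpoint: wsUrl, defaultViewport: null }), queryTimeoutMs, 'puppeteer connect'), and retryConnect could become retry(() => puppeteerConnect({ port: '9000', queryTimeoutMs: 500 }), 20, 1000). A timeout only stops waiting: a connect() call that finishes late still opens a connection, which this sketch does not close.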

NodePuppeteer commented 9 months ago

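Try using the start-up tab (the tab that is already open when the browser launches, i.e. the first entry of browser.pages()) instead of browser.newPage(), and see if that works. There is more information on this problem here: https://github.com/berstend/puppeteer-extra/issues/832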
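A minimal sketch of the start-up-tab suggestion, assuming it means reusing the first entry of `browser.pages()` rather than calling `browser.newPage()`. The helper `getStartupPage` and the `PageSource` interface are hypothetical; the interface only describes the two Browser methods the helper uses, so the logic can be exercised without launching Chrome.

```typescript
// Hypothetical structural type: Puppeteer's Browser provides pages() and
// newPage() with compatible shapes, so it can be passed in directly.
interface PageSource<P> {
  pages(): Promise<P[]>
  newPage(): Promise<P>
}

// Hypothetical helper: reuse the tab the browser opened at start-up and only
// open a new tab if there is none.
async function getStartupPage<P>(browser: PageSource<P>): Promise<P> {
  const [first] = await browser.pages()
  if (first !== undefined) {
    return first
  }
  return browser.newPage()
}
```

With a real Puppeteer Browser, `const page = await getStartupPage(browser)` would take the place of `await browser.newPage()` in the first snippet of this issue. The fallback keeps the helper usable when the browser was started without any open tab.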

krkeegan commented 6 months ago

I've recently (within the last two weeks) started seeing exactly the same thing. Using the start-up tab doesn't seem to make a difference.

zfcsoftware commented 6 months ago

@krkeegan @joeledwardson @NodePuppeteer @peterblazejewicz @bclougherty

See #832.

bajgit98 commented 3 weeks ago

> I've recently (within the last two weeks) started seeing exactly the same thing. Using the start-up tab doesn't seem to make a difference.

It worked for me until now. Now every Cloudflare-protected site simply blocks me: even if I solve the CAPTCHA myself, it keeps spinning or reports that I failed the human-verification test.

Has anyone managed to resolve this issue?

zfcsoftware commented 3 weeks ago

> Has anyone managed to resolve this issue?

https://medium.com/@zfcsoftware/how-to-bypass-cloudflare-with-node-js-869fa6e21dd5

vladtreny commented 3 weeks ago

Friend, your article is absolutely wrong... You don't understand the cause of this issue at all. Please stop spamming these threads.

zfcsoftware commented 3 weeks ago

> Friend, your article is absolutely wrong... You don't understand the cause of this issue at all. Please stop spamming these threads.

The article is about getting past Cloudflare. It gives two pieces of code, and both pass easily, even against the corporate plan. Which part is wrong? I am sharing a source because people keep saying that Cloudflare cannot be passed. Explain the wrong part and let's learn together. Also, I'm not spamming: my first message only linked a GitHub discussion that has nothing to do with me and has dozens of participants. I am waiting for you to explain what is wrong.

Kosmoon commented 3 weeks ago

I had this issue: some websites have more advanced scraper detection. The solution was to use a residential proxy service such as Bright Data and pass the proxy arguments to Puppeteer.

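// Uses top-level await, so run this as an ES module
import puppeteer, { type PuppeteerLaunchOptions } from 'puppeteer';

const BROWSER_CONFIG: PuppeteerLaunchOptions = {
  // 'new' opts into Chrome's newer headless mode
  headless: 'new',
  defaultViewport: null,
  // Accept invalid TLS certificates (e.g. from a proxy that re-signs HTTPS traffic)
  ignoreHTTPSErrors: true,
  // Route all browser traffic through the proxy (host:port)
  args: ['--proxy-server=xxxx:xxxx'],
};

const browser = await puppeteer.launch(BROWSER_CONFIG);
// Reuse the tab opened at start-up instead of calling browser.newPage()
const page = (await browser.pages())[0];

// Proxy credentials; set them before the first page.goto()
await page.authenticate({
  username: 'xxxxx',
  password: 'xxxxxx',
});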
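Proxy providers often hand out a single URL with the credentials embedded. Chrome's --proxy-server flag does not take credentials, which is why the snippet above passes the username and password through page.authenticate(). A sketch of a hypothetical `splitProxyUrl` helper that derives both pieces from one URL, assuming the `http://user:pass@host:port` form:

```typescript
// Hypothetical result shape: the Chrome flag plus optional credentials
// suitable for page.authenticate().
interface ProxyParts {
  serverArg: string
  credentials: { username: string; password: string } | null
}

// Hypothetical helper: split one proxy URL into a --proxy-server argument
// (scheme, host and port only) and the credentials it embedded, if any.
function splitProxyUrl(proxyUrl: string): ProxyParts {
  const url = new URL(proxyUrl)
  if (!url.hostname) {
    // Deliberately omit the URL from the message so credentials are not logged
    throw new Error('proxy URL must include a host')
  }
  // url.host includes the port whenever it is not the scheme's default
  const serverArg = `--proxy-server=${url.protocol}//${url.host}`
  // WHATWG URL keeps username/password percent-encoded, so decode them
  const credentials = url.username
    ? {
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password)
      }
    : null
  return { serverArg, credentials }
}
```

The result would be used as `args: [serverArg]` in the launch options and, when `credentials` is non-null, as `await page.authenticate(credentials)` before the first `page.goto()`. Percent-encoded characters in the username or password (for example %40 for @) are decoded before being handed to Puppeteer.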