Cloudflare detecting Puppeteer #841

Open joeledwardson opened 9 months ago

joeledwardson commented 9 months ago

I have not queried or clicked anything using Puppeteer; simply connecting to the browser seems to be enough for Cloudflare to block access to a site.

I have used the simplest possible Puppeteer example with a real browser (not headless) and no automation scripts.

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

;(async () => {
  // launch the locally installed Chrome in headed mode
  const browser = await puppeteer.launch({
    executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe',
    headless: false,
    defaultViewport: null
  })
  const page = await browser.newPage()
  await page.goto('')
  // leave the window open long enough to try the verification by hand
  console.log('waiting for 1 min...')
  await new Promise((r) => setTimeout(r, 60_000))
  await browser.close()
})()

When I repeat the same steps without Puppeteer and click the Cloudflare verification button, I pass through to the website, so I suspect that they are somehow able to detect Puppeteer.

The video below shows me clicking manually, yet Cloudflare still refuses access:

I have also reproduced this on Android: I forwarded the Chrome DevTools port via ADB, connected to the debugging port, and got the same result.

For mobile, I used the following script:

import { Browser, connect } from 'puppeteer-core'

let browser: Browser | null = null

const timer = (ms: number) => new Promise<null>((res) => setTimeout(() => res(null), ms))

export async function puppeteerConnect({
  port,
  queryTimeoutMs
}: {
  port: string
  queryTimeoutMs: number
}): Promise<Browser> {
  const debuggerUrl = '' + port + '/json/version'

  // ask the DevTools HTTP endpoint for the browser's websocket URL
  const fetcher = async () => {
    const result = await fetch(debuggerUrl)
    return await result.text()
  }

  // timer() resolves to null, so a null result means the timeout won the race
  const result = await Promise.race([timer(queryTimeoutMs), fetcher()])
  if (result === null) {
    throw new Error('get debugger URL timed out')
  }

  const data = JSON.parse(result) as { webSocketDebuggerUrl?: unknown }

  const wsUrl = data?.webSocketDebuggerUrl
  if (typeof wsUrl !== 'string') {
    throw new Error('get debugger url from response failed, `wsUrl` is not string')
  }

  // use socket url to connect to with puppeteer
  const browser = await Promise.race([
    // assumed: the same queryTimeoutMs is reused as the connect timeout
    timer(queryTimeoutMs),
    connect({
      browserWSEndpoint: wsUrl,
      defaultViewport: null
    })
  ])
  if (browser === null) {
    throw new Error('puppeteer connect timed out')
  }
  return browser
}

async function retryConnect() {
  let lastErr: unknown = null
  let i = 0
  // try up to 20 times, one second apart, and rethrow the last failure
  while (i < 20) {
    console.log('connection attempt #', i)
    try {
      return await puppeteerConnect({ port: '9000', queryTimeoutMs: 500 })
    } catch (err) {
      lastErr = err
    }
    await new Promise((r) => setTimeout(r, 1000))
    i += 1
  }
  throw lastErr
}

;(async () => {
  const _browser = await retryConnect()
  browser = _browser
  // reuse the tab that is already open on the device
  const pages = await browser.pages()
  const firstPage = pages[0]
  if (!firstPage) {
    throw new Error('NO PAGE')
  }
  await firstPage.goto('')

  await new Promise((r) => setTimeout(r, 60_000))
})().finally(() => {
  console.log('browser disconnecting')
  // assumed: detach from the remote browser without closing it
  browser?.disconnect()
  console.log('should be done?')
})

NodePuppeteer commented 9 months ago

Try using the start-up tab and see if it works. We have more info on this problem here:
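
For anyone unsure what "using the start-up tab" means in practice: the idea is to reuse the tab Chrome opens at launch rather than calling `browser.newPage()`. Below is a minimal sketch of that, not an official API. The helper name `getStartupPage` and the `PageLike` / `BrowserLike` interfaces are hypothetical and exist only so the snippet runs without puppeteer installed; a real puppeteer `Browser` should fit the same shape, since it exposes both `pages()` and `newPage()`.

```typescript
// Minimal structural types (hypothetical) so the sketch is self-contained.
interface PageLike {
  goto(url: string): Promise<unknown>
}

interface BrowserLike<P extends PageLike> {
  pages(): Promise<P[]>
  newPage(): Promise<P>
}

// Hypothetical helper: prefer the tab that was opened at launch,
// and only create a new tab when the browser reports none.
async function getStartupPage<P extends PageLike>(browser: BrowserLike<P>): Promise<P> {
  const pages = await browser.pages()
  return pages[0] ?? (await browser.newPage())
}
```

In the original example this would replace `const page = await browser.newPage()` with `const page = await getStartupPage(browser)`. Whether it helps against Cloudflare is exactly what is being asked here, so treat it as something to try rather than a fix.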

krkeegan commented 6 months ago

I have recently (within the last two weeks) started seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.

zfcsoftware commented 6 months ago

bajgit98 commented 3 weeks ago

I have recently (within the last two weeks) started seeing the exact same thing. Using the start-up tab doesn't seem to make a difference.

I had luck up until now. Now, anything protected by Cloudflare simply doesn't let me do anything. Even if I solve the CAPTCHA myself, it keeps spinning or reports that I've failed the human verification test.

Has anyone had luck resolving this issue?

zfcsoftware commented 3 weeks ago

vladtreny commented 3 weeks ago

Friend, your article is absolutely wrong... You completely do not understand the cause of this issue. Please stop spamming these threads.

zfcsoftware commented 3 weeks ago

The article is about passing Cloudflare. It gives two pieces of code, and both can easily pass, including on the corporate plan. Which part is wrong? I am trying to share a source because people constantly say that Cloudflare cannot be passed. Explain the wrong part and let's learn together. Also, I'm not spamming. My first message was a link to a GitHub discussion. It has nothing to do with me, and there are dozens of people in that discussion. I am waiting for you to explain what is wrong.

Kosmoon commented 3 weeks ago

I had this issue; some websites have more advanced scraper detection. The solution was to use a residential proxy service like Bright Data and pass the proxy args to Puppeteer.

const BROWSER_CONFIG: PuppeteerLaunchOptions = {
  headless: 'new',
  defaultViewport: null,
  ignoreHTTPSErrors: true,
  // route all browser traffic through the residential proxy (host:port)
  args: ['--proxy-server=xxxx:xxxx'],
};

const browser = await puppeteer.launch(BROWSER_CONFIG);
// reuse the tab opened at launch
const page = (await browser.pages())[0];

// proxy credentials are supplied per page
await page.authenticate({
  username: 'xxxxx',
  password: 'xxxxxx',
});
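
If it helps, here is a small sketch of keeping the proxy settings in one place and deriving both the launch arg and the credentials from it. The `ProxyConfig` shape and the helper names `proxyLaunchArgs` and `proxyCredentials` are made up for illustration and are not part of Puppeteer. The only real pieces assumed are Chromium's `--proxy-server` flag and the `{ username, password }` object that `page.authenticate()` takes.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ProxyConfig {
  host: string
  port: number
  username: string
  password: string
}

// Build the Chromium launch args. Only host:port goes into the flag.
function proxyLaunchArgs(proxy: ProxyConfig): string[] {
  return [`--proxy-server=${proxy.host}:${proxy.port}`]
}

// Build the object to hand to page.authenticate().
function proxyCredentials(proxy: ProxyConfig): { username: string; password: string } {
  return { username: proxy.username, password: proxy.password }
}
```

Usage would look like `args: proxyLaunchArgs(cfg)` in the launch options, followed by `await page.authenticate(proxyCredentials(cfg))`. As far as I know, Chromium does not pick up credentials embedded in `--proxy-server`, which is why they are passed separately.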