berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License
6.52k stars 744 forks

[Bug] cloudflare not working anymore? #811

Open phpmoli opened 1 year ago

phpmoli commented 1 year ago

Hello,

I have used puppeteer-extra-plugin-stealth in a few scripts to get past Cloudflare (e.g. on Patreon) successfully, so I know it has worked in the past. Today I wanted to scrape a new site, so I dusted off an old script that used to work, edited it, and tested it. Now I cannot get past the Cloudflare DDoS protection page.

To help debug the problem, I have cut the problematic script down to the smallest possible size:

require( "puppeteer-extra" )
.use( require( "puppeteer-extra-plugin-stealth" )())
.launch( { headless: "new", })
.then( browser =>
  browser.newPage()
  .then( page =>
    page.goto( "https://dodi-repacks.site/feed/", { waitUntil: "load", })
    .then( () => page.waitForTimeout( 20000 ))
    .then( () => page.content())
    .then( source => { console.log( source ); })));

It prints the Cloudflare HTML. I also tested after getting a new IP. If the above URL is opened in a normal Chrome, it displays the correct RSS feed XML file.
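When comparing runs, it can help to classify the captured source automatically instead of reading it by eye. The following is a minimal sketch, not part of the original report: `classifyResponse` and `CHALLENGE_MARKERS` are hypothetical names, and the marker strings are heuristic assumptions based on commonly seen Cloudflare interstitial pages, not a documented contract. It also assumes the feed, when served, still contains a recognizable `<rss` or `<feed` element in the string returned by `page.content()`.

```javascript
// Heuristic markers only (assumptions): strings often seen on Cloudflare
// interstitial pages. They may change at any time.
const CHALLENGE_MARKERS = [
  "Just a moment...",
  "Checking your browser",
  "/cdn-cgi/challenge-platform/",
];

// Hypothetical helper: returns "feed", "challenge", or "unknown"
// for a page source string.
function classifyResponse(html) {
  // Only inspect the start of the document for feed markers.
  const head = html.trimStart().slice(0, 2000).toLowerCase();
  if (head.startsWith("<?xml") || head.includes("<rss") || head.includes("<feed")) {
    return "feed";
  }
  if (CHALLENGE_MARKERS.some((marker) => html.includes(marker))) {
    return "challenge";
  }
  return "unknown";
}
```

In the script above, the last step could log `classifyResponse( source )` alongside the raw source, which makes it easier to compare results across IPs or launch options.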

Is it me, or is it the cat-and-mouse race? Thanks!

  System:
    OS: Linux 5.10 Debian GNU/Linux 11 (bullseye) 11 (bullseye)
    CPU: (4) x64 Intel(R) Celeron(R) CPU J3455 @ 1.50GHz
    Memory: 4.28 GB / 5.46 GB
    Container: Yes
    Shell: 5.1.4 - /bin/bash
  Binaries:
    Node: 19.9.0 - /usr/bin/node
    Yarn: 1.22.19 - /usr/bin/yarn
    npm: 9.6.3 - /usr/bin/npm
  npmPackages:
    puppeteer: * => 20.5.0
    puppeteer-extra: * => 3.3.6
    puppeteer-extra-plugin-stealth: * => 2.11.2

themattrobinson commented 1 year ago

Seeing the same thing. However, for me the Cloudflare reCAPTCHA is not recognized by puppeteer-extra-plugin-recaptcha, so it is not solved automatically. The Cloudflare reCAPTCHA has been shown much more often over the past few weeks to a month; previously it wasn't an issue.

mrSlack commented 1 year ago

I also just get the Cloudflare HTML, both in headless (new) and headful mode. If the site is opened in regular Chrome, it displays correctly.

  node: v20.2.0
  npm: 9.6.6

  npmPackages:
    puppeteer: * => 20.5.0
    puppeteer-extra: * => 3.3.6
    puppeteer-extra-plugin-stealth: * => 2.11.2
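To reproduce the headless versus headful comparison in one run, something like the sketch below may be useful. It is an illustration under assumptions, not code from this thread: `fetchInModes` is a hypothetical helper, and `launcher` is assumed to be any object with a Puppeteer-style `launch()` method (for example puppeteer-extra after `.use()` with the stealth plugin). The optional `waitMs` delay stands in for the fixed wait used in the original script.

```javascript
// Hypothetical helper: loads `url` once per launch mode and returns the page
// source for each mode, keyed by the stringified `headless` value.
async function fetchInModes(launcher, url, modes = ["new", false], waitMs = 0) {
  const results = {};
  for (const headless of modes) {
    const browser = await launcher.launch({ headless });
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "load" });
      // Give any interstitial page time to finish before reading the source.
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      results[String(headless)] = await page.content();
    } finally {
      // Always release the browser, even if navigation throws.
      await browser.close();
    }
  }
  return results;
}

// Example usage (assumes puppeteer-extra and the stealth plugin are installed):
// const puppeteer = require("puppeteer-extra")
//   .use(require("puppeteer-extra-plugin-stealth")());
// fetchInModes(puppeteer, "https://dodi-repacks.site/feed/", ["new", false], 20000)
//   .then((sources) => console.log(Object.keys(sources)));
```

Passing the launcher in as an argument keeps the helper independent of a specific Puppeteer version and lets it be exercised with a stub.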

julian-george commented 1 year ago

@mrSlack same issue here. Running headful fails at the Cloudflare "checking your browser" step, while the exact same actions in non-Puppeteer Chrome cause no issue.