berstend / puppeteer-extra

💯 Teach puppeteer new tricks through plugins.
https://extra.community
MIT License

[Bug] Recaptcha Plugin doesn't work for Hcaptcha / Cloudflare #404

Closed seuaCoder closed 3 years ago

seuaCoder commented 3 years ago

Describe the bug

I'm trying to solve an hCaptcha on a Cloudflare 403 page. To reproduce, navigate to a Cloudflare-protected website (for example https://pixelscan.net) while using a proxy. Cloudflare redirects to a 403 page and asks you to solve an hCaptcha, but page.solveRecaptchas() doesn't do anything.

Code Snippet

const puppeteer = require('puppeteer-extra')
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

StealthPlugin.onBrowser = () => {};
puppeteer.use(StealthPlugin())

puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: '', // REPLACE THIS WITH YOUR OWN 2CAPTCHA API KEY ⚡
    },
    visualFeedback: true, // colorize reCAPTCHAs (violet = detected, green = solved)
  })
)

const args = [
  '--proxy-server=socks5://127.0.0.1:9050', // I'm using Tor as the proxy here
]

puppeteer.launch({ headless: false, args }).then(async browser => {
  const page = await browser.newPage()
  await page.solveRecaptchas()
  await page.goto('https://pixelscan.net ')
  await page.waitForNavigation()
  await page.solveRecaptchas()
  // await browser.close()
})

Versions

System:
  OS: macOS 11.1
  CPU: (8) x64 Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
  Memory: 448.02 MB / 16.00 GB
  Shell: 5.8 - /bin/zsh
Binaries:
  Node: 15.6.0 - /usr/local/bin/node
  Yarn: 1.22.4 - /usr/local/bin/yarn
  npm: 7.4.0 - /usr/local/bin/npm
npmPackages:
  puppeteer: ^5.5.0 => 5.5.0
  puppeteer-extra: ^3.1.16 => 3.1.16
  puppeteer-extra-plugin-recaptcha: ^3.3.1 => 3.3.1
  puppeteer-extra-plugin-stealth: ^2.6.6 => 2.6.6

berstend commented 3 years ago

I can't test with a proxy right now. Could you try again and post the debug logs here (how to enable them is mentioned in the readme)?

Also useful: a screenshot of the page (to confirm a captcha is actually being shown) and, ideally, the HTML or a devtools screenshot showing where in the DOM the hCaptcha frame sits (in case it's in a nested iframe, etc.). :-)

seuaCoder commented 3 years ago

Debug log

puppeteer-extra-plugin:base:stealth Initialized. +0ms
  puppeteer-extra plugin registered stealth +0ms
  puppeteer-extra-plugin:base:recaptcha Initialized. +0ms
  puppeteer-extra-plugin:recaptcha Initialized {
  visualFeedback: true,
  throwOnError: false,
  provider: { id: '2captcha', token: 'REMOVED' }
} +0ms
  puppeteer-extra plugin registered recaptcha +2ms
  puppeteer-extra dependencies missing Set(15) {
  'stealth/evasions/chrome.app',
  'stealth/evasions/chrome.csi',
  'stealth/evasions/chrome.loadTimes',
  'stealth/evasions/chrome.runtime',
  'stealth/evasions/iframe.contentWindow',
  'stealth/evasions/media.codecs',
  'stealth/evasions/navigator.hardwareConcurrency',
  'stealth/evasions/navigator.languages',
  'stealth/evasions/navigator.permissions',
  'stealth/evasions/navigator.plugins',
  'stealth/evasions/navigator.webdriver',
  'stealth/evasions/sourceurl',
  'stealth/evasions/user-agent-override',
  'stealth/evasions/webgl.vendor',
  'stealth/evasions/window.outerdimensions'
} +1ms
  puppeteer-extra-plugin:base:stealth/evasions/chrome.app Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/chrome.app +26ms
  puppeteer-extra-plugin:base:stealth/evasions/chrome.csi Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/chrome.csi +2ms
  puppeteer-extra-plugin:base:stealth/evasions/chrome.loadTimes Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/chrome.loadTimes +1ms
  puppeteer-extra-plugin:base:stealth/evasions/chrome.runtime Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/chrome.runtime +2ms
  puppeteer-extra-plugin:base:stealth/evasions/iframe.contentWindow Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/iframe.contentWindow +2ms
  puppeteer-extra-plugin:base:stealth/evasions/media.codecs Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/media.codecs +1ms
  puppeteer-extra-plugin:base:stealth/evasions/navigator.hardwareConcurrency Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/navigator.hardwareConcurrency +2ms
  puppeteer-extra-plugin:base:stealth/evasions/navigator.languages Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/navigator.languages +1ms
  puppeteer-extra-plugin:base:stealth/evasions/navigator.permissions Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/navigator.permissions +2ms
  puppeteer-extra-plugin:base:stealth/evasions/navigator.plugins Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/navigator.plugins +6ms
  puppeteer-extra-plugin:base:stealth/evasions/navigator.webdriver Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/navigator.webdriver +2ms
  puppeteer-extra-plugin:base:stealth/evasions/sourceurl Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/sourceurl +2ms
  puppeteer-extra-plugin:base:stealth/evasions/user-agent-override Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/user-agent-override +1ms
  puppeteer-extra-plugin:base:stealth/evasions/webgl.vendor Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/webgl.vendor +2ms
  puppeteer-extra-plugin:base:stealth/evasions/window.outerdimensions Initialized. +0ms
  puppeteer-extra plugin registered stealth/evasions/window.outerdimensions +105ms
  puppeteer-extra orderPlugins:before [
  'stealth',
  'recaptcha',
  'stealth/evasions/chrome.app',
  'stealth/evasions/chrome.csi',
  'stealth/evasions/chrome.loadTimes',
  'stealth/evasions/chrome.runtime',
  'stealth/evasions/iframe.contentWindow',
  'stealth/evasions/media.codecs',
  'stealth/evasions/navigator.hardwareConcurrency',
  'stealth/evasions/navigator.languages',
  'stealth/evasions/navigator.permissions',
  'stealth/evasions/navigator.plugins',
  'stealth/evasions/navigator.webdriver',
  'stealth/evasions/sourceurl',
  'stealth/evasions/user-agent-override',
  'stealth/evasions/webgl.vendor',
  'stealth/evasions/window.outerdimensions'
] +0ms
  puppeteer-extra orderPlugins:after [
  'stealth',
  'recaptcha',
  'stealth/evasions/chrome.app',
  'stealth/evasions/chrome.csi',
  'stealth/evasions/chrome.loadTimes',
  'stealth/evasions/chrome.runtime',
  'stealth/evasions/media.codecs',
  'stealth/evasions/navigator.hardwareConcurrency',
  'stealth/evasions/navigator.languages',
  'stealth/evasions/navigator.permissions',
  'stealth/evasions/navigator.plugins',
  'stealth/evasions/navigator.webdriver',
  'stealth/evasions/sourceurl',
  'stealth/evasions/user-agent-override',
  'stealth/evasions/webgl.vendor',
  'stealth/evasions/window.outerdimensions',
  'stealth/evasions/iframe.contentWindow'
] +1ms
  puppeteer-extra-plugin:recaptcha onPageCreated about:blank +0ms
  puppeteer-extra-plugin:stealth/evasions/user-agent-override onPageCreated - Will set these user agent options {
  override: {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4298.0 Safari/537.36',
    acceptLanguage: 'en-US,en',
    platform: 'Win32'
  },
  opts: { userAgent: null, locale: 'en-US,en', platform: 'Win32' }
} +0ms
  puppeteer-extra-plugin:recaptcha solveRecaptchas +0ms
  puppeteer-extra-plugin:recaptcha findRecaptchas +0ms
  puppeteer-extra-plugin:stealth/evasions/sourceurl Stripping sourceURL { method: 'Runtime.evaluate' } +0ms
  puppeteer-extra-plugin:stealth/evasions/sourceurl Stripping sourceURL { method: 'Runtime.callFunctionOn' } +3ms
  puppeteer-extra-plugin:recaptcha hasRecaptchaScriptTag false +0ms
  puppeteer-extra-plugin:stealth/evasions/sourceurl Stripping sourceURL { method: 'Runtime.callFunctionOn' } +6ms
  puppeteer-extra-plugin:recaptcha hasHcaptchaScriptTag false +0ms
  puppeteer-extra-plugin:recaptcha _generateContentScript recaptcha findRecaptchas undefined +0ms
  puppeteer-extra-plugin:stealth/evasions/sourceurl Stripping sourceURL { method: 'Runtime.evaluate' } +1s
  puppeteer-extra-plugin:recaptcha _generateContentScript hcaptcha findRecaptchas undefined +0ms
  puppeteer-extra-plugin:stealth/evasions/sourceurl Stripping sourceURL { method: 'Runtime.evaluate' } +3ms
  puppeteer-extra-plugin:recaptcha findRecaptchas { captchas: [], error: null } +0ms
  puppeteer-extra-plugin:recaptcha solveRecaptchas { captchas: [], solutions: [], solved: [], error: null } +0ms

Screenshot

I took the screenshot manually because I'm running in headful mode.

[Image: Screen Shot 2021-01-19 at 14 11 53]

Devtool screenshot

[Image: Screen Shot 2021-01-19 at 14 21 00]

berstend commented 3 years ago

@seuaCoder while you have the devtools open like that, could you check the output of the following in the Console tab?

document.querySelector('script[src*="//hcaptcha.com/1/api.js"]') and window.hcaptcha
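The same two checks can also be run from the script instead of the Console tab. This is only a minimal sketch, assuming a Puppeteer `page` object; the helper name `checkHcaptchaLoaded` and the shape of the returned object are made up for illustration and are not part of the plugin.

```javascript
// Hypothetical helper: evaluates the two console checks inside the page
// and reports each one as a boolean.
async function checkHcaptchaLoaded(page) {
  return page.evaluate(() => ({
    // Is the hCaptcha API script tag present in the DOM?
    hasScriptTag: !!document.querySelector('script[src*="//hcaptcha.com/1/api.js"]'),
    // Has the script run and defined the global hcaptcha object?
    hasGlobal: typeof window.hcaptcha !== 'undefined',
  }))
}
```

Calling it right before `page.solveRecaptchas()` and logging the result would show whether the captcha had loaded at that point.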

berstend commented 3 years ago

For reference, here's the (expected) output from http://captcha.website/:

[Image: console output of the two checks above on captcha.website]

Would be interesting to know what happens if you point your scraper to captcha.website as well.

berstend commented 3 years ago

puppeteer-extra-plugin:recaptcha hasHcaptchaScriptTag false +0ms

Is it possible the captcha hasn't loaded yet in your case?

    await page.goto('https://pixelscan.net ')
    await page.waitForNavigation()

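page.waitForNavigation() is meant to be run concurrently with the action that triggers the navigation (via Promise.all). Called after page.goto() has already resolved, it waits for a second navigation that may never happen. Alternatively, use the { waitUntil } option of page.goto().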
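The two patterns mentioned above could look roughly like this. It's a minimal sketch, assuming a Puppeteer `page` object; the helper names `gotoAndSettle` and `clickAndWaitForNavigation` are hypothetical, and `networkidle2` is just one possible `waitUntil` value, not necessarily the right one for a Cloudflare challenge page.

```javascript
// Option 1: let page.goto() itself wait until the page has settled.
// 'networkidle2' is an assumption; other waitUntil values may fit better.
async function gotoAndSettle(page, url) {
  return page.goto(url, { waitUntil: 'networkidle2' })
}

// Option 2: when an action (e.g. a click) triggers a navigation, start
// waitForNavigation() before the action and await both together.
async function clickAndWaitForNavigation(page, selector) {
  const [response] = await Promise.all([
    page.waitForNavigation(),
    page.click(selector),
  ])
  return response
}
```

In the snippet from the report, option 1 would replace the `page.goto()` / `page.waitForNavigation()` pair, since no separate action triggers a second navigation there.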

berstend commented 3 years ago

We continued the discussion on Discord and found that the hCaptcha hadn't loaded yet when page.solveRecaptchas() ran. Adding a timeout after page.goto() fixed the hCaptcha detection issue.
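For anyone landing here later: instead of a fixed timeout, one possible variation is to wait until the hCaptcha script tag shows up in the DOM and only then call `page.solveRecaptchas()`. This is a sketch under assumptions, not what was used on Discord; the helper name `solveOnceHcaptchaLoaded`, the selector, and the 30 second default are all illustrative.

```javascript
// Assumed selector, modelled on the script URL checked earlier in this thread.
const HCAPTCHA_SCRIPT = 'script[src*="hcaptcha.com/1/api.js"]'

// Hypothetical helper: navigate, wait for the hCaptcha script tag, then solve.
// Requires the recaptcha plugin to be registered so page.solveRecaptchas() exists.
async function solveOnceHcaptchaLoaded(page, url, timeout = 30000) {
  await page.goto(url)
  try {
    // Resolves as soon as the element is present in the DOM.
    await page.waitForSelector(HCAPTCHA_SCRIPT, { timeout })
  } catch (err) {
    // waitForSelector rejected (typically a timeout): treat as "no captcha shown".
    return null
  }
  return page.solveRecaptchas()
}
```

A plain delay after `page.goto()` is simpler and is what fixed the issue here; waiting on a selector mainly avoids guessing how long the delay needs to be.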
