apify / browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

Puppeteer v14.4.0 (and higher) broken when using puppeteer-extra #78

Open · corford opened this issue 2 years ago

corford commented 2 years ago

browserController.newPage() fails with "Cannot read private member from an object whose class did not declare it" when used with puppeteer-core v14.4.0+ and puppeteer-extra v3.3.0 (with puppeteer-extra-plugin-stealth v2.10.1).

Seems to be a browser-pool issue (possibly TypeScript related?), most likely caused by this Puppeteer diff: https://github.com/puppeteer/puppeteer/compare/v14.3.0...v14.4.0 (a lot of class fields were made private in it).

FYI, I have no issue running Puppeteer natively without browser-pool; e.g., this works with puppeteer-core 14.4.0 (and higher):

import vanillaPuppeteer from 'puppeteer-core';
import { addExtra } from 'puppeteer-extra';
import Stealth from 'puppeteer-extra-plugin-stealth';
import UserAgentOverride from 'puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js'; // eslint-disable-line max-len
import WebGLVendor from 'puppeteer-extra-plugin-stealth/evasions/webgl.vendor/index.js';
import UserPreferences from 'puppeteer-extra-plugin-user-preferences/index.js';

const stealth = Stealth();
stealth.enabledEvasions.delete('user-agent-override');
stealth.enabledEvasions.delete('webgl.vendor');

const prefs = UserPreferences({
  userPrefs: {
    intl: {
      accept_languages: 'en-US',
    },
    webrtc: {
      ip_handling_policy: 'disable_non_proxied_udp',
      multiple_routes_enabled: false,
      nonproxied_udp_enabled: false,
    },
  },
});

const ua = UserAgentOverride({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
  locale: 'en-US',
  maskLinux: true,
});

const webgl = WebGLVendor({
  vendor: 'Google Inc. (Intel)',
  renderer: 'ANGLE (Intel, Intel(R) UHD Graphics Direct3D11 vs_5_0 ps_5_0, D3D11-27.20.100.8984)',
});

const puppeteer = addExtra(vanillaPuppeteer);
puppeteer.use(stealth);
puppeteer.use(prefs);
puppeteer.use(ua);
puppeteer.use(webgl);

puppeteer.launch({ executablePath: '/usr/bin/google-chrome' }).then(async browser => {
  const page = await browser.newPage();
  await page.setViewport({ width: 800, height: 600 });

  console.log(`Testing the stealth plugin..`);
  await page.goto('https://bot.sannysoft.com');
  await page.waitForTimeout(5000);
  await page.screenshot({ path: 'stealth.png', fullPage: true });

  console.log(`All done, check the screenshots. ✨`);
  await browser.close();
});
B4nan commented 2 years ago

cc @vladfrangu this is what I was talking about today

B4nan commented 2 years ago

"(possibly TypeScript related?)"

Seeing that commit and native private fields, I doubt this is about TS. We are apparently using some API that is no longer available (I don't believe there is a way around native private fields).
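To make the native-private-field point concrete: a `#field` can only be read when `this` is an object the declaring class actually constructed (a "brand check"), and no runtime trick gets around that. The sketch below assumes, without confirmation from this thread, that the wrapper sitting between the caller and the browser is a `Proxy`; `FakeBrowser` is a hypothetical stand-in, not Puppeteer's real `Browser` class.

```javascript
// Hypothetical stand-in for a Puppeteer class that keeps state in a native private field.
class FakeBrowser {
  #contexts = new Map();

  contextCount() {
    // Reading a #private field performs a brand check on `this`.
    return this.#contexts.size;
  }
}

const real = new FakeBrowser();
console.log(real.contextCount()); // 0: `this` is the real instance

// An empty-handler Proxy forwards lookups to the target, but a method called
// through it receives the Proxy itself as `this`, which fails the brand check.
const wrapped = new Proxy(real, {});

let failure = null;
try {
  wrapped.contextCount();
} catch (err) {
  failure = err;
  // A TypeError with the same wording as the error reported in this issue.
  console.log(err.constructor.name, err.message);
}
```

By contrast, TypeScript's `private` modifier is compile-time only and emits an ordinary property, which a Proxy reads without complaint; that is consistent with this being a runtime API problem rather than a TS one.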

corford commented 2 years ago

We are apparently using some API that is no longer available

Thinking about it, yep that makes more sense than TS

corford commented 2 years ago

The bit that isn't super clear to me is why it only manifests when wrapped with puppeteer-extra

B4nan commented 2 years ago

https://github.com/berstend/puppeteer-extra/pull/653

B4nan commented 2 years ago

I guess we can close this, right? You are apparently using an outdated version of puppeteer-extra; the PR I mentioned is in v2.10 and you are on v2.3.1.

corford commented 2 years ago

Sorry @B4nan, I created the ticket late last night (too sleepy). I'm actually on 2.10.1 of puppeteer-extra-plugin-stealth (and 2.3.1 of puppeteer-extra-plugin-user-preferences).

I've updated the ticket description with the correct version

corford commented 2 years ago

So I have those latest changes to page._client(), yet I still have the issue with Puppeteer v14.4.0+ and browser-pool.
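For context, the `_client()` change mentioned here is about Puppeteer v14.4 exposing the page's internal CDP session through a method rather than a plain property, as we understand the linked puppeteer-extra PR. A version-tolerant accessor could be sketched as below; `getCdpSession` and the two fake page objects are hypothetical, and `_client` is undocumented internal API, so treat this purely as an illustration. As corford notes, this plugin-side change alone does not make the browser-pool error go away.

```javascript
// Hypothetical helper: return a page's internal CDP session whether this
// Puppeteer version exposes `_client` as a property or as a method.
// `_client` is undocumented internal API; illustration only.
function getCdpSession(page) {
  return typeof page._client === 'function' ? page._client() : page._client;
}

// Fake pages standing in for the two shapes (not real Puppeteer objects).
const session = { send: async (method) => `sent ${method}` };
const olderPage = { _client: session };       // property style (before 14.4)
const newerPage = { _client: () => session }; // method style (14.4 and later)

console.log(getCdpSession(olderPage) === session); // true
console.log(getCdpSession(newerPage) === session); // true
```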

B4nan commented 2 years ago

Can you give us a reproduction? Your snippet is the working one, right? Please provide the failing one instead; that's more important than showing how to work around it.

corford commented 2 years ago

Ok, here's a minimal reproduction that triggers the same "Cannot read private member from an object whose class did not declare it" error.

Note: everything below uses puppeteer-core ^14.4.0, puppeteer-extra ^3.3.0, puppeteer-extra-plugin ^3.2.0, puppeteer-extra-plugin-stealth ^2.10.1, puppeteer-extra-plugin-user-preferences ^2.3.1 and apify ^2.3.2.

Using puppeteer directly with stealth plugin (this works, no error)

import vanillaPuppeteer from 'puppeteer-core';
import { addExtra } from 'puppeteer-extra';
import Stealth from 'puppeteer-extra-plugin-stealth';
import UserAgentOverride from 'puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js'; // eslint-disable-line max-len
import WebGLVendor from 'puppeteer-extra-plugin-stealth/evasions/webgl.vendor/index.js';
import UserPreferences from 'puppeteer-extra-plugin-user-preferences/index.js';

const stealth = Stealth();
stealth.enabledEvasions.delete('user-agent-override');
stealth.enabledEvasions.delete('webgl.vendor');

const prefs = UserPreferences({
  userPrefs: {
    intl: {
      accept_languages: 'en-US',
    },
    webrtc: {
      ip_handling_policy: 'disable_non_proxied_udp',
      multiple_routes_enabled: false,
      nonproxied_udp_enabled: false,
    },
  },
});

const ua = UserAgentOverride({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
  locale: 'en-US',
  maskLinux: true,
});

const webgl = WebGLVendor({
  vendor: 'Google Inc. (Intel)',
  renderer: 'ANGLE (Intel, Intel(R) UHD Graphics Direct3D11 vs_5_0 ps_5_0, D3D11-27.20.100.8984)',
});

const puppeteer = addExtra(vanillaPuppeteer);
puppeteer.use(stealth);
puppeteer.use(prefs);
puppeteer.use(ua);
puppeteer.use(webgl);

(async () => {
  const browser = await puppeteer.launch({ executablePath: '/usr/bin/google-chrome' });
  const page = await browser.newPage();
  await page.setViewport({ width: 800, height: 600 });

  console.log(`Testing the stealth plugin..`);
  await page.goto('https://bot.sannysoft.com');
  await page.waitForTimeout(5000);
  await page.screenshot({ path: 'stealth.png', fullPage: true });

  console.log(`All done, check the screenshots. ✨`);
  await browser.close();
})();

Using Apify.launchPuppeteer with the stealth plugin (this triggers the error when the browser closes)

import Apify from 'apify';
import vanillaPuppeteer from 'puppeteer-core';
import { addExtra } from 'puppeteer-extra';
import Stealth from 'puppeteer-extra-plugin-stealth';
import UserAgentOverride from 'puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js'; // eslint-disable-line max-len
import WebGLVendor from 'puppeteer-extra-plugin-stealth/evasions/webgl.vendor/index.js';
import UserPreferences from 'puppeteer-extra-plugin-user-preferences/index.js';

const stealth = Stealth();
stealth.enabledEvasions.delete('user-agent-override');
stealth.enabledEvasions.delete('webgl.vendor');

const prefs = UserPreferences({
  userPrefs: {
    intl: {
      accept_languages: 'en-US',
    },
    webrtc: {
      ip_handling_policy: 'disable_non_proxied_udp',
      multiple_routes_enabled: false,
      nonproxied_udp_enabled: false,
    },
  },
});

const ua = UserAgentOverride({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36',
  locale: 'en-US',
  maskLinux: true,
});

const webgl = WebGLVendor({
  vendor: 'Google Inc. (Intel)',
  renderer: 'ANGLE (Intel, Intel(R) UHD Graphics Direct3D11 vs_5_0 ps_5_0, D3D11-27.20.100.8984)',
});

const puppeteer = addExtra(vanillaPuppeteer);
puppeteer.use(stealth);
puppeteer.use(prefs);
puppeteer.use(ua);
puppeteer.use(webgl);

(async () => {
  const browser = await Apify.launchPuppeteer({
    useChrome: false,
    stealth: false,
    launchOptions: {
      executablePath: '/usr/bin/google-chrome',
    },
    launcher: puppeteer,
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 800, height: 600 });

  console.log(`Testing the stealth plugin..`);
  await page.goto('https://bot.sannysoft.com');
  await page.waitForTimeout(5000);
  await page.screenshot({ path: 'stealth.png', fullPage: true });

  console.log(`All done, check the screenshots. ✨`);
  await browser.close();
})();

Using Apify.launchPuppeteer without stealth plugin (this works, no errors)

import Apify from 'apify';

(async () => {
  const browser = await Apify.launchPuppeteer({
    useChrome: false,
    stealth: false,
    launchOptions: {
      executablePath: '/usr/bin/google-chrome',
    },
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 800, height: 600 });

  console.log(`Testing without stealth plugin..`);
  await page.goto('https://bot.sannysoft.com');
  await page.waitForTimeout(5000);
  await page.screenshot({ path: 'stealth.png', fullPage: true });

  console.log(`All done, check the screenshots. ✨`);
  await browser.close();
})();
corford commented 2 years ago

Are you able to reproduce based on above or is this issue something local to my setup?

B4nan commented 2 years ago

I am able to see the same (or a similar) error in our tests even without the puppeteer-extra package, and that is the issue we primarily need to solve, as we will be releasing Crawlee (the successor to Apify SDK) in the next week or two. I hope it turns out to be the same problem as yours, but I can't promise we will backport the fix to v2 (it might end up being a breaking change).

corford commented 2 years ago

Ok, thanks for the extra context and info. Looking forward to kicking the tyres of Crawlee once it's out (and praying the work to port our internal framework, which is currently built on top of Apify SDK, won't be too enormous!).

B4nan commented 2 years ago

Some observations:

B4nan commented 2 years ago

Right, looks like I found a way to get around it: your repro passes with the changes from the linked PR. There are probably more things to handle this way, but for your repro it was enough to handle the close and createIncognitoBrowserContext methods. I will first make the PR pass all the Crawlee tests before backporting it here, but I don't see any roadblocks, and it won't be breaking.
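The thread does not show the PR's diff, so the following is only one plausible reading of "handle close and createIncognitoBrowserContext": if the browser is wrapped in a `Proxy` (an assumption), those methods can be returned bound to the real target so the private-field brand check sees the real instance. `FakeBrowser`, `wrapBrowser` and `BOUND_METHODS` are hypothetical; only the two method names come from the comment above.

```javascript
// Hypothetical stand-in for Puppeteer's Browser, using native private fields.
class FakeBrowser {
  #closed = false;
  #contexts = [];

  async createIncognitoBrowserContext() {
    const context = { id: this.#contexts.length };
    this.#contexts.push(context);
    return context;
  }

  async close() {
    this.#closed = true;
  }

  get isClosed() {
    return this.#closed;
  }
}

// Methods that must run with the real instance as `this` (names taken from the thread).
const BOUND_METHODS = new Set(['close', 'createIncognitoBrowserContext']);

// Hypothetical wrapper: still intercepts every lookup, but binds selected methods
// to the target. Passing `target` as the receiver also lets getters that read
// private fields (like `isClosed`) run against the real instance.
function wrapBrowser(browser) {
  return new Proxy(browser, {
    get(target, property) {
      const value = Reflect.get(target, property, target);
      if (BOUND_METHODS.has(property) && typeof value === 'function') {
        return value.bind(target);
      }
      return value;
    },
  });
}

(async () => {
  const wrapped = wrapBrowser(new FakeBrowser());
  const context = await wrapped.createIncognitoBrowserContext();
  await wrapped.close();
  console.log(context.id, wrapped.isClosed); // 0 true
})();
```

Binding per method, rather than for every function, keeps the wrapper's other interception intact; as B4nan says, more methods would probably need the same treatment.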

corford commented 2 years ago

Nice one @B4nan 💪