Sparticuz / chromium

Chromium (x86-64) for Serverless Platforms
MIT License

[BUG] Does not work with chrome extensions #164

Closed pantajoe closed 3 months ago

pantajoe commented 9 months ago

Environment

Expected Behavior

Hello there, thanks for maintaining this library! I don't exactly know if it's a bug report, a question, or a feature request, but here's what I have a problem with: I want to start chromium with --headless=new and start it with an extension that uses the tabCapture API: puppeteer-stream. This library adds the extension with the launch args:

When I test it locally with the exact same launch options and the same chrome version, it works without a problem. On AWS Lambda, this does not work sadly.

Current Behavior

The browser starts successfully and does not produce any warning or error logs (with dumpio and a --log-level arg), and I can interact with it as usual. But as soon as the puppeteer-stream library executes browser.waitForTarget to wait for the background service worker of the Chrome extension, it fails with a TimeoutError.

I added event listeners to log created and destroyed targets, and the extension does not pop up. Maybe the custom compiled chromium version does not support extensions?
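The target logging described above can be reproduced with a few lines. This is a minimal sketch, not the code from the original report: `logTargets`, `describeTarget`, and `isExtensionTarget` are hypothetical helper names, and `browser` is assumed to be a puppeteer-core `Browser`, which emits `targetcreated` and `targetdestroyed` events.

```javascript
// Sketch only: helper names are hypothetical, not part of puppeteer or puppeteer-stream.

// Reduce a puppeteer Target to the two fields that matter for debugging.
function describeTarget(target) {
  return { type: target.type(), url: target.url() };
}

// Extension pages and service workers are served from chrome-extension:// URLs.
function isExtensionTarget(target) {
  return target.url().startsWith('chrome-extension://');
}

// Attach listeners that log every target the browser creates or destroys.
function logTargets(browser, log = console.log) {
  browser.on('targetcreated', (target) => log('created', describeTarget(target)));
  browser.on('targetdestroyed', (target) => log('destroyed', describeTarget(target)));
}
```

If the extension loads, a target for which `isExtensionTarget` returns true should appear in the log shortly after launch. In the failing Lambda runs described here, no such target shows up.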

Steps to Reproduce

const { getStream, launch } = require('puppeteer-stream');
const puppeteer = require('puppeteer-core');
const chromium = require('@sparticuz/chromium');
const fs = require('node:fs');

chromium.setHeadlessMode = 'new'
chromium.setGraphicsMode = true

/**
 * Args that are forbidden for chromium for video and audio capturing to work properly.
 * Some of these args are set by us and some are set by "puppeteer-stream".
 */
const FORBIDDEN_PUPPETEER_ARGS = [
  '--disable-extensions',
  '--mute-audio',
  '--window-size',
  '--disable-component-extensions-with-background-pages',
  '--disable-default-apps',
  '--headless',
]

exports.handler = async (event, context, callback) => {
  let result = null;
  let browser = null;

  try {
    browser = await launch(
      {
        launch: async (opts: PuppeteerLaunchOptions) => {
          const options: PuppeteerLaunchOptions = { ...opts, headless: 'new' }

          const extensionPath = options.args!.find((arg) => arg.startsWith('--load-extension='))!.split('=')[1]
          console.log('debug', 'Loading extension', { extensionPath, contents: fs.readdirSync(extensionPath) })

          console.log('debug', 'Launching browser', options)
          return puppeteer.launch(options)
        },
      },
      {
        executablePath: await chromium.executablePath(),
        args: [
          // Spread the filtered list so args stays a flat array of strings.
          ...chromium.args.filter((arg) => FORBIDDEN_PUPPETEER_ARGS.every((forbidden) => !arg.startsWith(forbidden))),
          `--window-size=1920,1080`,
          '--start-fullscreen',
          `--ozone-override-screen-size=1920,1080`,
          '--headless=new',
        ],
        ignoreDefaultArgs: true,
        waitForInitialPage: false,
        defaultViewport: null,
        ignoreHTTPErrors: true,
      },
    )

    let page = await browser.newPage();

    await page.goto(event.url || 'https://example.com');

    const stream = await getStream(page, { audio: true, video: true });
    const output = '/tmp/testfile.webm';
    const file = fs.createWriteStream(output);
    stream.pipe(file);
    await new Promise((resolve) => {
      setTimeout(async () => {
        await stream.destroy()
        file.close()
        console.log('debug', 'Destroyed stream')
        resolve(true)
      }, 1000 * 10); // 10 seconds
    });

    const fileSize = fs.statSync(output).size
    return callback(null, JSON.stringify({ fileSize }))
  } catch (error) {
    return callback(error);
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }

  return callback(null, result);
};

ashwwwin commented 7 months ago

Did you end up finding a fix for this?

pantajoe commented 7 months ago

No, we shifted to Google Cloud Platform and ended up using Google Cloud Run with @google-cloud/functions-framework and custom Docker images.

On GCP, you can even run headful Chromium with puppeteer. So yeah 🤷🏽‍♂️

ashwwwin commented 7 months ago

Thanks, will check it out :)

pantajoe commented 7 months ago

Happy to help :) Just for clarification: I don't use this package anymore but install the default puppeteer chromium or I install it via a system package.

mittster commented 4 months ago

Having the same problem as you, @pantajoe and @ashwwwin. Before I port my project to GCP, I'd like to ask: is there a reason you opted for GCP? With AWS Lambda, you can also use Docker.

pantajoe commented 4 months ago

Sure, I get wanting to make sure 😄 The reason is plain and simple: we tried using AWS Lambda with custom Docker images before porting our serverless functions to GCP. Unfortunately, we found that the same restrictions apply 😬 We used Alpine and tried installing Chromium both from the default Alpine repository and with this package, and neither approach worked. Chromium installed via Alpine did not even start, which is why this package exists. And this package's custom Chromium had exactly this limitation: extensions do not work.

mittster commented 4 months ago

@pantajoe Went down the rabbit hole of setting up sparticuz/chromium and running custom Docker images :D

Perhaps interesting: the default chromium.args provided by this package include the flag --disable-extensions, so it is impossible to get extensions to work out of the box. Unfortunately, even with this flag omitted, extensions don't load. Perhaps someone with more knowledge of Chromium internals could get it to work if they knew that extensions are disabled via a flag in the first place.
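Removing that flag from the defaults can be sketched as below. `withoutFlags` is a hypothetical helper, not part of @sparticuz/chromium, and `sampleArgs` only stands in for `chromium.args`, whose real contents vary by package version. The helper matches a flag exactly (or followed by `=value`), because a plain prefix match on `--disable-extensions` would also drop `--disable-extensions-except=...`.

```javascript
// Hypothetical helper: drop each arg that is exactly one of the given flags,
// or one of the given flags followed by "=value".
function withoutFlags(args, flagsToDrop) {
  return args.filter(
    (arg) => !flagsToDrop.some((flag) => arg === flag || arg.startsWith(`${flag}=`)),
  );
}

// Stand-in for chromium.args (assumed sample values, not the package's real list).
const sampleArgs = [
  '--disable-extensions',
  '--no-sandbox',
  '--window-size=1920,1080',
];

console.log(withoutFlags(sampleArgs, ['--disable-extensions', '--window-size']));
```

As noted in this thread, dropping the flag alone was not enough to make extensions load on Lambda, so this only rules out one possible cause.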

Regarding Docker images, I had great success just installing Google Chrome from the Debian repository. Here is an example image: https://dev.to/cloudx/how-to-use-puppeteer-inside-a-docker-container-568c. Extensions work as expected.

The problem is that it does not pass Cloudflare bot detection when the script runs inside a Docker image, but it works perfectly fine when run directly from my machine.

pantajoe commented 4 months ago

@mittster Thanks for the update! Interesting, I certainly encountered the very same article you linked during my work 😄 Did you run Chromium in headless or headful mode? For headless mode, I found that custom Docker images and downloading Chromium from a repository work like you described (for me, my extension with puppeteer-stream didn't work, maybe others do). However, it didn't work with headful Chromium at all for me.

mittster commented 4 months ago

@pantajoe All the tests I've done were in headless mode. Never succeeded running in headful.

I've ported to GCP Functions and the bot detection issues are the same. I suppose I shouldn't be surprised, because GCP Functions use Docker internally. And that's not because of puppeteer (I don't use puppeteer, but a custom implementation to avoid detection).

You said in one of the posts above: "On GCP, you can even run headful chromium with puppeteer." May I ask how you did it?

pantajoe commented 4 months ago

@mittster I see, that would've been expected, yeah 😄 It's just a matter of setting the correct arguments:

`--window-size=${width},${height}`,
`--ozone-override-screen-size=${width},${height}`,
'--force-color-profile=srgb',
'--disable-gpu',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-web-security',
`--display=${display}`,
'--disable-dev-shm-usage',
'--disable-background-networking',
'--disable-prompt-on-repost',
'--disable-client-side-phishing-detection',
'--disable-extensions',
'--disable-features=site-per-process',
'--disable-infobars',
'--no-first-run',
'--start-fullscreen',
'--autoplay-policy=no-user-gesture-required',
'--hide-scrollbars',
'--window-position=0,0',
'--disable-software-rasterizer',
'--disable-accelerated-2d-canvas',
// only if you have and want audio
'--audio-output-channels=2',
`--alsa-output-device=${audioSink.name}`

mittster commented 4 months ago

@pantajoe your solution is much simpler. I installed virtual display server xvfb to get it to work in headful. Still a bot though.
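A flag list like the one above is easier to maintain when it is built from parameters and de-duplicated. The following is a rough sketch under stated assumptions: `buildHeadfulArgs` is a hypothetical helper, it uses only a subset of the flags quoted in this thread, and the `display` value is assumed to name an X display that already exists (for example one provided by xvfb).

```javascript
// Hypothetical helper: assemble headful launch flags and drop repeated entries.
function buildHeadfulArgs({ width, height, display }) {
  const args = [
    `--window-size=${width},${height}`,
    `--ozone-override-screen-size=${width},${height}`,
    `--display=${display}`,
    '--no-sandbox',
    '--disable-gpu',
    '--no-sandbox', // repeated on purpose to show the de-duplication
    '--start-fullscreen',
  ];
  // A Set keeps the first occurrence of each flag and preserves insertion order.
  return [...new Set(args)];
}

// ':99' is an assumed display name, used here only for illustration.
console.log(buildHeadfulArgs({ width: 1920, height: 1080, display: ':99' }));
```

De-duplicating does not change how Chromium behaves; it just keeps the list readable when flags are merged from several sources.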

Sparticuz commented 3 months ago

I haven't kept up with this whole conversation, but puppeteer's docs state you can do this

const browser = await puppeteer.launch({
  ignoreDefaultArgs: ['--disable-extensions'],
});

pantajoe commented 3 months ago

Yes, that's true. I even set ignoreDefaultArgs: true (see the original issue description) to ignore all default args and set every flag for headless mode myself, but that didn't work either 😅

Since that's no longer relevant for me, feel free to close the issue for now.

zirkelc commented 2 months ago

I have the same issue right now. Starting Chromium with an extension works locally, but it doesn't work on Lambda. I tried every possible combination of args and investigated the logs, but it just doesn't work. I don't really know if it is a Chromium or a Puppeteer issue, but I'm currently leaning toward thinking that the Chromium Linux build is not working properly.

I assume this issue will pop up even more over time, since it's possible to run extensions in the new headless mode of Chromium.