pantajoe closed this 3 months ago
Did you end up finding a fix for this?
No, we switched to Google Cloud Platform (GCP) and ended up using Google Cloud Run with @google-cloud/functions-framework and custom Docker images.
On GCP, you can even run headful Chromium with Puppeteer. So yeah 🤷🏽‍♂️
Thanks, will check it out :)
Happy to help :) Just to clarify: I no longer use this package. Instead, I install either Puppeteer's default Chromium or Chromium from a system package.
I'm having the same problem as @pantajoe and @ashwwwin. Before I port my project to GCP, I'd like to ask: is there a reason you opted for GCP? AWS Lambda also lets you use Docker images.
Sure, I understand wanting to make sure. The reason is plain and simple: we tried AWS Lambda with custom Docker images before porting our serverless functions to GCP. Unfortunately, we found that the same restrictions apply 😬 We used Alpine and tried installing Chromium both from the default Alpine repository and with this package, and neither approach worked at all. Chromium installed via Alpine did not even start, which is why this package exists in the first place. And this package's custom Chromium had exactly this limitation: extensions do not work.
@pantajoe I went down the rabbit hole of setting up sparticuz/chromium and running custom Docker images :D
Possibly interesting: the default chromium.args provided by this package include the --disable-extensions flag, so extensions cannot work out of the box. Unfortunately, even with this flag omitted, extensions don't load. Perhaps someone with more knowledge of Chromium internals could get it working, now that we know extensions are disabled via a flag in the first place.
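If you want to try omitting the flag yourself, one way is to filter it out of chromium.args before passing the array to the launch call. This is a minimal sketch, assuming chromium.args is a plain array of strings; removeFlags is a hypothetical helper, not part of this package, and as noted above, dropping the flag alone did not make extensions load on Lambda.

```javascript
// Hypothetical helper, not part of @sparticuz/chromium: remove every switch
// whose name is listed in `names`, whether written as "--flag" or "--flag=value".
function removeFlags(args, names) {
  const blocked = new Set(names);
  return args.filter((arg) => {
    // "--foo=bar" -> "--foo"; a bare "--foo" is kept as is
    const name = arg.split('=', 1)[0];
    return !blocked.has(name);
  });
}

// Stand-in for chromium.args (the real array is much longer).
const defaultArgs = ['--disable-extensions', '--no-sandbox', '--window-size=1920,1080'];
console.log(removeFlags(defaultArgs, ['--disable-extensions']));
// prints [ '--no-sandbox', '--window-size=1920,1080' ]
```

Matching on the switch name rather than the whole string means a flag is also removed when it carries a value.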
Regarding Docker images, I had great success just installing Google Chrome from the Debian repository. Here is an example image: https://dev.to/cloudx/how-to-use-puppeteer-inside-a-docker-container-568c. Extensions work as expected.
The problem is that it does not pass Cloudflare bot detection when the script runs inside the Docker image, although it works perfectly fine when run directly on my machine.
@mittster Thanks for the update! Interesting, I came across the very same article you linked during my work. Did you run Chromium in headless or headful mode? I ask because in headless mode I found that custom Docker images with Chromium downloaded from a repository work as you described (although my extension with puppeteer-stream didn't work for me; maybe other extensions do). However, it didn't work with headful Chromium at all for me.
@pantajoe All the tests I've done were in headless mode. I never succeeded in running it headful.
I've ported to GCP Functions, and the bot detection issues are the same. I suppose I shouldn't be surprised, because GCP Functions use Docker internally. And that's not because of Puppeteer (I don't use Puppeteer, but a custom implementation to avoid detection).
You said in one of the posts above: "On GCP, you can even run headful Chromium with Puppeteer."
May I ask how you did it?
@mittster I see, yeah, that would have been expected. It's just a matter of setting the correct arguments:
// width, height, display and audioSink are defined elsewhere in your own setup
const args = [
  `--window-size=${width},${height}`,
  `--ozone-override-screen-size=${width},${height}`,
  '--window-position=0,0',
  `--display=${display}`,
  '--start-fullscreen',
  '--hide-scrollbars',
  '--force-color-profile=srgb',
  '--disable-gpu',
  '--disable-software-rasterizer',
  '--disable-accelerated-2d-canvas',
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-web-security',
  '--disable-dev-shm-usage',
  '--disable-background-networking',
  '--disable-prompt-on-repost',
  '--disable-client-side-phishing-detection',
  '--disable-extensions', // disables extensions; omit it if you need to load one
  '--disable-features=site-per-process',
  '--disable-infobars',
  '--no-first-run',
  '--autoplay-policy=no-user-gesture-required',
  // only if you have and want audio
  '--audio-output-channels=2',
  `--alsa-output-device=${audioSink.name}`,
];
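When a list like this is merged with other defaults (for example chromium.args), the same switch can end up in the array more than once, and the audio flags only make sense when there is a sink to target. Below is a minimal sketch of assembling such a list programmatically; buildHeadfulArgs, its parameters and the ':99' display value are hypothetical, and only a subset of the flags above is included for brevity.

```javascript
// Hypothetical helper: assembles a subset of the headful flags listed above.
// width, height, display and the optional ALSA audioDevice come from your setup.
function buildHeadfulArgs({ width, height, display, audioDevice }, extraArgs = []) {
  const base = [
    `--window-size=${width},${height}`,
    `--ozone-override-screen-size=${width},${height}`,
    '--window-position=0,0',
    `--display=${display}`,
    '--start-fullscreen',
    '--hide-scrollbars',
    '--force-color-profile=srgb',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-gpu',
    '--disable-software-rasterizer',
    '--disable-dev-shm-usage',
    '--autoplay-policy=no-user-gesture-required',
  ];
  // Audio switches are only added when there is an output device to target.
  if (audioDevice) {
    base.push('--audio-output-channels=2', `--alsa-output-device=${audioDevice}`);
  }
  // A Set drops exact duplicates and keeps the first occurrence's position.
  return [...new Set([...base, ...extraArgs])];
}

const headfulArgs = buildHeadfulArgs({ width: 1280, height: 720, display: ':99' }, ['--no-sandbox']);
console.log(headfulArgs.filter((a) => a === '--no-sandbox').length); // prints 1
```

Only exact string duplicates are removed; two different values for the same switch would both be kept, so conflicting flags still need to be resolved by hand.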
@pantajoe Your solution is much simpler. I installed the Xvfb virtual display server to get headful mode working. It still gets flagged as a bot, though.
I haven't kept up with this whole conversation, but Puppeteer's docs state you can do this:
const browser = await puppeteer.launch({
ignoreDefaultArgs: ['--disable-extensions'],
});
Yes, that's true. I even set ignoreDefaultArgs: true (see the original issue description) to ignore all default args and set every flag for headless mode myself, but that didn't work either.
Since that's no longer relevant for me, feel free to close the issue for now.
I have the same issue right now. Starting Chromium with an extension works locally, but it doesn't work on Lambda. I tried every possible combination of args and investigated the logs, but it just doesn't work. I don't really know whether it is a Chromium or a Puppeteer issue, but I'm currently leaning towards thinking that the Chromium Linux build is not working properly.
I assume this issue will come up more often over time, since extensions can now run in Chromium's new headless mode.
Environment
chromium Version: 117.0.0
puppeteer-core Version:
Runtime: nodejs18.x
Expected Behavior
Hello there, thanks for maintaining this library! I don't know exactly whether this is a bug report, a question, or a feature request, but here's my problem: I want to start Chromium with --headless=new and with an extension that uses the tabCapture API: puppeteer-stream. This library adds the extension with the launch args --load-extension, --disable-extensions-except and --allowlisted-extension-id.
When I test it locally with the exact same launch options and the same Chrome version, it works without a problem. Sadly, it does not work on AWS Lambda.
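For reference, the flag set described above can be written out as a small function. This is a sketch under assumptions: extensionLaunchArgs is a hypothetical helper, the path and ID in the usage line are placeholders, and it mirrors the flags named in this issue rather than puppeteer-stream's exact source.

```javascript
// Hypothetical helper: the launch flags named in this issue for loading an
// unpacked extension under the new headless mode.
function extensionLaunchArgs(extensionPath, extensionId) {
  if (!extensionPath || !extensionId) {
    throw new Error('extensionPath and extensionId are required');
  }
  return [
    '--headless=new',
    `--load-extension=${extensionPath}`,
    // Keep every other extension disabled, but allow this one.
    `--disable-extensions-except=${extensionPath}`,
    `--allowlisted-extension-id=${extensionId}`,
  ];
}

// Placeholder path and ID, for illustration only.
console.log(extensionLaunchArgs('/opt/extension', 'abcdefghijklmnop').join(' '));
```

Logging the final array like this (alongside dumpio) makes it easy to compare the exact flags used locally and on Lambda.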
Current Behavior
The browser starts successfully, does not produce any warning or error logs (with dumpio and a --log-level arg), and I can interact with it as usual. But as soon as the puppeteer-stream library executes browser.waitForTarget to wait for the Chrome extension's background service worker, it fails with a TimeoutError. I added event listeners to log created and destroyed targets, and the extension never shows up. Maybe the custom-compiled Chromium build does not support extensions?
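The debugging step described above (logging targets and waiting for the extension's service worker) might look roughly like this. It is a minimal sketch: isExtensionServiceWorker and attachTargetLogger are hypothetical names, and they rely only on browser.on with the 'targetcreated' and 'targetdestroyed' events and on Target#type() and Target#url(), which Puppeteer provides. It is not puppeteer-stream's actual check.

```javascript
// Hypothetical predicate: true for the service worker of the given extension.
function isExtensionServiceWorker(target, extensionId) {
  return (
    target.type() === 'service_worker' &&
    target.url().startsWith(`chrome-extension://${extensionId}/`)
  );
}

// Log every target the browser creates or destroys, so you can see whether
// the extension's service worker ever appears.
function attachTargetLogger(browser, log = console.log) {
  for (const event of ['targetcreated', 'targetdestroyed']) {
    browser.on(event, (target) => log(`${event}: ${target.type()} ${target.url()}`));
  }
}
```

With a real Puppeteer browser, you would call attachTargetLogger(browser) right after launch and then something like browser.waitForTarget((t) => isExtensionServiceWorker(t, extensionId), { timeout: 10000 }), so that a timeout comes with a log of which targets did appear.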
Steps to Reproduce