Open Mihailoff opened 3 years ago
After playing with this for a few days, it looks like networkidle0 won't resolve because of simultaneous requests triggered by multiple lambdas.
@alixaxel I'm asking the same question. Why shouldn't we reuse the browser created by puppeteer.launch?
AWS Lambda is containerized and Node.js is single-threaded. I don't see any reason against, or any drawback to, reusing the puppeteer.launch result between calls.
I tried to run Mihailoff's code snippet with Playwright (after adapting it to its API, i.e. playwright.chromium.launchServer), but I was not able to reuse the browser process.
Playwright can launch the Chromium background process (server), then connect to it via WS endpoint just fine, and you can do basically anything as long as this is the first lambda invocation.
During the second invocation, I can still connect to the WS endpoint and create a new browser context, but when I attempt to create a new page, I get a "Browser has been closed" error.
During the third invocation, it seems the server is no longer running, because I get the error "connect ECONNREFUSED 127.0.0.1:36131. Most likely ws endpoint is incorrect". My best guess is that the server process crashes during the second invocation, so by the third invocation it no longer exists and the connection fails.
I also attempted to not create a server at all and instead simply cache the Browser instance created via playwright.chromium.launch. This, however, produced the same experience as above:
- "Browser has been closed" error when I attempt to create a page (after I successfully create a context)
- "Target page, context or browser has been closed" error when I attempt to create a context (before attempting to create a page)
I found a workaround to get caching of Browser instances to work across invocations, though. The first time you create a browser, also create a new empty page and just leave it there. Then go on and do your work in a new context. On the second invocation, again create a new context and carry on. The initial empty page just sits there the whole time, somehow keeping the Browser instance from either crashing or closing itself.
I do not know how brittle this workaround is or how long it's going to work. But it seems to be the only way to get chrome-aws-lambda to work with Playwright while reusing browsers. Creating a new browser every time is very expensive and severely impacts responsiveness. Once the workaround stops working, the only way to keep your performance is to use Playwright-built Docker images and add the Lambda runtime on top, which sadly means dropping chrome-aws-lambda from your project.
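To make the workaround concrete, here is a rough sketch of how I read it, not a tested recipe. The names acquireBrowser and withFreshContext are hypothetical helpers, and `launch` is assumed to be any async function that resolves to a Playwright Browser (for example a wrapper around playwright.chromium.launch with the chrome-aws-lambda args and executablePath). Only the Browser calls isConnected, newPage and newContext, and the context calls newPage and close, come from Playwright itself.

```javascript
// Hypothetical sketch of the "leave one empty page open" workaround.
// Module scope survives between invocations only while the Lambda
// execution environment stays warm.
let keepAlive = null // { browser, page } from the first launch

// `launch` is assumed to be an async function returning a Playwright Browser.
async function acquireBrowser(launch) {
  if (keepAlive && keepAlive.browser.isConnected()) {
    return keepAlive.browser
  }
  const browser = await launch()
  // The empty page that is never closed; it appears to keep the browser alive.
  const page = await browser.newPage()
  keepAlive = { browser, page }
  return browser
}

// Runs `work(page)` in a fresh context and always closes that context,
// leaving the keep-alive page and the browser untouched.
async function withFreshContext(launch, work) {
  const browser = await acquireBrowser(launch)
  const context = await browser.newContext()
  try {
    const page = await context.newPage()
    return await work(page)
  } finally {
    await context.close()
  }
}
```

A handler would then call something like withFreshContext(launchChromium, (page) => page.title()), where launchChromium is your own launch wrapper. Closing only the context per invocation is the design choice here: the cached Browser and its first page are never touched after creation.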
@paya-cz thanks for your comment, the code below seems to work
const chromium = require('chrome-aws-lambda')
// Cached in module scope so warm invocations of the same container can reuse it
let browserWSEndpoint
module.exports.handler = async function (event) {
  let browser, page
  try {
    // Reconnect to the browser left running by a previous invocation, if any
    if (browserWSEndpoint) {
      browser = await chromium.puppeteer.connect({ browserWSEndpoint })
    }
    if (!browser || !browser.isConnected()) {
      browser = await chromium.puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
      })
      // Keep blank page open
      await browser.newPage()
      browserWSEndpoint = browser.wsEndpoint()
    }
    // newPage() returns a promise, so it must be awaited before use
    page = await browser.newPage()
    // ...
  } finally {
    if (page) await page.close()
    if (browser) browser.disconnect()
  }
}
@Mihailoff curious to know whether this reuse actually improves performance, since we're still opening a new page on every execution?
So far I haven't seen any major performance improvements (connect vs launch). I think the actual page load is far more expensive than the new tab or browser launch. I'll continue to experiment.
FYI your code is not showing a tangible improvement because you're disconnecting from the browser at the end of each lambda run (see browser.disconnect() in the handler), hence you're spinning up a new browser for each run.
If you remove that line you should see a dramatic performance improvement (roughly 3x).
Based on @paya-cz's comment, leaving one blank page open keeps the browser instance alive. I can confirm that the subsequent run connects to the same instance. It takes about 100-500 ms to connect, compared to 2-3 s for the first launch.
Perhaps caching a connected state will improve it even further. @squallstar do you have any metrics to share?
@Mihailoff by not closing the connection with the browser I'm getting roughly 4x-5x performance increase.
I have done around a thousand tests on sample webpages, and the average end-to-end (round trip) time decreased from 2s to just 500ms.
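If anyone wants to collect comparable numbers, a small timing wrapper along these lines might help. This is only a sketch: timed is a hypothetical helper name, and it measures wall-clock time around whatever async step you pass in (launch, connect, or the whole page load), nothing more.

```javascript
// Hypothetical helper: time any async step with the monotonic clock.
async function timed(label, fn) {
  const start = process.hrtime.bigint()
  const result = await fn()
  // hrtime.bigint() is in nanoseconds; convert the difference to milliseconds
  const ms = Number(process.hrtime.bigint() - start) / 1e6
  console.log(`${label}: ${ms.toFixed(1)} ms`)
  return { result, ms }
}
```

Inside a handler it could be used as, for example, const { result: browser } = await timed('connect', () => chromium.puppeteer.connect({ browserWSEndpoint })), and likewise around chromium.puppeteer.launch, so the two paths show up separately in the logs.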
Hi, the solution above works for me, but only for a short period of time, after which I suppose AWS force-deletes the browser instance. Or is this something new in AWS Lambda behavior?
Did anyone try caching the browser process and simply connecting/disconnecting on every lambda run? So far it seems to work well.
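A possible shape for that connect/disconnect approach, which also copes with the browser having disappeared between runs: try the saved endpoint first and fall back to a fresh launch if connecting fails. This is a sketch under assumptions, not a verified fix. connectOrLaunch is a hypothetical name, `puppeteer` is assumed to be chromium.puppeteer from chrome-aws-lambda, and `launchOptions` is the same options object passed to launch in the handler earlier in the thread.

```javascript
// Hypothetical sketch: reconnect per invocation, relaunch if the endpoint is stale.
let savedEndpoint = null

async function connectOrLaunch(puppeteer, launchOptions) {
  if (savedEndpoint) {
    try {
      return await puppeteer.connect({ browserWSEndpoint: savedEndpoint })
    } catch (err) {
      // The browser process is gone (e.g. the container was recycled
      // or the process was killed), so forget the stale endpoint.
      savedEndpoint = null
    }
  }
  const browser = await puppeteer.launch(launchOptions)
  // Blank page left open to keep the browser alive between runs
  await browser.newPage()
  savedEndpoint = browser.wsEndpoint()
  return browser
}
```

Each invocation would call connectOrLaunch, do its work in a new page, close that page, and finish with browser.disconnect(), which detaches from the browser without closing it.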