alixaxel / chrome-aws-lambda

Chromium Binary for AWS Lambda and Google Cloud Functions
MIT License

Reuse browser process to improve lambda performance? #217

Open Mihailoff opened 3 years ago

Mihailoff commented 3 years ago

Has anyone tried caching the browser process and simply connecting/disconnecting on every Lambda run? So far it seems to work well.

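```js
const chromium = require('chrome-aws-lambda')

// Module scope survives between invocations while the Lambda container stays warm
let browserWSEndpoint

module.exports.handler = async function (event) {
  let browser = null

  try {
    if (browserWSEndpoint) {
      try {
        // Reattach to the Chromium process launched by an earlier invocation
        browser = await chromium.puppeteer.connect({ browserWSEndpoint })
      } catch (err) {
        // The cached process is gone; fall through and launch a new one
        browserWSEndpoint = undefined
      }
    }

    if (!browser || !browser.isConnected()) {
      browser = await chromium.puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
      })

      browserWSEndpoint = browser.wsEndpoint()
    }

    // ...

  } finally {
    // Detach without closing the browser so the next run can reconnect
    if (browser) browser.disconnect()
  }
}
```
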
Mihailoff commented 3 years ago

After playing with this for a few days, it looks like `networkidle0` won't resolve because of simultaneous requests triggered by multiple Lambdas.

ixartz commented 3 years ago

@alixaxel I'm asking the same question. Why shouldn't we reuse the browser created by `puppeteer.launch`?

AWS Lambda is containerized and Node.js is single-threaded. I don't see any reason not to reuse the browser from `puppeteer.launch` between calls, nor any drawbacks.
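Since a warm container handles one invocation at a time and keeps module-level state, the simplest form of reuse is to cache the launched browser itself. A minimal sketch, assuming a Puppeteer-style Browser that exposes `isConnected()`; `createBrowserCache` and `getBrowser` are hypothetical names, and the launcher is injected so the snippet runs without `chrome-aws-lambda` installed:

```javascript
// Hypothetical helper: one cached browser per warm Lambda container.
// `launch` is any async function returning a Puppeteer-style Browser
// (an object with isConnected()).
function createBrowserCache (launch) {
  // Module-level state like this survives between invocations while the container is warm
  let browserPromise = null

  return async function getBrowser () {
    if (browserPromise) {
      try {
        const browser = await browserPromise
        // Reuse the cached browser only while its process is still reachable
        if (browser.isConnected()) return browser
      } catch (err) {
        // The previous launch failed; fall through and launch again
      }
    }
    browserPromise = launch()
    return browserPromise
  }
}

// Usage inside a handler (not executed here):
// const getBrowser = createBrowserCache(async () => chromium.puppeteer.launch({
//   args: chromium.args,
//   defaultViewport: chromium.defaultViewport,
//   executablePath: await chromium.executablePath,
//   headless: chromium.headless,
// }))
// module.exports.handler = async (event) => {
//   const browser = await getBrowser()
//   const page = await browser.newPage()
//   try { /* ... */ } finally { await page.close() } // close the page, keep the browser
// }
```

Caching the promise rather than the resolved browser also means a failed launch is retried on the next call instead of being cached forever.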

paya-cz commented 3 years ago

I tried to run Mihailoff's code snippet with Playwright (after adapting it to its API, i.e. `playwright.chromium.launchServer`), but I was not able to reuse the browser process.

Playwright can launch the Chromium background process (server) and connect to it via its WS endpoint just fine, and you can do basically anything as long as it is the first Lambda invocation.

During the second invocation, I can still connect to the WS endpoint and create a new browser context, but when I attempt to create a new page I get a "Browser has been closed" error.

During the third invocation, the server seems to be no longer running, because I get the error `connect ECONNREFUSED 127.0.0.1:36131`. Most likely the WS endpoint is no longer valid. My best guess is that the server process crashes during the second invocation, so by the third invocation it no longer exists and the connection fails.
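The `ECONNREFUSED` failure shows that a cached endpoint can outlive the process behind it. A minimal sketch of a guard, assuming injected `launchEndpoint()` and `connect(endpoint)` functions (hypothetical names) so it runs without a browser: if connecting to the cached endpoint fails, forget the endpoint and launch a fresh process. It only handles the connect failure. It does not explain or fix the "Browser has been closed" error from the second invocation.

```javascript
// Hypothetical helper: try the cached WS endpoint first, relaunch if the old process is gone.
// launchEndpoint(): async, starts a browser process and resolves to its WS endpoint string.
// connect(endpoint): async, resolves to a connected browser or rejects (e.g. ECONNREFUSED).
function createEndpointCache ({ launchEndpoint, connect }) {
  let endpoint = null

  return async function connectOrLaunch () {
    if (endpoint) {
      try {
        return await connect(endpoint)
      } catch (err) {
        // The process behind the cached endpoint no longer exists; drop it
        endpoint = null
      }
    }
    endpoint = await launchEndpoint()
    return connect(endpoint)
  }
}

// Usage with Playwright (not executed here; older Playwright releases take
// connect({ wsEndpoint }) instead of connect(wsEndpoint), so check your version):
// const { chromium } = require('playwright-core')
// const getBrowser = createEndpointCache({
//   launchEndpoint: async () => (await chromium.launchServer({ /* options */ })).wsEndpoint(),
//   connect: (wsEndpoint) => chromium.connect(wsEndpoint),
// })
```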


I also tried not creating a server at all and instead simply caching the Browser instance created by `playwright.chromium.launch`. This, however, produced the same behavior as above.


I did find a workaround to make caching of Browser instances work across invocations, though. The first time you create a browser, also create a new empty page and just leave it there. Then do your work in a new context. On the second invocation, again create a new context and carry on. The initial empty page just sits there the whole time, somehow keeping the Browser instance from either crashing or closing itself.

I do not know how brittle this workaround is or how long it's going to work. But it seems to be the only way to get chrome-aws-lambda working with Playwright while reusing browsers. Creating a new browser every time is very expensive and severely impacts responsiveness. Once the workaround stops working, the only way to keep your performance is to use the Playwright Docker images and add the Lambda runtime on top of them, which sadly means dropping chrome-aws-lambda from your project.
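The blank-page workaround described above can be sketched for Playwright as follows. This is a minimal sketch, assuming a Playwright-style Browser (`isConnected()`, `newPage()`, `newContext()`) and a BrowserContext with `newPage()` and `close()`; `createKeepAliveBrowser` and `runInFreshContext` are hypothetical names, and the launcher is injected. Why the blank page keeps the process alive is not explained in the thread, so treat this as brittle.

```javascript
// Hypothetical helper: cache one browser and leave a blank "keep-alive" page open in it.
function createKeepAliveBrowser (launch) {
  let browser = null

  return async function getBrowser () {
    if (browser && browser.isConnected()) return browser
    const fresh = await launch()
    // Workaround from the thread: open one blank page and never close it
    await fresh.newPage()
    browser = fresh
    return browser
  }
}

// Hypothetical helper: do each invocation's work in its own context.
async function runInFreshContext (getBrowser, work) {
  const browser = await getBrowser()
  // A new context per invocation isolates cookies and storage from earlier runs
  const context = await browser.newContext()
  try {
    const page = await context.newPage()
    return await work(page)
  } finally {
    // Close this run's context, but never the browser or the keep-alive page
    await context.close()
  }
}

// Usage (not executed here):
// const { chromium } = require('playwright-core')
// const getBrowser = createKeepAliveBrowser(() => chromium.launch({ /* options */ }))
// module.exports.handler = async (event) =>
//   runInFreshContext(getBrowser, async (page) => { /* ... */ })
```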

Mihailoff commented 2 years ago

@paya-cz thanks for your comment; the code below seems to work:

```js
const chromium = require('chrome-aws-lambda')

// Survives between invocations while the Lambda container stays warm
let browserWSEndpoint

module.exports.handler = async function (event) {
  let browser, page

  try {
    if (browserWSEndpoint) {
      try {
        browser = await chromium.puppeteer.connect({ browserWSEndpoint })
      } catch (err) {
        // The cached Chromium process is gone; launch a new one below
        browserWSEndpoint = undefined
      }
    }

    if (!browser || !browser.isConnected()) {
      browser = await chromium.puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
      })

      // Keep a blank page open so the browser process stays alive between runs
      await browser.newPage()

      browserWSEndpoint = browser.wsEndpoint()
    }

    page = await browser.newPage()
    // ...

  } finally {
    // Close only this run's page; disconnect without closing the browser
    if (page) await page.close()
    if (browser) browser.disconnect()
  }
}
```

captainjackrana commented 2 years ago

@Mihailoff I'm curious whether this reuse actually improves performance, since we're still opening a new page on every execution.

Mihailoff commented 2 years ago

So far I haven't seen any major performance improvement (connect vs. launch). I think the actual page load is far more expensive than opening a new tab or launching the browser. I'll continue to experiment.
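To check whether connect/launch or the page load dominates, each phase can be timed separately. A minimal sketch using Node's monotonic `process.hrtime.bigint()`; `timed` is a hypothetical helper name, and the logger is injectable so the output can be captured:

```javascript
// Hypothetical helper: run an async step and log how long it took, even if it throws.
async function timed (label, fn, log = console.log) {
  const start = process.hrtime.bigint()
  try {
    return await fn()
  } finally {
    // hrtime.bigint() is in nanoseconds; convert to milliseconds
    const ms = Number(process.hrtime.bigint() - start) / 1e6
    log(`${label}: ${ms.toFixed(1)} ms`)
  }
}

// Usage inside the handler (not executed here; `url` is a placeholder):
// browser = await timed('connect', () => chromium.puppeteer.connect({ browserWSEndpoint }))
// page = await timed('newPage', () => browser.newPage())
// await timed('goto', () => page.goto(url, { waitUntil: 'networkidle0' }))
```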

squallstar commented 2 years ago

FYI, your code is not showing a tangible improvement because you're disconnecting from the browser at the end of each Lambda run (see `browser.disconnect()` in the handler), hence you're spinning up a new browser for each run.

If you remove that line, you should see a dramatic performance improvement (roughly 3x).

Mihailoff commented 2 years ago

Based on @paya-cz's comment, leaving one blank page open keeps the browser instance alive. I can confirm that the subsequent run connects to the same instance. It takes about 100-500 ms to connect, compared to 2-3 s for the first launch.

Perhaps caching the connected browser (instead of reconnecting on every run) would improve it even further. @squallstar do you have any metrics to share?

squallstar commented 2 years ago

@Mihailoff by not closing the connection to the browser I'm getting a roughly 4x-5x performance increase.

I have run around a thousand tests on sample web pages, and the average end-to-end (round-trip) time decreased from 2 s to just 500 ms.

appindus-apipchenko commented 1 month ago

Hi, the solution above works for me, but only for a short period of time, after which I suppose AWS forcibly kills the browser instance. Or is this something new in AWS Lambda behavior?