alvarcarto / url-to-pdf-api

Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
MIT License
7.01k stars 774 forks source link

Chrome tab pooling #3

Closed kimmobrunfeldt closed 5 years ago

kimmobrunfeldt commented 6 years ago

Each requests starts a new instance with Puppeteer. We should use a pool of e.g. 4 tabs to make rendering as a service more reliable.

See: https://github.com/GoogleChrome/puppeteer/issues/518

SpaceK33z commented 6 years ago

You might be interested in this pooling solution I made in a fork of your project. It uses the generic-pool package and has already been tested with thousands of PDF's (we are using it quite heavily). It doesn't use tab pooling since it was very unstable when I tried it, but this browser pooling solution already offers much better performance.

Kikobeats commented 6 years ago

Thanks for your comment @SpaceK33z.

Based on that, I implemented pool support at browserless

One thing, are you defined 4 and 8 as size of your pool for something?

In addition, I added a limitation of 50 uses per browser: https://github.com/Kikobeats/browserless/pull/15/files#diff-6499deccd988c3ae5b5ba9a58e719d5aR8

SpaceK33z commented 6 years ago

Cool!

One thing, are you defined 4 and 8 as size of your pool for something?

Those max number is more or less based on the hosting provider they run in, to not use too much memory / cpu. The min number means that at minimum 4 browsers always are running, which can improve the performance compared to e.g. 3 if you get 4 requests at the same time (that would mean one browser would still have to start).

Kikobeats commented 6 years ago

@SpaceK33z in my case I'm deploying to AWS Lambda.

Are you using any script for benchmarking what values work better?

SpaceK33z commented 6 years ago

I made this very basic script back when I was trying to use tab pooling instead of browser pooling (which caused a weird memory leak): https://github.com/Volst/url-to-pdf-api/blob/master/loadtest.js

It tries to load a test invoice. You can use the script like this: node loadtest.js 10 (10 being the amount of requests). All the requests will happen in parallel. It would be nice to improve the script to set the concurrency you want to use (e.g. 100 requests, max. 5 at the time).

kimmobrunfeldt commented 5 years ago

I'm closing this for now because the solution without pooling seems to work quite well for many use cases. My hypothesis is that it wouldn't make a huge difference, but that can be only verified by testing.

Workaround for this is to simply scale new servers (or Heroku dynos) horizontally. RAM usage will probably be the bottleneck of scaling.