frinyvonnick / node-html-to-image

A Node.js module that generates images from HTML
https://www.npmjs.com/package/node-html-to-image
Apache License 2.0
824 stars 126 forks source link

Is it possible to reuse the puppeteer instance for better performance? #80

Open tomerb15 opened 3 years ago

tomerb15 commented 3 years ago

What is your recommended settings to maximize performance in the following case:

  1. Small HTMLs usually resulting jpeg images of 10K-50K in size
  2. HTML content is 100% trusted
  3. HTML is send via data uri
  4. The image is returned in memory as a buffer and not saved to disk.

Thanks!

frinyvonnick commented 3 years ago

Hi @tomerb15 👋

I don't understand what is the issue you encountered. Is it too slow? If it is the case can you share your code, please?

tomerb15 commented 3 years ago

Hey, we use it to generate many small screenshots. Basically small HTMLs to JPEGs. The problem is that we have high volumes.

We found puppeteer-cluster and with the right configuration, we manage to crack it down and go from ~900ms per image (Which grow up to 60 seconds at peak) to a steady ~350ms per image which solved our problem.

Hope this will help anyone in the future.

frinyvonnick commented 3 years ago

@tomerb15 as you can see here node-html-to-image uses puppeteer-cluster under the hood. Maybe you could help optimize it with your configuration?

tomerb15 commented 3 years ago

Here is our configuration for puppeteer-cluster which works very well:

      {
            concurrency: Cluster.CONCURRENCY_CONTEXT,
            maxConcurrency: 10,
            puppeteerOptions: {args: ['--no-sandbox', '--headless', '--disable-gpu']}
        }

We don't see any memory leak from the browsers. We also taking screenshot from a specific element in the dom rather then the entire body.

erohana commented 3 years ago

@tomerb15 how did you configure it ? can you please provide an example ?

KL13NT commented 3 years ago

Perhaps the maxConcurrency option can be specified as part of the options? I have a bot that produces images and spawning an instance every time to render images is a huge performance hit for me (about 500ms on average + network overhead). Different systems have different capabilities and the maxConcurrency option should be dependent on users, perhaps with the default as 2. Willing to PR this.

KL13NT commented 3 years ago

Just discovered another performance-related issue. You create a cluster for a batch of jobs, but the maxConcurrency option fails to use the full advantage of puppeteer-cluser AND the fact that you terminated the whole cluster after the job is done is also a performance hit.

So issues currently present are:

Solutions:

I'm ready to work on implementing these solutions.

KL13NT commented 3 years ago

Spawning a cluster on a clean start of the bot took 3,000ms. image

Subsequent spawns took an average of over 300ms. image

joshk0 commented 3 years ago

@frinyvonnick It would be nice to create some kind of instance object, this way puppeteer can be launched in the background as soon as the module is required or the object is instantiated. Because the instantiation of puppeteer occurs within the nodeHtmlToImage() call, we can't safely start chrome until that very moment.

In other words, my Dream Code:

// imageMaker contains the puppeteer instance
const imageMaker = require('node-html-to-image');

imageMaker.createImage({html: html}) // as many times as you want... reuses the same browser
KL13NT commented 3 years ago

@frinyvonnick The changes would be significant in terms of performance and not go beyond a few lines of code.

frinyvonnick commented 3 years ago

@KL13NT I look at all issues and new messages. Answering takes some time. Avoid pinging me without new information. I'll answer as soon as I can. I maintain this package in my free time. Please, keep that in mind 😄

BrnPer commented 3 years ago

I'm facing this issue... it takes about 2 seconds to render per image. I've already tried to change puppeter settings, but the time didn't change... can you please help me? I generate a buffer with the image in jpeg. Thanks in advance.

joshk0 commented 3 years ago

@BrnPer @KL13NT I'd love a tester for my change #95 . If you're willing to tweak the way you call the library, you can test it easily.

frinyvonnick commented 3 years ago

@joshk0 I would prefer to avoid breaking change. I created this package in the idea of having something simple and easy to use. Initially I thought people might directly use puppeteer for more advance use cases. I changed a bit my mind since people seems to enjoy this package.

Here are the existing use cases:

Single image

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello world!</body></html>',
})

Batch image creation

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

We might expose a name export that let you reuse the instance like it:

const { launch } = require('node-html-to-image');

const instance = launch({ puppeteerArgs })

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: { name: 'Michel' }
})
  .then(() => console.log('The image was created successfully!'))
BrnPer commented 3 years ago

@frinyvonnick I think it would be a good idea to expose the instance. In my testing I shave some milliseconds off by using the following puppeterArgs:

['--disable-gpu', '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage', '--disable-accelerated-2d-canvas', '--no-first-run', '--no-zygote'] 

And changing the waitUntil to 'load'. It now takes about 1 second to generate the image. But if was possible to generate even faster, would be awesome 😎 And btw @frinyvonnick awesome job with this library, well documented and very useful 🎉

zeel01 commented 3 years ago

This would be of great use to me as well, I'm attempting to use this to create information cards to display in Discord embeds. Unfortunately it takes a rather long time for the images to be generated (around 4-5 seconds).

bekworks commented 3 years ago

@joshk0 I would prefer to avoid breaking change. I created this package in the idea of having something simple and easy to use. Initially I thought people might directly use puppeteer for more advance use cases. I changed a bit my mind since people seems to enjoy this package.

Here are the existing use cases:

Single image

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello world!</body></html>',
})

Batch image creation

const nodeHtmlToImage = require('node-html-to-image');

nodeHtmlToImage({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

We might expose a name export that let you reuse the instance like it:

const { launch } = require('node-html-to-image');

const instance = launch({ puppeteerArgs })

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
  .then(() => console.log('The images were created successfully!'))

instance.screenshot({
  html: '<html><body>Hello {{name}}!</body></html>',
  content: { name: 'Michel' }
})
  .then(() => console.log('The image was created successfully!'))

Could we get an update on the whole idea about reusing the instance?

The usecase i'm facing is the following:

12 different HTML templates 10 different sets of content

Right now, we have to go through them, one by one, because we have multiple different templates, which in turn means that we have to create and load a new puppeteer instance for every single image.

I could live with us being able to make html and content pairs

frinyvonnick commented 2 years ago

I discover that puppeteer-cluster can take a puppeteer object as parameter of the launch method (see https://github.com/thomasdondorf/puppeteer-cluster#clusterlaunchoptions). This could be an option to pass your own instance of puppeteer. Is someone interested in opening a pull request to add this option?

Another performance improvement could be to use an instance of puppeteer-core with headless shell (see https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteer-vs-puppeteer-core). Could be useful in some contextes where disk space matters. Anyone interested in by this too?

luanraithz commented 2 years ago

I had performance issues for a scenario where a user request would start some good (~300) amount of screenshots, so if more than a small amount of users decided to do the same thing, the machine would run out of memory. I decided to create a singleton puppeteer cluster and enqueue manually the screenshots, another improvement was to render all the html in the same page and run query selectors (all the divs were small, height around 350px). I went from ~58 seconds to take the screenshots to ~15.

AramZS commented 1 year ago

I'm very interested in this feature!

BlazingTide commented 1 year ago

Any update on this?

Tiffceet commented 1 year ago

What is the current workaround for performance issue