danfickle / openhtmltopdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!
https://danfickle.github.io/pdf-templates/index.html

Large HTML conversion consumes a lot of memory #962

Open Hikariqz opened 8 months ago

Hikariqz commented 8 months ago

Hi guys,

I'm having an issue where converting a large HTML document to PDF consumes a significant amount of memory. The HTML renders to around 1200 pages, although the file itself is only about 3 MB.

While generating the PDF, I saw the Java VM's memory usage climb to around 1 GB. I used a profiler to investigate further and found that a large number of byte arrays are created during PDF generation. Most of the allocations appear to come from calls to com.openhtmltopdf.css.newmatch.Condition#matches, which calls into java.lang.StringBuilder (notably java.lang.StringBuilder.toString).
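
As a rough illustration of why a StringBuilder.toString call on a hot path can show up as byte-array allocations in a profile, here is a minimal Java sketch. It is not the library's Condition#matches code: the class AllocationSketch and both of its methods are hypothetical names made up for this example, and it assumes JDK 9 or later, where a String is backed by a byte[].

```java
// Hypothetical illustration only. This is NOT the implementation of
// com.openhtmltopdf.css.newmatch.Condition#matches.
final class AllocationSketch {

    // Allocating variant: builds a temporary String for every comparison.
    // On JDK 9+ both the StringBuilder and the String it produces are
    // backed by a byte[], and toString() copies the builder's contents
    // into a new array, so a hot loop can create many short-lived byte[]s.
    static boolean matchesWithBuilder(String name, String value, String candidate) {
        StringBuilder sb = new StringBuilder(name.length() + 1 + value.length());
        sb.append(name).append('=').append(value);
        return sb.toString().equals(candidate);
    }

    // Variant without temporary objects: compares the pieces in place
    // against "name=value" instead of concatenating them first.
    static boolean matchesInPlace(String name, String value, String candidate) {
        int n = name.length();
        return candidate.length() == n + 1 + value.length()
                && candidate.startsWith(name)
                && candidate.charAt(n) == '='
                && candidate.startsWith(value, n + 1);
    }

    public static void main(String[] args) {
        String candidate = "lang=en";
        int hits = 0;
        // Run both variants many times; attach a profiler to compare
        // the allocation behaviour of the two methods.
        for (int i = 0; i < 1_000_000; i++) {
            if (matchesWithBuilder("lang", "en", candidate)) hits++;
            if (matchesInPlace("lang", "en", candidate)) hits++;
        }
        System.out.println("hits = " + hits);
    }
}
```

Whether the real matcher can be rewritten along these lines depends on what it actually concatenates, which this sketch does not know. It only shows the general pattern a profiler would flag.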

I'm hoping someone can offer some help or insight into how to make this process use less memory. Converting a document of this size seems to put a lot of pressure on memory, and I'd like to find a more efficient way to handle it.

Really appreciate it!

siegelzc commented 8 months ago

I'm going to take a look at this. I'm also working with some pretty large HTML files, so this affects me too. In the meantime, I'm going to tag you on a duplicate issue at a forked repository where we're going to be doing new development. (see #921)

See: https://github.com/openhtmltopdf/openhtmltopdf/issues/1#issue-2096902877

Hikariqz commented 8 months ago

Thank you @siegelzc