asciidoctor / asciidoctorj-pdf

AsciidoctorJ PDF bundles the Asciidoctor PDF RubyGem (asciidoctor-pdf) so it can be loaded into the JVM using JRuby.
Apache License 2.0

OutOfMemoryError: Java heap space when generating PDF #45

Open wimdeblauwe opened 3 years ago

wimdeblauwe commented 3 years ago

I am using AsciiDoc to write a book and, so far, have had no issues generating a PDF using Maven with the following versions:

        <asciidoctor-maven-plugin.version>2.0.0</asciidoctor-maven-plugin.version>
        <asciidoctorj.version>2.4.0</asciidoctorj.version>
        <asciidoctorj-pdf.version>1.5.3</asciidoctorj-pdf.version>

However, I added an extra chapter and now the build fails with:

[ERROR] Java heap space -> [Help 1]
java.lang.OutOfMemoryError: Java heap space
    at org.jruby.RubyString.newString (RubyString.java:498)
    at org.jruby.RubyString.newString (RubyString.java:493)
    at org.jruby.ext.zlib.JZlibInflate.flushOutput (JZlibInflate.java:109)
    at org.jruby.ext.zlib.JZlibInflate.internalFinish (JZlibInflate.java:329)
    at org.jruby.ext.zlib.ZStream.finish (ZStream.java:136)
    at org.jruby.ext.zlib.JZlibInflate.s_inflate (JZlibInflate.java:74)
    at org.jruby.ext.zlib.JZlibInflate$INVOKER$s$1$0$s_inflate.call (JZlibInflate$INVOKER$s$1$0$s_inflate.gen)
    at org.jruby.runtime.callsite.CachingCallSite.call (CachingCallSite.java:172)
    at uri_3a_classloader_3a_.gems.prawn_minus_2_dot_2_dot_2.lib.prawn.images.png.invokeOther61:inflate (uri:classloader:/gems/prawn-2.2.2/lib/prawn/images/png.rb:92)
    at uri_3a_classloader_3a_.gems.prawn_minus_2_dot_2_dot_2.lib.prawn.images.png.RUBY$method$initialize$0 (uri:classloader:/gems/prawn-2.2.2/lib/prawn/images/png.rb:92)
    at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic (DirectMethodHandle$Holder)
    at java.lang.invoke.LambdaForm$MH/0x000000080076d440.invokeExact_MT (LambdaForm$MH)
    at org.jruby.internal.runtime.methods.CompiledIRMethod.call (CompiledIRMethod.java:108)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call (MixedModeIRMethod.java:140)
    at org.jruby.runtime.callsite.CachingCallSite.call (CachingCallSite.java:182)
    at org.jruby.RubyClass.newInstance (RubyClass.java:918)
    at org.jruby.RubyClass$INVOKER$i$newInstance.call (RubyClass$INVOKER$i$newInstance.gen)
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrOneOrNBlock.call (JavaMethod.java:349)
    at org.jruby.runtime.callsite.CachingCallSite.call (CachingCallSite.java:172)
    at uri_3a_classloader_3a_.gems.prawn_minus_2_dot_2_dot_2.lib.prawn.images.invokeOther13:new (uri:classloader:/gems/prawn-2.2.2/lib/prawn/images.rb:92)
    at uri_3a_classloader_3a_.gems.prawn_minus_2_dot_2_dot_2.lib.prawn.images.RUBY$method$build_image_object$0 (uri:classloader:/gems/prawn-2.2.2/lib/prawn/images.rb:92)
    at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic (DirectMethodHandle$Holder)
    at java.lang.invoke.LambdaForm$MH/0x000000080076d440.invokeExact_MT (LambdaForm$MH)
    at org.jruby.internal.runtime.methods.CompiledIRMethod.call (CompiledIRMethod.java:108)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call (MixedModeIRMethod.java:140)
    at org.jruby.internal.runtime.methods.DynamicMethod.call (DynamicMethod.java:200)
    at org.jruby.runtime.callsite.CachingCallSite.call (CachingCallSite.java:172)
    at uri_3a_classloader_3a_.gems.asciidoctor_minus_pdf_minus_1_dot_5_dot_3.lib.asciidoctor.pdf.converter.invokeOther11:build_image_object (uri:classloader:/gems/asciidoctor-pdf-1.5.3/lib/asciidoctor/pdf/converter.rb:1499)
    at uri_3a_classloader_3a_.gems.asciidoctor_minus_pdf_minus_1_dot_5_dot_3.lib.asciidoctor.pdf.converter.RUBY$block$convert_image$6 (uri:classloader:/gems/asciidoctor-pdf-1.5.3/lib/asciidoctor/pdf/converter.rb:1499)
    at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic (DirectMethodHandle$Holder)
    at java.lang.invoke.LambdaForm$MH/0x0000000800769840.invoke (LambdaForm$MH)
    at java.lang.invoke.Invokers$Holder.invokeExact_MT (Invokers$Holder)

I increased the Java heap available to Maven by creating a .mvn/jvm.config file at the project root:

-Xmx2048m

I also tried -Xmx3048m, but neither setting helped. Removing the last chapter makes it work again. Removing the last chapter and including the second-to-last chapter twice also makes it fail. Using only the last chapter, with all other chapters removed, also works.

Is there anything else I can try to make it work again?

robertpanzer commented 3 years ago

Can you check if your last chapter has some image (png) that makes the PDF converter consume so much memory?

wimdeblauwe commented 3 years ago

The last chapter indeed has a few PNG images of 1.3 MB each. Not excessive I think?

In the meantime, I tried running asciidoctor-pdf without Maven, and that works without problems.
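A PNG's size on disk says little about how much memory it needs once inflated, and the stack trace above fails inside the zlib inflate call in Prawn's PNG reader. The following is a minimal sketch for estimating the decoded size of an image from its pixel dimensions. The class name `PngFootprint` and the figure of 4 bytes per pixel (8-bit RGBA) are assumptions for illustration; it does not measure what Prawn itself allocates.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;

// Hypothetical helper, not part of asciidoctorj-pdf or Prawn.
class PngFootprint {

    // Rough decoded size in bytes, ASSUMING 4 bytes per pixel (8-bit RGBA).
    static long estimateDecodedBytes(int width, int height) {
        return (long) width * height * 4L;
    }

    // Reads only the image header to get the dimensions; pixel data is not decoded.
    static long estimateDecodedBytes(File image) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(image)) {
            if (in == null) {
                throw new IOException("Cannot open " + image);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No image reader found for " + image);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in, true, true);
                return estimateDecodedBytes(reader.getWidth(0), reader.getHeight(0));
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            long bytes = estimateDecodedBytes(new File(path));
            System.out.println(path + ": ~" + (bytes / (1024 * 1024)) + " MiB decoded");
        }
    }
}
```

Running it over the images of the failing chapter might show whether any of them is unexpectedly large in pixels, even if the file itself is only about 1.3 MB.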

adrian-tarau commented 3 years ago

Same here. I have 320 images in the document, and I need to give it more than 2 GB of memory to get the PDF. The size of those images is 12 MB. Even if they are all kept in memory, encoded, and enhanced (whatever needs to be done to write them to the PDF), that does not explain why it needs so much memory.

The same document without images (so it complains about missing images) is generated faster and with less than 300 MB of used memory.

adrian-tarau commented 3 years ago

Memory is full of ByteList instances, used by RubyString, and of the byte[] content behind them. At least for the big ones, it does not look like they are storing strings (see the bottom section from the profiler).

[image: profiler screenshot]

adrian-tarau commented 3 years ago

I was able to generate the help file with -Xmx1200M -XX:NewRatio=10 (I changed the ratio to get a 1G old area). You can see that old and eden were full for more than 20s, with the GC going crazy, but at least it did not fail.

[image: memory usage screenshot]

adrian-tarau commented 3 years ago

Mystery solved: switch to JPEG format, unfortunately. I was able to generate the PDF with a 330 MB old generation vs 1.1G (and it was never maxed out). The "issue" seems to be in the Prawn PNG class, which creates a lot of StringIO objects to transform the PNG, because PDF does not support much of the PNG format?

Kind of disappointing that the great PDF format needs its images pre-processed. But it is what it is :) Anyway, I do not understand why Prawn uses StringIO to process a stream of bytes (images in this case), which seems to be a common pattern across Ruby projects. In Java, bytes are processed using a stream, not a string that lets you iterate over or read bytes.

To me, it looks like an inefficient way of adapting a PNG image in Prawn. The JPG implementation does almost nothing; it just writes the bytes into the PDF, which is why it is faster and consumes less than 1/3 of the memory.

image

Also, the generated PDF is almost twice as big with JPG images as with PNG images (a JPG image is at least 2x the size of the PNG) at the default compression ratio. Going lower than 0.75 reduces the quality of the images significantly, since these are application screenshots (text). Even at 0.75, you can see a difference between PNG and JPG.

But until Prawn is changed to stop using strings, we need to use JPG. Should I open an issue with Prawn, or would the AsciidoctorJ members have a better argument for requesting a performance enhancement/optimization?
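For anyone who wants to try the JPG workaround on existing screenshots, here is a minimal sketch of a PNG to JPEG conversion using the JDK's `javax.imageio` API. The class name `PngToJpeg` is made up for illustration, and mapping the 0.75 setting mentioned above onto `setCompressionQuality` is an assumption, since the tool used for the original conversion is not named. Transparent pixels are flattened onto white because JPEG has no alpha channel.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of asciidoctorj-pdf or Prawn.
class PngToJpeg {

    // quality is in the range 0.0 (smallest file) to 1.0 (best quality).
    static void convert(File png, File jpg, float quality) throws IOException {
        BufferedImage source = ImageIO.read(png);
        if (source == null) {
            throw new IOException("Not a readable image: " + png);
        }

        // JPEG cannot store transparency, so draw the image onto an opaque white RGB canvas.
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.drawImage(source, 0, 0, Color.WHITE, null);
        } finally {
            g.dispose();
        }

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        // Remove any stale output first; the output stream does not truncate existing files.
        Files.deleteIfExists(jpg.toPath());
        try (ImageOutputStream out = ImageIO.createImageOutputStream(jpg)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PngToJpeg input.png output.jpg
        convert(new File(args[0]), new File(args[1]), 0.75f);
    }
}
```

The image references in the AsciiDoc sources would then need to point at the .jpg files. As noted above, text-heavy screenshots can look visibly worse as JPG, so it may be worth comparing a few pages before converting everything.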

adrian-tarau commented 3 years ago

Most of the changes in the Prawn PNG class are more than 10 years old ... maybe PDF supports PNG better these days?

robertpanzer commented 3 years ago

asciidoctorj-pdf is just a repackaging of https://github.com/asciidoctor/asciidoctor-pdf so that it can be consumed with asciidoctorj. The project contains nothing else but that repackaging. If there are better ways now to handle PNGs in Ruby and Prawn it might make more sense to open a ticket for asciidoctor-pdf.

elektro-wolle commented 3 years ago

Found the root cause in the corresponding Prawn issue: https://github.com/prawnpdf/prawn/issues/1153

rdmueller commented 2 years ago

Today I had the same problem, but I discovered an interesting solution:

Some images were referenced as inline images (image:xyz.png[]) instead of block images (image::xyz.png[]).

So I replaced all image references that used a single colon (:) with block images (double colon, ::). And now it works for me.
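The replacement described above could be scripted roughly as follows. This is only a sketch: the class name `InlineToBlockImages` is hypothetical, and it deliberately rewrites only lines that consist of nothing but an inline image macro, on the assumption that an image in the middle of a sentence is meant to stay inline.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Asciidoctor.
class InlineToBlockImages {

    // Matches a line that is only an inline image macro, e.g. "image:xyz.png[]".
    // The negative lookahead (?!:) skips lines that already use "image::".
    private static final Pattern LONE_INLINE_IMAGE =
            Pattern.compile("^image:(?!:)(\\S.*\\[.*\\])\\s*$");

    // Returns the line with "image:" turned into "image::" when it matches,
    // otherwise returns the line unchanged.
    static String rewriteLine(String line) {
        Matcher m = LONE_INLINE_IMAGE.matcher(line);
        return m.matches() ? "image::" + m.group(1) : line;
    }

    public static void main(String[] args) {
        System.out.println(rewriteLine("image:xyz.png[]"));
        System.out.println(rewriteLine("See image:icon.png[] in the toolbar."));
    }
}
```

Applied line by line to a chapter file, this leaves existing block images and images embedded in running text untouched. The result should still be reviewed by hand, since a block image is laid out differently from an inline one.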

phreed commented 5 months ago

asciidoctorj-pdf is just a repackaging of https://github.com/asciidoctor/asciidoctor-pdf so that it can be consumed with asciidoctorj. The project contains nothing else but that repackaging. If there are better ways now to handle PNGs in Ruby and Prawn it might make more sense to open a ticket for asciidoctor-pdf.

Did this issue get reported to asciidoctor-pdf?