asciidoctor / asciidoctor-pdf

Asciidoctor PDF: A native PDF converter for AsciiDoc based on Asciidoctor and Prawn, written entirely in Ruby.
https://docs.asciidoctor.org/pdf-converter/latest/
MIT License

Overflow error with large SVG embedded data. #2467

Closed MikArber closed 8 months ago

MikArber commented 8 months ago

Hi guys,

First, thanks for your work. Asciidoctor is a great product; I am very pleased with it and I promote it a lot around me.

About this issue, I am currently using asciidoctor to generate my technical documents in PDF format.

To illustrate the documents, I also use drawio to create rich design diagrams in SVG format. They are mainly made of boxes and arrows. The combination of asciidoc and drawio is a very good one: documents can be managed within a git project, and complex diagrams can be zoomed in on for details.

Over the last few days, as my diagrams have grown richer and richer, I have reached a limit. It is not directly related to the SVG file size but to the metadata embedded within the SVG, which drawio uses to store its own format inside the file. From the error, it seems that an XML parser overflows: [INFO] asciidoctor: WARN: could not embed image: C:/repo/VariousTest/asciidoc/asciidoctor-pdf-example/src/docs/asciidoc/images/test_272_ko.drawio.svg; entity expansion has grown too large

To reproduce the problem, I have attached one of your samples with 3 different SVG files:

In my opinion, drawio and the use of embedded metadata inside SVG are becoming more and more popular, so this fix would be worthwhile. Thanks in advance for your reply.

asciidoctor-pdf-example.zip

mojavelinux commented 8 months ago

This limit is outside the control of Asciidoctor PDF. In fact, it's outside the control of prawn-svg too. It's a memory limit set by the rexml library in Ruby itself.

The limit is 10,240 (I'm assuming bytes) by default. You can increase it by requiring the following Ruby script:

require 'rexml'

REXML::Security.entity_expansion_text_limit = 100_000

See https://www.rubydoc.info/stdlib/rexml/REXML/Security

You'll need to pass the location of this script to the requires option of Asciidoctor or AsciidoctorJ. See requires on the following page when using the Maven plugin: https://docs.asciidoctor.org/maven-tools/latest/plugin/goals/process-asciidoc/
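As a minimal sketch of what such a required script could look like when the limit should be adjustable without editing the file: the environment variable name `REXML_TEXT_LIMIT` and the fallback of 100,000 are assumptions made for this illustration, not something Asciidoctor PDF or rexml defines. The sketch only raises the limit and never lowers a value that something else has already increased.

```ruby
require 'rexml/document'

# Fallback taken from the value suggested above; adjust as needed.
DEFAULT_TEXT_LIMIT = 100_000

# REXML_TEXT_LIMIT is a hypothetical environment variable used only in this sketch.
requested_limit = Integer(ENV.fetch('REXML_TEXT_LIMIT', DEFAULT_TEXT_LIMIT.to_s))

# Only raise the limit; leave it alone if it is already high enough.
if REXML::Security.entity_expansion_text_limit < requested_limit
  REXML::Security.entity_expansion_text_limit = requested_limit
end
```

When running from the command line rather than Maven, the same file can presumably be loaded with the -r option of asciidoctor-pdf.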

mojavelinux commented 8 months ago

> the embedded metadata within the SVG.

The main issue seems to be that the content attribute on the root <svg> tag contains the encoded content of an entirely separate XML document. Another way to deal with this would be to use a tool that removes that attribute before passing the file to Asciidoctor PDF. Personally, I consider that attribute to be an abuse of SVG by draw.io. Even with the increased limit, it is going to slow down processing. I would complain to draw.io and ask them to stop this practice (instead, they could use CDATA to embed the document, which would be a better practice and less likely to cause a read error).
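A rough sketch of such a preprocessing step, written as plain string handling so that it does not itself run into the XML parser limit. The method name `strip_content_attribute` is made up for this example, and the approach assumes the first `<svg` in the file is the root tag and that its attribute values are quoted. It is not a general XML rewriter.

```ruby
# Matches the opening <svg ...> tag, stepping over quoted attribute values
# so that a '>' inside a value does not end the match early.
SVG_OPEN_TAG = /<svg\b(?:[^>"']|"[^"]*"|'[^']*')*>/m

# Matches one name="value" (or name='value') pair, including leading whitespace.
ATTRIBUTE = /\s+([^\s=>\/]+)\s*=\s*("[^"]*"|'[^']*')/m

# Returns a copy of the SVG source with the content attribute removed
# from the first <svg> opening tag; everything else is left untouched.
def strip_content_attribute(svg_source)
  svg_source.sub(SVG_OPEN_TAG) do |tag|
    tag.gsub(ATTRIBUTE) do
      Regexp.last_match(1) == 'content' ? '' : Regexp.last_match(0)
    end
  end
end
```

It could be applied to a copy of each diagram before the build, for example with File.write(target, strip_content_attribute(File.read(source))). Since the attribute holds the draw.io source of the diagram, the stripped copy presumably can no longer be edited in draw.io, so the original file should be kept.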

MikArber commented 8 months ago

Hi Mojavelinux, it worked with the provided Ruby script. I also agree about the CDATA solution. Thanks.