asciidoctor / asciidoctor-pdf

Asciidoctor PDF: A native PDF converter for AsciiDoc based on Asciidoctor and Prawn, written entirely in Ruby.
https://docs.asciidoctor.org/pdf-converter/latest/
MIT License

Some kroki attributes seems to be ignored #2438

Closed: cykl closed this issue 11 months ago

cykl commented 12 months ago

This is a follow-up to https://antora.zulipchat.com/#narrow/stream/282400-users/topic/Is.20pdf-extension.20compatible.20with.20asciidoctor-kroki.3F/near/367578627

I have an Antora site that uses Kroki extensively, and I want to produce a PDF from it. It almost works, but it seems that kroki-http-method and kroki-max-uri-length aren't taken into account.

I have large Excalidraw diagrams that cannot be rendered because Kroki uses GET rather than POST, and the server returns 414 Request-URI Too Large. I usually set kroki-max-uri-length to 1000, which works in all our other tools, and I also tried setting the HTTP method to post rather than adaptive.
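For context, the asciidoctor-kroki configuration docs describe `kroki-http-method` as accepting `get`, `post`, or `adaptive`, where adaptive mode switches to POST once the GET URI would exceed `kroki-max-uri-length`. The Ruby below is a minimal sketch of that decision only; `choose_http_method` is a hypothetical helper, not the extension's source, and the 4000 default is an assumption. Since an image URL left in the output can only be fetched with GET, the POST branch presumably only matters when the extension downloads the diagram itself.

```ruby
# Hypothetical sketch of the get/post/adaptive choice described in the
# asciidoctor-kroki configuration docs; not the extension's actual code.
# The 4000 default for max_uri_length is an assumption.
def choose_http_method(get_uri, http_method: 'adaptive', max_uri_length: 4000)
  case http_method
  when 'get' then :get
  when 'post' then :post
  when 'adaptive'
    # Only fall back to POST when the GET URI would be too long.
    get_uri.length > max_uri_length ? :post : :get
  else
    raise ArgumentError, "unknown kroki-http-method: #{http_method}"
  end
end
```

With the setting from this report, `choose_http_method('x' * 1200, max_uri_length: 1000)` returns `:post`, while an 800-character URI stays on `:get`.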

I run my own Kroki server, and I'm able to observe that kroki-server-url is taken into account.

I don't have a self-contained reproducer yet but here are some of my observations:

$ asciidoctor-pdf --version
Asciidoctor PDF 2.3.9 using Asciidoctor 2.0.20 [https://asciidoctor.org]
Runtime Environment (ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux-musl]) (lc:UTF-8 fs:UTF-8 in:UTF-8 ex:UTF-8)

$ cat test.adoc
= Title

.Architecture Diagram
[excalidraw,format=svg]
----
include::test.excalidraw[]
----

$ /usr/bin/ruby /usr/bin/asciidoctor-pdf -r asciidoctor-kroki \
  -a env=site -a env-site -a site-gen=antora -a site-gen-antora \
  -a attribute-missing=skip -a !data-uri -a icons=font -a sectanchors \
  -a source-highlighter=rouge -a site-title="Site Title" \
  -a site-url=https://[redacted].io/handbooks -a page-pagination \
  -a !kroki-fetch-diagram -a kroki-http-method=post -a kroki-max-uri-length=1000 \
  -a kroki-server-url=https://[redacted].io/ -a allow-uri-read -a revdate=2023-07-11 \
  -a !page-partial -a doctype=book \
  -a docfile=latest@internal::pdf$developers-handbook.adoc -a docfilesuffix=.adoc \
  -a docname@=developers-handbook -a imagesdir=/repo/build/assembler \
  -o /repo/build/assembler/internal/latest/developers-handbook.pdf test.adoc

asciidoctor: WARNING: could not retrieve remote image: https://[redacted].io/excalidraw/svg/...; 414 Request-URI Too Large

Do you have any idea of what could be wrong?

I took a quick look at https://github.com/ggrossetie/asciidoctor-kroki/blob/master/ruby/lib/asciidoctor/extensions/asciidoctor_kroki/extension.rb#L255 and it seems that the Ruby version supports the same attributes as the JS one, so I'm unsure whether the issue lies in asciidoctor-pdf or asciidoctor-kroki.

mojavelinux commented 12 months ago

Asciidoctor PDF uses OpenURI.open_uri (from https://github.com/ruby/open-uri) to read the contents of remote resources, such as a remote SVG. That module relies on the fact that a) the server will respond to a GET request and that b) it can handle the URL length that is sent to it. If the negotiation fails, there is nothing Asciidoctor PDF can do about that. It could be a hard limit somewhere in OpenURI. Without access to the URL, I have no way of testing that theory.
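A minimal sketch of that failure mode, using only the Ruby standard library: a remote resource is read with a plain GET through `OpenURI.open_uri`, and an HTTP error status surfaces as `OpenURI::HTTPError`, whose message is the status line (for example `414 Request-URI Too Large`). `read_remote` and its injectable `opener` parameter are hypothetical, and the warning string only mimics the one in the report; this is not Asciidoctor PDF's code.

```ruby
require 'open-uri'

# Default opener: a plain GET through OpenURI, reading in binary mode.
DEFAULT_OPENER = ->(url, &block) { OpenURI.open_uri(url, 'rb', &block) }

# Hypothetical helper: returns [body, nil] on success, or [nil, warning] when
# the server answers with an HTTP error status (e.g. 414 for an over-long URI).
def read_remote(url, opener: DEFAULT_OPENER)
  body = opener.call(url) { |io| io.read }
  [body, nil]
rescue OpenURI::HTTPError => e
  # e.message holds the status line, e.g. "414 Request-URI Too Large".
  [nil, "could not retrieve remote image: #{url}; #{e.message}"]
end
```

The `opener` parameter exists only so the behavior can be exercised without a network; by default it performs a real GET.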

It's possible to replace the code that retrieves the remote resource in Asciidoctor PDF. See https://github.com/asciidoctor/asciidoctor-pdf/blob/main/lib/asciidoctor/pdf/converter.rb#L5068-L5074. You can replace this method by extending and replacing the converter. If you discover a way to make the request such that it succeeds, then we might consider wrapping OpenURI.open_uri in Asciidoctor PDF with a different implementation. However, keep in mind that Asciidoctor core also uses this method, so it has broader application than just the PDF converter.
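One direction to experiment with is sending the diagram source in a POST body instead of encoding it into the URL. The sketch below only builds such a request with `Net::HTTP` from the standard library. It assumes Kroki's POST form (`POST /<type>/<format>` with the plain-text diagram source as the body); `build_kroki_post` is a hypothetical name, and the sketch does not show where such a request would hook into the converter.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a Kroki POST request that
# carries the diagram source in the body, so the URL stays short.
def build_kroki_post(server_url, type, format, source)
  # Ensure a trailing slash so URI.join appends instead of replacing the path.
  base = URI(server_url.end_with?('/') ? server_url : "#{server_url}/")
  uri = URI.join(base, "#{type}/#{format}")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'text/plain'
  request.body = source
  [uri, request]
end
```

Sending it could then look like `Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') { |http| http.request(request) }`; `kroki.example` in the tests is a placeholder host.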

As a word of advice, I don't think the diagram web interface in Kroki should be relying on such long URLs. It should probably handle the negotiation itself, cache the image locally, and then reference that cached image in the converted document. It should not be up to the converter to make such large requests during conversion to PDF or when the HTML file is viewed.
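A minimal sketch of that advice, assuming the diagram type, format, and source are known at conversion time: the image is fetched once, stored under a content-addressed file name, and the document then references the local file. `cached_diagram_path` and the injected `fetch` callable are hypothetical; in practice `fetch` would perform the request to the Kroki server.

```ruby
require 'digest/sha2'
require 'fileutils'
require 'tmpdir' # only for Dir.mktmpdir in the example below

# Hypothetical helper: returns a local path for the rendered diagram, calling
# `fetch` (type, format, source -> image bytes) only on a cache miss.
def cached_diagram_path(cache_dir, type, format, source, fetch:)
  # Identical inputs always map to the same file name, so re-runs hit the cache.
  key = Digest::SHA256.hexdigest("#{type}\0#{format}\0#{source}")
  path = File.join(cache_dir, "#{type}-#{key}.#{format}")
  unless File.exist?(path)
    FileUtils.mkdir_p(cache_dir)
    File.binwrite(path, fetch.call(type, format, source))
  end
  path
end

# Example (writes into a throwaway directory):
#   Dir.mktmpdir do |dir|
#     cached_diagram_path(dir, 'excalidraw', 'svg', source, fetch: my_fetcher)
#   end
```

Keying on a digest of the inputs means a changed diagram gets a new file automatically, with no separate invalidation step.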

mojavelinux commented 12 months ago

As for the topic of this issue:

Some kroki attributes seems to be ignored

There is no way that is true. Asciidoctor PDF merely requests the URL from the server in order to embed the image. It has no knowledge of Kroki, how it works, or its attributes.

cykl commented 12 months ago

@mojavelinux Thanks for the feedback!

I'll admit that I don't know anything about the overall architecture or the interactions between Antora, Asciidoctor, asciidoctor-pdf, and the Kroki extension. I'm just a normal user trying to figure out why something doesn't work as expected and to contribute where I can.

Regarding my description of the issue, it may indeed make no sense at the implementation level. From a user's point of view, asciidoctor-kroki documents some configuration keys (https://github.com/ggrossetie/asciidoctor-kroki#configuration), and I can configure the Kroki client to use POST rather than GET in all other tools. But asciidoctor-pdf doesn't seem to let me do that (or I just don't understand how).

Without access to the URL, I have no way of testing that theory.

You can use the attached diagram and the public Kroki instance, which also returns 414 when the URI length is greater than 4096. You can test that using https://kroki.io/#try.

test.excalidraw.txt
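To check whether a diagram will hit that limit, the GET URL can be computed locally. Kroki's GET form encodes the diagram source with zlib deflate plus URL-safe Base64 and appends it after `/<type>/<format>/`. The sketch below assumes that encoding and uses only the Ruby standard library; `kroki_get_url` and `decode_kroki_payload` are hypothetical names, and 4096 is simply the limit observed on kroki.io above.

```ruby
require 'zlib'

# Hypothetical helper: builds a Kroki GET URL by zlib-deflating the UTF-8
# source and encoding it with the URL-safe Base64 alphabet.
def kroki_get_url(server_url, type, format, source)
  compressed = Zlib::Deflate.deflate(source.encode('UTF-8'), Zlib::BEST_COMPRESSION)
  # pack('m0') is strict Base64 (no newlines); tr swaps in the URL-safe alphabet.
  encoded = [compressed].pack('m0').tr('+/', '-_')
  "#{server_url.chomp('/')}/#{type}/#{format}/#{encoded}"
end

# Inverse of the encoding above, handy for inspecting a URL from a warning.
def decode_kroki_payload(encoded)
  Zlib::Inflate.inflate(encoded.tr('-_', '+/').unpack1('m0')).force_encoding('UTF-8')
end
```

For example, `kroki_get_url('https://kroki.io', 'excalidraw', 'svg', File.read('test.excalidraw')).length > 4096` tells you up front whether a GET request for that diagram is likely to be rejected.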

It's possible to replace the code that retrieves the remote resource in Asciidoctor PDF. See https://github.com/asciidoctor/asciidoctor-pdf/blob/main/lib/asciidoctor/pdf/converter.rb#L5068-L5074 You can replace this method by extending and replacing the converter. [...]

I'm not sure I get it yet, but I'll eventually figure it out :)

mojavelinux commented 12 months ago

I understand you're a normal user just trying to use the software. That said, I'm being clear that you may be reporting the issue in the wrong repository. In this repository, I can only control what Asciidoctor PDF does. If the server is not allowing the remote resource to be downloaded, there's not much that can be done about it in this converter. I have provided ways in which you can experiment with the converter to try alternate approaches to downloading it. But, ultimately, this is a Kroki issue. So I would definitely make sure that the issue has the attention of that project.

I will try the attached diagram to see if I can confirm that this is a limitation in the OpenURI library in Ruby.

mojavelinux commented 12 months ago

I think I may be able to explain what's happening. When using Antora, the Kroki extension fetches the diagrams itself and stores them in Antora's content catalog (see https://github.com/ggrossetie/asciidoctor-kroki#antora-integration). It likely does this using a POST request. When using Asciidoctor standalone, it does not do this by default. Instead, it builds an image URL with all the diagram data encoded in the URL itself, and the server rejects that request when the URL is too long (hence the 414). So it's not a limitation of Ruby, but rather a difference in how Kroki retrieves the diagram. I think you need to set kroki-fetch-diagram (your command currently unsets it with -a !kroki-fetch-diagram).
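The difference between the two modes can be sketched as follows, assuming the behavior described in the comment above: without fetching, the image target is a GET URL with the diagram embedded in it, so whoever reads the image (here Asciidoctor PDF via OpenURI) must issue that long GET; with fetching, the extension retrieves the image itself and the target is a local file. `diagram_image_target` and its injected callables are hypothetical, not asciidoctor-kroki's API.

```ruby
# Hypothetical sketch of how an extension could pick an image target.
# encode_url:    source -> GET URL with the diagram encoded in it
# fetch_to_file: source -> path of a locally stored copy (fetched e.g. via POST)
def diagram_image_target(source, fetch_diagram:, encode_url:, fetch_to_file:)
  if fetch_diagram
    # The converter only ever sees a local path, so URL length is irrelevant.
    fetch_to_file.call(source)
  else
    # The converter must GET this URL itself, which can fail with a 414.
    encode_url.call(source)
  end
end
```

Injecting both strategies keeps the sketch focused on the decision itself rather than on any particular HTTP client.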