asciidoctor / asciidoctorj

:coffee: Java bindings for Asciidoctor. Asciidoctor on the JVM!
http://asciidoctor.org
Apache License 2.0
624 stars 173 forks source link

Asset catalog does not include inlined images #1166

Closed adrpar closed 1 year ago

adrpar commented 1 year ago

First of all, thanks for Asciidoctor and its Java bindings!

I am struggling with using the AST, as I would like to obtain all references to images in my document. I have understood that findBy is not able to find any inlined images, however from the issues and discussions around the Asset Catalog I got to understand (probably wrongly) that it would contain also inlined images when processing the source with the appropriate option catalogAssets set to true.

Is it possible using AsciidoctorJ to obtain the inlined images or do I need to process the individual blocks myself?

Thank you for your help.

abelsromero commented 1 year ago

2 things:

  1. AsciidoctorJ does not expose catalog images now, but that can be added. The structures are there, we just need to add the method.
  2. Inline images don't seem to be added to the catalog. The documentation mentions that's possible but I could not get them, from AsciidoctorJ or Asciidoctor Ruby. Exploring the AST, block images have their own Block node, but inline images are embedded inside the paragraph node (no sub-nodes added for the image itself).

@mojavelinux sorry to tag you. Is the assumption correct or am I doing something wrong? Here's my test with asciidoctor gem v2.0.18.

require 'asciidoctor'

input_file = 'sample.adoc'
doc = Asciidoctor.load_file(input_file, catalog_assets: true)

images = doc.catalog[:images]
puts "Found #{images.size} images:"
puts images
= This is a title
Author Name
:icons: font

This a preamble

== This is a section

These are some replacements {value}

== Some admonitions

NOTE: Asciidoctor supports font-based admonition icons, powered by Font Awesome!

== A few images

A block image

image::images/sunset.jpg[]

An inlined image image:images/AsciiDoc-R-logo-color.png[] in the text.
mojavelinux commented 1 year ago

Inline images are cataloged (when catalog assets is enabled), but only while the document is converted to the output format. Thus, they won't be available until after the document is converted. That's because the load method only parses block-level elements. All inline elements are parsed during conversion. (This is a well-known shortcoming of Asciidoctor and one we are working to address in the language project).

adrpar commented 1 year ago

Thank you very much for your responses. Indeed, calling getContent on the document before accessing the asset catalog does fill it up with all images.

Two thoughts on this where you could maybe help future users of the bindings: 1) Document this behaviour clearly in the catalogAssets section of https://docs.asciidoctor.org/maven-tools/latest/plugin/goals/process-asciidoc/#configuration

2) or maybe improve the behaviour of the bindings to be more intuitive: If catalogAssets is set to true, already convert the document once on load, so that all the information is there as one would at least naively expect.

abelsromero commented 1 year ago

Inline images are cataloged (when catalog assets is enabled), but only while the document is converted to the output format

Now I understand why the docs use convert_file in the example :+1: And using convert it works perfectly. In all fairness, the docs are well explained, I read diagonally, my fault.

Generally speaking, inline elements are not cataloged until conversion. Therefore, they’re only available after the document has been converted, not after it has been loaded.

Two thoughts on this where you could maybe help future users of the bindings:

Everything makes sense from Asciidoctor (the Ruby core) point of view. The issue I see here is that AsciidoctorJ has two methods to do the same and they are not working (for me at least) and we have no tests :disappointed:

var options = Options.builder()
            .safe(SafeMode.SAFE)
            .catalogAssets(true)
            .build();
var doc = asciidoctor.convert(Files.readString(sourceDocument.toPath()), options, Document.class);

Returns an internal Ruby String image

And this which I think should be good one returns null.

var document = asciidoctor.convertFile(sourceDocument, options, Document.class);

Once this is fixed, we can start thinking about how to better integrate with the maven plugin. @adrpar I presume you are using an extension? Do you mind opening an issue in github.com/asciidoctor/asciidoctor-maven-plugin/ with the specific use case and an example?

adrpar commented 1 year ago

I was not aware of the maven plugin and I need to check if it suites my needs.

My use case is in general to implement a documentation workflow that is similar to what the docToolchain (https://doctoolchain.org) provides but more extensible and tailored to uploading documentation to Confluence. My idea was to create a set of command line tools that can be put together into a processing pipeline that will run within a CI/CD system.

Specifically the use case for getting all the images was with copying the sources of a documentation into a "build" directory, so that all images are in one place. Thus collecting all images in one images directory. The second step would then be to (if needed) update all the image references within the asciidoc source to point to that one directory... Which I know is going quite beyond what AsciidoctorJ provides.

I hope that somewhat explained my use case.

mojavelinux commented 1 year ago

or maybe improve the behaviour of the bindings to be more intuitive: If catalogAssets is set to true, already convert the document once on load, so that all the information is there as one would at least naively expect.

This would deviate from the behavior of Asciidoctor core and cause side effects, not something AsciidoctorJ should be doing. Right now, the contract in Asciidoctor is that the inline images become available after conversion. Thus, if you need to access them, then the code that calls Asciidoctor needs to use convert or convertFile.

(This may change in the future major release once we solve the inline parsing issue, which we are working on. But it's not the way it is now).