Leverage Antora cache to locally cache images

ggrossetie commented 3 years ago

Antora uses a cache directory to avoid fetching remote resources:

On subsequent runs, Antora will attempt to resolve remote resources from the cache instead. https://docs.antora.org/antora/2.2/playbook/configure-runtime/#cache

Currently, the "cache" does not work with Antora because the output directory is deleted between runs. What we should do:

Resolve the cache directory from Antora
Create a directory in the cache directory named kroki (for instance, if the cache directory is ./.cache/antora then we should create ./.cache/antora/kroki)
When fetching images from the Kroki server, write them in the cache directory
Copy them to the output directory

When running again, we should check whether the image exists in the cache directory. If it does, we should copy it into the output directory; if not, we should fetch it.
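The steps above could be sketched roughly as follows. This is only an illustration: `getImage` and `fetchImage` are hypothetical names (`fetchImage` stands in for whatever call retrieves the image from the Kroki server), and the file name is assumed to be already derived from the diagram.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper, not part of asciidoctor-kroki.
// cacheDir: the resolved Antora cache directory (e.g. ./.cache/antora)
// fetchImage: stand-in for the request to the Kroki server; returns the image content
function getImage (cacheDir, outputDir, fileName, fetchImage) {
  const krokiCacheDir = path.join(cacheDir, 'kroki')
  const cachedFile = path.join(krokiCacheDir, fileName)
  if (!fs.existsSync(cachedFile)) {
    // Cache miss: fetch the image and write it to the cache first
    fs.mkdirSync(krokiCacheDir, { recursive: true })
    fs.writeFileSync(cachedFile, fetchImage())
  }
  // In both cases, copy the cached image into the output directory
  fs.mkdirSync(outputDir, { recursive: true })
  const outputFile = path.join(outputDir, fileName)
  fs.copyFileSync(cachedFile, outputFile)
  return outputFile
}
```

With this shape, deleting the output directory between runs no longer forces a new request, since the copy always comes from the cache.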

Open questions

Should we include the diagram library version in the hash? If yes, how do we retrieve the current version of the diagram library from the server?
Can we fetch the images again when the Antora --fetch option is used? If not, how can we remove the images?
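For the first question, one possible shape for a cache key that takes the library version into account is sketched below. The function name, its parameters, and the choice of SHA-256 are assumptions made for illustration; how the version would be obtained from the server remains open.

```javascript
const crypto = require('crypto')

// Hypothetical cache key: the same inputs always give the same key,
// and a different library version gives a different key.
// libraryVersion is optional; leaving it empty ignores the version.
function cacheKey (diagramType, format, source, libraryVersion = '') {
  return crypto
    .createHash('sha256')
    .update([diagramType, format, libraryVersion, source].join('\0'))
    .digest('hex')
}
```

If the version is part of the key, upgrading the diagram library on the server naturally leads to cache misses, at the cost of leaving stale files behind in the cache directory.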

aisbergde commented 3 years ago

This is exactly what I am looking for!

marcelstoer commented 3 years ago

Yes! This would be tremendously helpful in reducing build time.

Currently, the "cache" does not work with Antora

Shouldn't that read "does not work with/for Kroki" (or I don't get the sentence)?

As for your open questions

Can we fetch the images again when the Antora --fetch option is used? If not, how can we remove the images?

I would prefer not to piggy-back on the Antora flag but create something like --kroki-fetch instead. Some users may need that flexibility.

Should we include the diagram library version in the hash?

Disclaimer: I probably don't know Kroki well enough to have a qualified opinion on this. However, wouldn't the fetch option be enough for system maintainers or doc managers to manually "uncache" diagrams when the diagram lib versions change? Ideally they will want to have control over those anyway, no?

ggrossetie commented 3 years ago

Currently, the "cache" does not work with Antora

Shouldn't that read "does not work with/for Kroki" (or I don't get the sentence)?

As far as I remember, this information is not available in the Antora context. Meaning that extensions cannot retrieve the location of the cache directory used by Antora.

We could use our own cache directory but I feel like it would be better if everything was inside the same cache directory.

I would prefer not to piggy-back on the Antora flag but create something like --kroki-fetch instead. Some users may need that flexibility.

We cannot do that because extensions cannot add additional command line options. In other words --kroki-fetch won't be recognized by Antora CLI. As far as I know, the only solution is to use an environment variable: KROKI_FETCH=true antora antora-playbook.yml
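A minimal sketch of how such an environment variable might be read. `KROKI_FETCH` is only the name suggested in this thread, not an existing option, and `shouldRefetch` is a hypothetical helper.

```javascript
// Hypothetical: returns true when the user asked to bypass the cache,
// e.g. KROKI_FETCH=true antora antora-playbook.yml
function shouldRefetch (env = process.env) {
  return env.KROKI_FETCH === 'true'
}
```

Passing the environment as a parameter keeps the check easy to test without touching `process.env`.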

Disclaimer: I probably don't know Kroki well enough to have a qualified opinion on this. However, wouldn't the fetch option be enough for system maintainers or doc managers to manually "uncache" diagrams when the diagram lib versions change? Ideally they will want to have control over those anyway, no?

I guess that's a good point!

To clarify, the current blocker (for me) is that the location of the cache directory used by Antora is not available in the context.

@mojavelinux do you think it's bad if we start using our own cache directory?

mojavelinux commented 3 years ago

As far as I remember, this information is not available in the Antora context.

Actually, it is. The cache is stored on the playbook (and defaults to the return value of getCacheDir if not set). You do need to resolve it, but so does every Antora component. See https://gitlab.com/antora/antora/-/blob/master/packages/content-aggregator/lib/aggregate-content.js#L944-957

The cache folder in Antora is designed so that extensions will make use of it. But you should create a folder inside that cache folder to namespace your cache. For example, "kroki".
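As a rough sketch of that resolution, assuming the configured location is exposed as `playbook.runtime.cacheDir`: `resolveKrokiCacheDir` and its `defaultCacheDir` parameter are hypothetical, the latter standing in for the default that Antora computes itself. Unlike the linked Antora code, this simplified version does not expand `~` or other special path prefixes.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper, not Antora's implementation.
// Assumption: playbook.runtime.cacheDir holds the configured cache location, if any.
// defaultCacheDir: stand-in for the default cache location used when none is configured.
function resolveKrokiCacheDir (playbook, defaultCacheDir, startDir = process.cwd()) {
  const configured = playbook.runtime && playbook.runtime.cacheDir
  const baseDir = configured ? path.resolve(startDir, configured) : defaultCacheDir
  // Namespace the cache inside the shared cache folder
  const krokiCacheDir = path.join(baseDir, 'kroki')
  fs.mkdirSync(krokiCacheDir, { recursive: true })
  return krokiCacheDir
}
```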

We cannot do that because extensions cannot add additional command line options. In other words --kroki-fetch won't be recognized by Antora CLI.

We are considering providing specific fetch categories, but that needs to be taken up with Antora.

ggrossetie commented 3 years ago

Actually, it is. The cache is stored on the playbook (and defaults to the return value of getCacheDir if not set). You do need to resolve it, but so does every Antora component. See gitlab.com/antora/antora/-/blob/master/packages/content-aggregator/lib/aggregate-content.js#L944-957

🤯 🤯 🤯 Then, I think we are ready to implement this feature!

The cache folder in Antora is designed so that extensions will make use of it. But you should create a folder inside that cache folder to namespace your cache. For example, "kroki".

Yes, that's what I had in mind 👍🏻

marcelstoer commented 3 years ago

For a dirty personal caching PoC I tweaked fetch#save() (around https://github.com/Mogztter/asciidoctor-kroki/blob/master/src/fetch.js#L46).

It works fine but I noticed unxhr was still issuing GET requests. Debugging revealed they stem from preprocessPlantUML() -> preprocessPlantUmlIncludes() triggered at https://github.com/Mogztter/asciidoctor-kroki/blob/master/src/asciidoctor-kroki.js#L81. So, those would have to be silenced as well if the diagram is already in the cache (or just cache the includes).

mojavelinux commented 3 years ago

@Mogztter FYI I'm strongly considering creating an Antora helper (like the @antora/expand-path-helper) to wrap up the logic of finding and creating the cache folder. That should make it easier in the future. Until then, you'll need to use the code I linked to. You can also find a similar example in the ui-loader.

asciidoctor / asciidoctor-kroki

Leverage Antora cache to locally cache images #113