PreTeXtBook / pretext-cli

Command line interface for quickly creating, authoring, and building PreTeXt documents.
https://pretextbook.org
GNU General Public License v3.0

Caching of assets? #707

Open siefkenj opened 7 months ago

siefkenj commented 7 months ago

Currently there is some support for rebuilding assets only if they've changed, but it seems to rely on document structure. Since assets are extracted and then compiled in isolation, I imagine that if you stored <md5sum>.svg files in some .cache folder, you could just detect whether the asset contents were the same and copy over the cached version instead of running the compile again. This method would not rely on document structure at all.

StevenClontz commented 7 months ago

+1

So we have an element like <latex-image xml:id="bar">FOO</latex-image>, we checksum FOO to abc123, then save the result to .cache/latex-image/abc123.svg as well as generated-assets/latex-image/bar.svg. Then on future builds, we simply copy .cache/latex-image/abc123.svg to generated-assets/latex-image/bar.svg (or wherever it should be, in case the filename changes).
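A minimal sketch of that flow, to make the proposal concrete. It is not the pretext-cli or core implementation: `build_with_cache` and `compile_fn` are hypothetical names, and SHA-256 stands in for whichever checksum would actually be chosen.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def checksum(source: str) -> str:
    """Hex digest of the asset source (the FOO inside <latex-image>)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def build_with_cache(
    source: str,
    xml_id: str,
    compile_fn: Callable[[str, Path], None],  # hypothetical: writes an SVG for `source` to the given path
    cache_dir: Path = Path(".cache/latex-image"),
    output_dir: Path = Path("generated-assets/latex-image"),
) -> Path:
    """Compile only on a cache miss, then copy the cached file to <xml_id>.svg."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    cached = cache_dir / f"{checksum(source)}.svg"
    if not cached.exists():
        # Cache miss: this is the only place the expensive compile runs.
        compile_fn(source, cached)

    # The output name follows the xml:id, so a renamed element costs one copy.
    target = output_dir / f"{xml_id}.svg"
    shutil.copyfile(cached, target)
    return target
```

Because the lookup key depends only on the source text, calling this twice with the same source and a different `xml_id` compiles once and copies twice.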

rbeezer commented 7 months ago

+1


oscarlevin commented 7 months ago

I'm not sure I understand what issue this resolves. Currently, if you have an asset with xml:id="bar" (or if bar is the id of the youngest ancestor of the asset that has an xml:id), then we store the hash of the asset with the xml:id. If the author changes the asset, then the hashes won't match, so we ask for the asset to be regenerated (and put into generated-assets).
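For comparison, the id-keyed check described here can be sketched as follows. This is an illustration only: the plain dict used as the hash table and the name `needs_regeneration` are assumptions, not the actual pretext-cli code.

```python
import hashlib


def needs_regeneration(hash_table: dict[str, str], xml_id: str, source: str) -> bool:
    """Return True when the asset stored under `xml_id` must be rebuilt.

    `hash_table` maps an xml:id to the digest of the asset source last built
    for it. The table is updated in place when a rebuild is requested.
    """
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if hash_table.get(xml_id) == digest:
        return False  # same id, same content: nothing to do
    # Unknown id or changed content: record the new digest and rebuild.
    hash_table[xml_id] = digest
    return True
```

Since the lookup key is the xml:id rather than the content, an id this table has not seen before always reports a rebuild, whatever the source is.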

With this proposal, we keep a copy of the generated asset in .cache. If the author changes the asset, the hash will no longer match, so we regenerate the asset (and put it in .cache and generated-assets).

In both cases, if the asset isn't changed, nothing gets regenerated.

Last case: the asset isn't changed, but the xml:id is changed. Currently, the asset is regenerated. Under the proposal, the asset isn't regenerated, but a new copy is made under the new name. I see that there is an advantage here, but the disadvantage is keeping every version of the generated asset in the cache and copying every asset from the cache over to generated-assets.

What am I missing?

StevenClontz commented 7 months ago

Another potential use-case: user has <latex-image xml:id="foo">BAR</latex-image> and later <latex-image xml:id="baz">BAR</latex-image>. Maybe it's an anti-pattern that should have been solved with an xref but this would avoid building the same image twice.
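A small sketch of how keying on content would catch this case, assuming the asset sources have already been extracted into a dict from xml:id to source text (`group_by_checksum` is a hypothetical helper, not an existing function):

```python
import hashlib
from collections import defaultdict


def group_by_checksum(assets: dict[str, str]) -> dict[str, list[str]]:
    """Group xml:ids by the digest of their source text.

    Each key needs to be built once; every id in its list then receives a
    copy of the same generated file.
    """
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for xml_id, source in assets.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        groups[digest].append(xml_id)
    return dict(groups)
```

With `{"foo": "BAR", "baz": "BAR"}` as input, both ids land under a single digest, so the image is built only once.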

siefkenj commented 7 months ago

This would also mean images are cached without assigning an ID to them.

StevenClontz commented 4 months ago

I'm waiting on https://github.com/TeamBasedInquiryLearning/precalculus/actions/runs/9538778663 and I'm seeing a lot of duplication of assets being generated. This could probably be avoided through cleverer configuration of the action, but I still think having a .generated-cache directory that contains a bunch of ELEMENT/FORMAT/HASH.FMT files that is checked before every build and copied over (barring some kind of --force-regenerate) would be excellent.

Another use case: I change my sageplot from blue to green, then hate it, then change it back to blue. The old blue version is still cached so I get it immediately.
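One way the ELEMENT/FORMAT/HASH.FMT layout and the override flag could fit together, as a sketch under assumptions: `cache_path` and `should_generate` are hypothetical names, and `force_regenerate` mirrors the `--force-regenerate` idea floated above rather than an existing CLI option.

```python
import hashlib
from pathlib import Path


def cache_path(cache_root: Path, element: str, fmt: str, source: str) -> Path:
    """Location of a cached asset: <cache_root>/ELEMENT/FORMAT/HASH.FMT."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return cache_root / element / fmt / f"{digest}.{fmt}"


def should_generate(
    cache_root: Path,
    element: str,
    fmt: str,
    source: str,
    force_regenerate: bool = False,
) -> bool:
    """True if the asset has to be built, False if the cached copy can be reused."""
    if force_regenerate:
        return True  # caller asked to bypass the cache entirely
    return not cache_path(cache_root, element, fmt, source).exists()
```

Old entries are never overwritten, since a changed source hashes to a different filename. That is what makes the blue, green, blue round trip a cache hit on the third build.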

oscarlevin commented 4 months ago

I am coming around to really liking this idea. I think this would be handled by core though, correct? So definitely something we will want to collaborate on.

StevenClontz commented 4 months ago

> I think this would be handled by core though, correct?

💯 - and this is a good week to do it

StevenClontz commented 4 months ago

Caching should be used in tandem with https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows to speed up CI/CD for PreTeXt projects.

StevenClontz commented 4 months ago

(meanwhile: https://github.com/TeamBasedInquiryLearning/precalculus/actions/runs/9569658606/job/26382647393 💀)