brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Publishing Docker image to Docker Hub #1178

Open kwkbtr opened 5 years ago

kwkbtr commented 5 years ago

Hi, do you have any plans to publish a Docker image built from release/Dockerfile to Docker Hub? There may be a problem, as mentioned in https://github.com/brucemiller/LaTeXML/pull/1008#issuecomment-399694452

Sadly DockerHub (for automated builds that @bfirsh suggested above) does not seem to support build arguments, so it is non-trivial to automate builds for both variants of the image.

but having at least an image with default build arguments (ARG WITH_TEXLIVE="yes") would be a great help for us.
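For reference, the two variants differ only in one build argument. A minimal sketch, assuming the alternative value is "no" (only the default "yes" is confirmed above) and using made-up image tags; it prints the commands rather than running them, so pipe the output to sh from the repository root to actually build:

```shell
# Print the build command for one variant of the image.
# $1 = value for WITH_TEXLIVE ("yes" is the documented default; "no" is an assumption)
# $2 = image tag to assign (hypothetical names, pick your own)
build_cmd() {
  printf 'docker build --build-arg WITH_TEXLIVE=%s -t %s -f release/Dockerfile .\n' "$1" "$2"
}

build_cmd yes latexml:texlive   # default variant
build_cmd no  latexml:minimal   # assumed variant without TeX Live
```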

dginev commented 5 years ago

Hi @kwkbtr. Currently we have @tkw1536 managing latexml's dockerization, so he could provide more context and brainstorm useful new images.

Have you seen the ones Tom already created, which we use for Travis? https://hub.docker.com/r/latexml/latexml-test-runtime/

kwkbtr commented 5 years ago

Thank you for your suggestion. I noticed those images but did not look into them closely, since I was not sure whether they are suitable for ordinary use rather than only for testing. I will give them a try.

kwkbtr commented 5 years ago

I had a look at latexml/latexml-test-runtime and noticed that the image tags do not include the LaTeXML version number. It would be great if a specific LaTeXML version could be selected via the image tag.

tkw1536 commented 5 years ago

I have considered publishing images to Docker Hub. However, using Docker Hub auto-builds is difficult, because the Dockerfile is in the release subfolder whereas the required build context is the repository root. The Dockerfile itself says as much:

# This Dockerfile expects the root directory of LaTeXML as a build context. 
# To achieve this run the following command from the root directory:
#
# > docker build -t latexml -f release/Dockerfile .

I can imagine three solutions to this:

dginev commented 5 years ago

At a high level, I think we need an approach similar to the mini-team of maintainers who manage the Debian and Fedora packages for latexml. I think Bruce only manages the macports route.

Having an up-to-date and functional collection of docker images strikes me as a similar maintenance burden. We would likely need a volunteer to at least prepare images for the named releases.

I am also linking the current hits for latexml on dockerhub; maybe we could recruit one of their authors as a volunteer, e.g. @physikerwelt?

https://hub.docker.com/search?q=latexml&type=image

tkw1536 commented 5 years ago

I'll happily volunteer as maintainer of the DockerHub images, if we can figure out:

dginev commented 5 years ago

Awesome, thanks Tom!

Personally I see a point for having release-based docker images (e.g. we can make one for each of 0.8.2, 0.8.3, 0.8.4 and then continue at each release point), as well as a single image that tracks master -- which is the bit that would have to be done automatically through Travis. That setup should take care of all reasonable use cases. Curious to hear if that would work for @kwkbtr as well?

bfirsh commented 5 years ago

:+1: That's what we do for engrafo. Git tags turn into image tags for releases, and latest tracks master. We also push sha hash images for every commit, for the hell of it.

It's built on Travis so we can speed up builds by pulling and using --cache-from. That might be unnecessary for LaTeXML, so building on Docker Hub would work fine if you don't care about build speed.

https://github.com/arxiv-vanity/engrafo/blob/master/.travis.yml
https://github.com/arxiv-vanity/engrafo/blob/master/script/ci-deploy-master
https://github.com/arxiv-vanity/engrafo/blob/master/script/ci-deploy-tag
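The tagging scheme described above could look roughly like this in a CI script. This is a sketch, not the engrafo scripts: the image name, the "sha-" prefix, and the stripping of a leading "v" from git tags are assumptions, while TRAVIS_TAG, TRAVIS_BRANCH and TRAVIS_COMMIT are the standard Travis environment variables. The loop only prints the docker commands it would run.

```shell
# Map a CI build to the image tags it should publish.
# $1 = git tag (empty if this is not a tag build), $2 = branch, $3 = commit SHA
image_tags() {
  if [ -n "$1" ]; then
    echo "${1#v}"      # release build: git tag v0.8.4 -> image tag 0.8.4
  elif [ "$2" = "master" ]; then
    echo "latest"      # master is published as latest
  fi
  echo "sha-$3"        # every commit also gets a SHA-based tag
}

IMAGE="latexml/latexml"   # hypothetical repository name

# Dry run: print the build and push commands for each tag.
for t in $(image_tags "${TRAVIS_TAG:-}" "${TRAVIS_BRANCH:-master}" "${TRAVIS_COMMIT:-0000000}"); do
  echo "docker build --cache-from $IMAGE:latest -t $IMAGE:$t -f release/Dockerfile ."
  echo "docker push $IMAGE:$t"
done
```

Note that --cache-from only helps if the referenced image has been pulled first, which is why the Travis setup pulls before building.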

kwkbtr commented 5 years ago

That setup should take care of all reasonable use cases. Curious to hear if that would work for @kwkbtr as well?

Yes, that should work great for my current use case. Thank you all for your consideration! 👍

dginev commented 5 years ago

Thanks @bfirsh, that's quite helpful!

tkw1536 commented 5 years ago

I've made a PR that adds support for DockerHub auto builds: #1181

dginev commented 3 years ago

In the absence of an official resolution for maintaining a docker image (I think it is not on anyone's critical path?), I ended up sidelining this issue and creating a new Dockerfile for a multi-threaded harness project that converts large collections of mathematical formulas, which is a typical use of latexml in the Math Information Retrieval community (e.g. ARQMath is using latexml in 2020-2021).

"sidelining" in the sense that I couldn't do a

FROM latexml:latest

to base my image on. So instead I based it on the latest rust image (Rust being the programming language of the harness), and did the entire latexml installation dance through apt and cpanminus. Linking the Dockerfile here for reference; note that this is still experimental: https://github.com/dginev/latexml-runner/blob/main/Dockerfile

It would be nice to circle back and tidy up the Docker toolchain pieces... so, bump?

tkw1536 commented 3 years ago

From my end the Dockerfile in this repository still works. The only thing outstanding is that it should be published on some registry (e.g. DockerHub, GitHub Package Registry).

dginev commented 3 years ago

I'm bumping the milestone again, since it's hard for us to get into the right mindset to organize and actively maintain these. It's a bit of a paradox that while everyone wants to have an official and properly updating "dockerized latexml" available, no one has the right motivation to actually execute on that.

The latexml dockerhub namespace lacks people who actively use a dockerized vanilla latexml, so it's almost like we're squatting on that namespace handle. Tom has been great in updating the CI images regularly, but he doesn't do actual latexml-at-scale conversions, so it's a different focus. Meanwhile, I do latexml-at-scale conversions, but with my own home-baked docker image that does a lot more than a vanilla latexml image would. So the whole thing is a bit sideways... We ought to straighten it out.

dginev commented 2 years ago

While/Since we still don't have a resolution on how to maintain an official latexml image, I have published another unofficial one today, again installing latexml from scratch (in one of the many possible ways, this time using cpanminus, following the LaTeXML-Plugin-Cortex Dockerfile).

It is available under latexml/ar5ivist on Dockerhub, and the respective repository here. As the name suggests, it is a turnkey one-liner for conversions using the exact configuration for ar5iv.

tkw1536 commented 2 years ago

I think we should use this to restart the discussion of having an official docker image or not.

castedo commented 7 months ago

@dginev FYI, I'm experimenting with an automatically built and publicly available OCI (docker) image with latexml over on gitlab. I'm planning to put it over at https://gitlab.com/perm.pub/dock. Feel free to shoot me questions and requests for better documentation, how I build it and why, etc... Enjoy!

dginev commented 7 months ago

@castedo thank you for the heads up!

You are most welcome to edit your comment above and describe the full details of your use case, both in executing latexml, and in the way you've decided to package and publish that setup. I think it can be informative for everyone tracking this issue to know of such recent developments.

castedo commented 7 months ago

Here's the dual Git & OCI container image registry which currently has LaTeXML 0.8.8, with some documentation on how to run it: https://gitlab.com/perm.pub/dock/latexml-deb

For more details and documentation on how the container image gets automatically built and deployed, check out: https://gitlab.com/perm.pub/dock/

I'm using it to investigate what kind of JATS XML gets output by latexml+latexmlpost.

physikerwelt commented 7 months ago

I have been using this docker file in production for several years: https://hub.docker.com/r/physikerwelt/latexml/tags
It is a bit memory-hungry unless you restrict it.
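One way to restrict it, assuming the restriction is done at the container level: docker run accepts --memory, and setting --memory-swap to the same value keeps the container from using swap on top of that. The 2g figure is only a placeholder, and how this image expects to be started (ports, arguments, which tag) is not stated in this thread, so the sketch just prints a bare command to adapt:

```shell
# Print a docker run command with a hard memory cap.
# $1 = memory limit in docker's notation, e.g. 2g (placeholder value, tune to your workload)
limited_run_cmd() {
  printf 'docker run --rm --memory=%s --memory-swap=%s physikerwelt/latexml\n' "$1" "$1"
}

limited_run_cmd 2g
```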