docker / roadmap

Welcome to the Public Roadmap for All Things Docker! We welcome your ideas.
https://github.com/docker/roadmap/projects/1
Creative Commons Zero v1.0 Universal

As a researcher, I need to continue to be able to run "finished" archival Docker images published in older manifest versions #662

Open adamnovak opened 4 weeks ago

adamnovak commented 4 weeks ago

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

In the scientific community, we're responsible for archiving the code we used in published research, so that it is usable for what in computing is a very long time.

This code is often "finished": we don't expect to ever write a new version, and any bugs in it are meant to be there forever. (Often we'll write CoolTool v0.1 and v0.2, publish on that, and then move on to CoolToolOnWheels v0.1 which is a rewrite with a slightly different approach, and then we'll expect someone to be able to benchmark against the bug-for-bug original CoolTool v0.2 even though it's now 15 years later and the grad student who wrote it has switched careers.)

Docker Engine, Docker Hub, and the Docker image format are widely used by scientists to solve this reproducibility problem. When you publish a scientific paper, you can publish the software you used to reach your conclusions as a Docker image on Docker Hub, with a tag you never change again. This is an easy way to let anyone who wants to check your work or reproduce your conclusions re-run exactly the same software you ran, and it is much nicer to work with than, say, a VM image, or taking the whole computer and archiving it in your university's library.

But for this to work, an identifier like research-group/cool-tool:v0.2 (with associated image hash) needs to continue to work indefinitely, without anyone being around to rebuild or re-push the image.
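To make "with associated image hash" concrete, here is a minimal sketch of checking an archived manifest against the digest recorded at publication time. It assumes the manifest is available as raw bytes exactly as the registry served them and that the recorded digest uses the usual `sha256:<hex>` form. The helper names are hypothetical, and signed "v2 schema 1" manifests are digested differently, so this sketch does not cover them.

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """Return the content digest of a manifest in 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def matches_pinned_digest(manifest_bytes: bytes, pinned: str) -> bool:
    """Check that the bytes we hold are the ones the paper pinned."""
    return manifest_digest(manifest_bytes) == pinned.strip().lower()
```

A reader reproducing a paper could compare the digest printed in the paper's methods section against `manifest_digest(...)` of whatever the registry returns for research-group/cool-tool:v0.2.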

In Docker Engine v26.0, support for pulling images in the old "Docker Image v1" and "v2 schema 1" formats has been turned off. In v27.0, it will be removed entirely. But a corresponding change has not been made on Docker Hub to serve older images using the new format. Instead, when someone tries to pull an old image, they get a message telling them to "Suggest the author of $IMAGE to upgrade the image to the OCI Format or Docker Image manifest v2, schema 2."
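For anyone auditing which archived tags are affected, a rough sketch of telling the manifest formats apart might look like the following. It assumes you already have the manifest JSON as text (and, optionally, the Content-Type the registry returned with it). The function name and the "legacy"/"current"/"unknown" labels are hypothetical, and it only detects "v2 schema 1" manifests, not the even older "Docker Image v1" format.

```python
import json

# Media types used by "v2 schema 1" manifests (unsigned and signed).
LEGACY_MEDIA_TYPES = {
    "application/vnd.docker.distribution.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
}

# Media types for "v2 schema 2" and OCI manifests and their multi-arch lists.
CURRENT_MEDIA_TYPES = {
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
}


def classify_manifest(manifest_text: str, content_type: str = "") -> str:
    """Label a manifest as 'legacy', 'current', or 'unknown'."""
    doc = json.loads(manifest_text)
    # Prefer the mediaType inside the document; fall back to the HTTP header.
    media_type = doc.get("mediaType") or content_type
    if media_type in LEGACY_MEDIA_TYPES or doc.get("schemaVersion") == 1:
        return "legacy"
    if media_type in CURRENT_MEDIA_TYPES:
        return "current"
    # schemaVersion 2 (or missing) with no recognizable media type.
    return "unknown"
```

Running this over the tags a lab depends on would give a list of the images that need a conversion path before the lab can upgrade its engine.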

The suggested solution can't really work in the scientific community, since the author is highly likely to not be available to do this. And the fact that people who published "finished" software in a supposedly-reproducible way several years ago are being called on to do a manual mandatory upgrade suggests that Docker as a system is not actually designing around the software archiving use case.

Tell us about your request

I need a good way to run old images on Docker Hub that never get updated by their authors. Either Docker Engine needs to permanently support the original formats, or there needs to be an easy conversion tool or plugin for Docker Engine to let it support the old formats (docker-import-v1.py research-group/cool-tool:v0.2), or Docker Hub needs to upgrade all the old tags without author intervention.

I also need some notion of how long the Docker ecosystem promises an image I publish today will be accessible for. Is it the 20+ years I require for archiving scientific software, or not?

Which service(s) is this request for? Let us know which product(s) you want this for?

Docker Engine, Docker Hub

Are you currently working around the issue?

Our lab's cluster has been downgraded to Docker Engine v23 to avoid this problem for now.

Additional context

Some of these very old Docker images are still in active use; there's not a lot of call to update e.g. the script in a Dockerized workflow that shuffles the lines of an input file.