SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
486 stars 186 forks source link

Feature Request: add a staleness check for cached container Images #2846

Open tabedzki opened 3 months ago

tabedzki commented 3 months ago

Is your feature request related to a problem? Please describe. When using run_sorter, users can combine docker/singularity_image=True with delete_container_files=False in their scripts. Assuming a users implements a "set it and forget it` approach, it is conceivable that the eventually the user will not be running on the latest version of the container for whichever package they are using.

Describe the solution you'd like An ability to update (or auto-update) the container used, checking for if a container is old and updating if necessary. Maybe a flag like auto_update_container_image

Describe alternatives you've considered I've considered working with IT to discuss how to manage the images and whether or not the users should be manually specifying the container images.

JoeZiminski commented 3 months ago

This is a nice idea, I get a little confused about how the docker images are generated. As an aside on this, it would be great to have a page that fully documents the general build-and-deploy pipeline for the docker images, I sometimes get a bit stuck understanding from the repo or documentation (please let me know if I've missed any existing docs).

My understanding is that when the images are built they have a version tag e.g. here. When SI pulls a docker image it will pull the latest version and install on your machine. If you don't do anything else, it will continue to use this pulled image forever and you will miss updates. It doesn't look like the image .sif is deleted by delete_container_files, so I think(?) this argument will have no effect on this process.

As far as I can tell on run_sorter there is no option to specify the tag for the image. It would be nice to expose this with some convenience options e.g. None will just use the current spikeinterface default behaviour. "always_latest" could make sure your downloaded image is the latest version. You could also specify specific versions to use.

In general I think the docker versioning is a hard problem to get right because it can be very hairy for users. For example you run all your analysis with SI v100.0. Then 6 months later you come back to do some additional analysis for paper review. You again install SI v100.0 but now the image downloaded is for a newer version of the KS image that gives you different results. This is very hard to track down. It would be nice to pin specific versions of SI to specific version of images, but this is difficult and also confusing in a different way. Alternatively (if not already) this could be made extremely clear in the logs e.g. a big banner that says the docker image version that's been used, it's release date, and how to request a different version.

tabedzki commented 3 months ago

Thanks for highlighting the fact that the final container image is not deleted when using delete_container_files.

As far as I can tell on run_sorter there is no option to specify the tag for the image. It would be nice to expose this with some convenience options e.g. None will just use the current spikeinterface default behaviour. "always_latest" could make sure your downloaded image is the latest version. You could also specify specific versions to use.

It appears that one is able to specify the tag explicitly. See here. As for having an "always_latest" I think that would be useful.

I agree with the idea of having the docker image version outputted somewhere in the logs for future reference. I didn't see that implemented when looking through my logs but I might've missed it.

JoeZiminski commented 3 months ago

Cool thanks @tabedzki I did not know about that ability to request the exact version of the image, that is nice.

As a quick aside, I wonder if run_container_local() and run_sorter_container are intended for public use, if not these could be prefixed with a _.

There are a couple of ways this could be implemented. One is to keep everything as-is but add some new keywords for docker_image and singularity_image such as "always_latest". However these arguments are already quite overloaded (e.g. can be bool for download, str to an existing local sorter, str of the docker image with tag to download).

Alternatively, there could be a new argument image_version and the ability to select a specific container with docker_image or singularity_image removed. These would now only accept bool or str to an existing container. The image_version would download the requested image (if downloading the image), and accept keyword arguments like always_latest. If the path pointed to previously downloaded image but it did not match image_version (e.g. it was not the latest image) it would crash. This could use the 'digest' rather than the tag (because 'latest' tag would not work for this purpose).

Finally adding some clearler logs to indicate exactly what image version is used. I don't know if there is any appetite for pinning SI versions to docker image versions. I guess each release would by default have to point to a specific docker image release rather than 'latest'. Each time a new image is released the corresponding version of SI would need to be updated. However, this would have to be handled separately for very image, and would be error prone. I think logs instead are the way to go.

alejoe91 commented 3 months ago

We should definitely at least add the image hash to the spike sorting log/params ASAP! Great point!

For usage, I would tend to aboid additional complexities to the run_sorter function, which is already quite complicated.

IMO, if one uses docker_image=True, it means that the "latest" should be used. Alternatively, we can simplify this argument so that it only accepts str | None, so docker_image=True will become `docker_image="latest``.

The code should check that if a latest tag is already cached locally, it corresponds to the remote tag. If not, download the latest.

tabedzki commented 3 months ago

@alejoe91, if we do update the options/documentation, I think we should update the singularity_image documentation to highlight that one can provide a path to the image via str since I believe that is something that Apptainer/Singularity provides. Based off reading the documentation, I didn't realize that this was possible; it was something that @JoeZiminski pointed out to me.