brainvisa / casa-distro

Unified development environment for BrainVISA projects.

How do we keep track of the casa-run/casa-dev images that are used for releases? #238

Closed ylep closed 3 years ago

ylep commented 3 years ago

The casa-run and casa-dev images are not versioned, so we do not currently have a good way of knowing exactly the images that were used to make a release.

For example, when we make BrainVISA 5.0.1 to fix some important bug, we will want to use the same images that were used for 5.0.0, in order to avoid introducing new bugs.

We could take several steps to fix that:

denisri commented 3 years ago

This is rather an issue for casa-distro, not really brainvisa-cmake ;)

denisri commented 3 years ago

This is related to #183

sapetnioc commented 3 years ago

There is a beginning of image provenance tracking in the metadata. In the system image, the metadata has an origin entry that contains the source image used to build it (either a .sif for Singularity or an .iso for VBox). This entry is copied untouched into the run and dev images, and is lost in the user image. This is broken and incomplete.

I think that we need, at least, the parent image for base images (like origin for system image).

For the user image, I agree that it is important to know which dev image was used, for instance to allow toolbox developers to select a matching image for a specific BrainVISA release. But I do not think it is a big deal if the image is modified between 5.0.0 and 5.0.1. We cannot stop system evolution, nor go back (possible, but too complex to maintain). And we will not make disruptive changes to the ubuntu-18.04 images. The next disruptive change will be for ubuntu-20.04. Then, if we introduce a big change within the same system, we will have to create a new "image branch", for instance ubuntu-20.04-qt6.

denisri commented 3 years ago

There may be a few "disruptive changes" in dev images that are not completely under our control: pip installs... Pip systematically installs the latest version of things without checking for compatibility issues. We tried to pin the versions of a number of packages, but as many dependencies are installed under the hood without us even knowing about them, we cannot exclude the possibility of such an incompatible update. Another situation is when a developer needs a new version of a package because he needs new features, or a fix, in his toolbox. This doesn't happen every day and can be discussed then, but it sometimes happens (we had this situation for scipy on Ubuntu 16.04 in the last release). Thus having a version on dev images, at least those used for releases, makes sense.

sapetnioc commented 3 years ago

Ok, this means that we will have to clearly identify the BBE that are used to create and publish release images, and avoid sharing their dev images with other BBE used for a different "release branch".

denisri commented 3 years ago

I propose the following:

denisri commented 3 years ago

Image IDs should be stored in the images themselves, because during setup we are inside the container, with the host filesystem not mounted, and we don't have access to the image file and its .json metadata, which currently holds the ID. This means that the ID cannot be the MD5 (we cannot write inside the image an MD5 which cannot be computed before the image is ready). We should rather use a UUID. In the image we can store it in a file, /casa/image_id for instance.
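A minimal sketch of this idea, assuming the ID is a random (version 4) UUID written as a single line of text. The function names `write_image_id` and `read_image_id` are hypothetical, and the `root` parameter only exists so that the sketch can be tried outside a container; only the path /casa/image_id comes from the proposal above.

```python
import uuid
from pathlib import Path


def write_image_id(root: Path = Path("/")) -> str:
    """Generate a random UUID and store it in <root>/casa/image_id.

    Hypothetical helper: it would run at image build time, when the
    image contents are still writable.
    """
    image_id = str(uuid.uuid4())
    id_file = root / "casa" / "image_id"
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(image_id + "\n")
    return image_id


def read_image_id(root: Path = Path("/")) -> str:
    """Read the ID back from inside the container, without needing
    access to the image file or its .json metadata on the host."""
    return (root / "casa" / "image_id").read_text().strip()
```

Unlike an MD5 of the image file, such an ID does not depend on the image contents, so it can be written before the image is finalized.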

sapetnioc commented 3 years ago

I think using UUID could be too complex to manage compatibility. To identify images, we could use a few explicit (i.e. given at image creation) metadata items :

With this, the ID of an image could be {image_name}-{image_type}-{image_version}-{image_build}. We should keep the metadata of the parent images (including system image).
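As an illustration, the ID format above can be built directly from a metadata dictionary. This is only a sketch: the example values and the `parent` key used to keep the parent image metadata are assumptions, not the actual casa-distro metadata layout.

```python
ID_TEMPLATE = "{image_name}-{image_type}-{image_version}-{image_build}"


def image_id(metadata: dict) -> str:
    """Build a human-readable image ID from explicit metadata items.

    Extra keys (such as the hypothetical "parent" entry) are ignored;
    a missing item raises KeyError.
    """
    return ID_TEMPLATE.format(**metadata)


# Hypothetical metadata: the values are made up for the example.
system_metadata = {
    "image_name": "casa",
    "image_type": "system",
    "image_version": "ubuntu-18.04",
    "image_build": "1",
}
run_metadata = {
    "image_name": "casa",
    "image_type": "run",
    "image_version": "1.0",
    "image_build": "1",
    "parent": system_metadata,  # metadata of the parent image is kept
}

print(image_id(run_metadata))
print(image_id(run_metadata["parent"]))
```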

denisri commented 3 years ago

I was proposing a UUID because, on the contrary, it seemed simpler to me... This can be discussed, of course, but:

But all this does not really change the story...

sapetnioc commented 3 years ago

Yes, image_version for run or dev images is just a convention. But it will be necessary to have something human readable if we have several versions to propose for upload to users (in the download web page).

To limit race conditions between image creators, we could have a mixed solution. At build time, image_build is a UUID as you proposed. Upon upload, image_build becomes a number (a locking system may still be required to avoid race conditions on uploads). I propose to rename the local image (from UUID to number), to modify its metadata accordingly and, optionally, to replace the old image name (with the UUID) with a symlink, to avoid breaking local test environments that already use the uploaded image.
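The rename-and-symlink step could look like the following sketch. The helper name `rename_with_symlink` and the file names in the usage comment are hypothetical, and the metadata update is left out here.

```python
from pathlib import Path


def rename_with_symlink(old_path: Path, new_name: str,
                        keep_symlink: bool = True) -> Path:
    """Rename a local image file and optionally leave a symlink behind.

    The symlink uses a relative target, so it stays valid if the whole
    directory is moved.
    """
    new_path = old_path.with_name(new_name)
    if new_path.exists():
        raise FileExistsError(f"{new_path} already exists")
    old_path.rename(new_path)
    if keep_symlink:
        # Environments that still reference the old (UUID) file name
        # keep working through this link.
        old_path.symlink_to(new_path.name)
    return new_path


# Hypothetical usage:
# rename_with_symlink(Path("casa-dev-<uuid>.sif"), "casa-dev-1.0-1.sif")
```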

denisri commented 3 years ago

This seems OK. I'm writing the file locking code for the server (it will require Python to be installed on the server, but I guess it's not a strong constraint and it's obviously already installed anyway). This also answers the question I was about to ask: how do we manage images with versions/numbers locally (on the users' side)? Now it's clearer: we download images with a number in the filename, so they may coexist with older ones, and we have a means of cleaning them up by removing images which are not used in any environment.
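For illustration only, here is one way such server-side locking could be done in Python on a Unix system, using an advisory lock from the standard `fcntl` module around a counter file. This is not the actual casa-distro code: the counter file and the function name `next_upload_number` are assumptions.

```python
import fcntl
import os
from pathlib import Path


def next_upload_number(counter_file: Path) -> int:
    """Atomically increment and return a counter stored in a file.

    An exclusive advisory lock (flock) serializes concurrent uploaders,
    so two of them cannot be given the same number. Unix only.
    """
    # O_CREAT creates the file on first use without truncating it.
    fd = os.open(counter_file, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            text = f.read().strip()
            number = int(text) + 1 if text else 1
            f.seek(0)
            f.truncate()
            f.write(str(number))
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return number
```

The lock is advisory: it only protects against other processes that take the same lock on the same file.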

ylep commented 3 years ago

Upon upload, image_build become a number

@denisri I’d suggest adding the numeric build count in a new metadata field and keeping the UUID, instead of replacing the UUID with the build number. Having one field that changes value (and type) seems confusing to me...

Just my 2 cents.

denisri commented 3 years ago

I've not finished this part, but yes, I will not replace the UUID with a build count. Right now the build count is only in the filename, and is not used for anything except finding a new, unused filename for a new upload. We can keep it in a metadata field, yes (I don't know yet if it will actually be used, but it doesn't matter). This means the metadata will have to be changed when uploading, because it's only at this moment that we can assign the build count (which is actually an upload count).
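A sketch of that metadata update, assuming the metadata is the .json file that sits next to the image. The field names `image_id` and `build_count` are hypothetical; the point is only that the UUID is left untouched and the count goes in a separate field.

```python
import json
from pathlib import Path


def record_build_count(metadata_file: Path, build_count: int) -> dict:
    """Add the upload-time build count to an image's JSON metadata.

    Existing entries, including the UUID, are not modified.
    """
    metadata = json.loads(metadata_file.read_text())
    metadata["build_count"] = build_count  # hypothetical field name
    metadata_file.write_text(json.dumps(metadata, indent=4) + "\n")
    return metadata
```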

sapetnioc commented 3 years ago

I was thinking of using the build count to sort images and build the download table without presenting the UUID (to date, we are not displaying the file name). But creation_time can be used instead, so you can get rid of the build count. If we want to hide the UUID, we can present something else (file name, creation time or table line number).
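For example, the download table rows could be ordered by creation_time as in the sketch below. It assumes creation_time is stored as an ISO 8601 string and that each metadata dictionary has a `file_name` entry; both are assumptions made for the example.

```python
from datetime import datetime


def download_table_rows(images: list) -> list:
    """Sort image metadata by creation time, newest first, and build
    table rows that show a line number instead of the UUID."""
    ordered = sorted(
        images,
        key=lambda m: datetime.fromisoformat(m["creation_time"]),
        reverse=True,
    )
    return [
        {"line": i, "file": m["file_name"], "created": m["creation_time"]}
        for i, m in enumerate(ordered, start=1)
    ]
```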


denisri commented 3 years ago

I totally agree we should not display the UUID. We can use the filename or the build count, yes (this is more or less the same now).

sapetnioc commented 3 years ago

Now that base images have a version, I think we can remove the container system from the image name. For instance, casa-run-ubuntu-18.04-1.0-1.sif would become casa-run-1.0-1.sif.
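To illustrate the renaming, here is a sketch that drops the system part from an existing file name. The regular expression is an assumption about the naming pattern, inferred only from the example above; names that do not match are returned unchanged.

```python
import re

# <casa-run|casa-dev>-<system>-<version>-<build>.<extension>
OLD_NAME = re.compile(
    r"^(casa-(?:run|dev))-(.+)-(\d+(?:\.\d+)*)-(\d+)(\.\w+)$"
)


def strip_system_from_name(file_name: str) -> str:
    """Remove the system (e.g. ubuntu-18.04) from an image file name."""
    match = OLD_NAME.match(file_name)
    if match is None:
        return file_name
    base, _system, version, build, extension = match.groups()
    return f"{base}-{version}-{build}{extension}"


print(strip_system_from_name("casa-run-ubuntu-18.04-1.0-1.sif"))
```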

denisri commented 3 years ago

OK. Developers may be interested in knowing which system they are developing on but this is in the metadata, and can be displayed in the download page (so that they know before downloading an image).

denisri commented 3 years ago

There will be an incompatibility in casa-distro when we apply this latest change, because image names, filenames, and selection criteria will change.

sapetnioc commented 3 years ago

I think that, in casa-distro 3, there is no more image selection by pattern for users/developers. They set up their environments by first downloading an image, and then the image file name is explicitly stored in casa_distro.json.

However, there is still an automatic image selection for image creation or publication. These will have to be changed, and local image names will have to be updated. This was already the case for create_user_image, which I had to correct this morning because of the version in the run image file name. But the change can be used immediately after the sources are updated.

Finally, we would have to update published image names or to rebuild and publish images.

denisri commented 3 years ago

There is still image selection for developers in the casa_distro command (which still exists!), in pull_image, list, list_images, run, mrun, setup_dev, and a few more commands. They are still useful to update developers' environments and manage multiple environments. Not to mention casa_distro_admin. I'm updating them.

denisri commented 3 years ago

I have pushed modifications (571e4eb645c79c60b23d7ba64294df0c7e680740 - 11460ef2603b4d734dfa931932281c3ee8523206), let's see if it works...

denisri commented 3 years ago

I'm closing this issue for now because it's "mainly" addressed. If things are missing we'll reopen or open a new issue later.