Closed ylep closed 3 years ago
This is rather an issue for casa-distro, not really brainvisa-cmake ;)
This is related to #183
There is a beginning of image provenance tracking in metadata. On a system image, there is an origin entry in metadata that contains the source image used to build it (either a .sif for Singularity or an .iso for VBox). This entry is copied untouched into run and dev images and lost in user images. This is broken and incomplete.
I think that we need, at least, the parent image for base images (like origin for the system image).
For user images, I agree that it is important to know the dev image used, for instance to allow toolbox developers to select a matching image for a specific BrainVISA release. But I do not think it is a big deal if the image is modified between 5.0.0 and 5.0.1. We cannot stop system evolution nor go back (possible but too complex to maintain). And we will not make disruptive changes to ubuntu-18.04 images. The next disruptive change will be for ubuntu-20.04. Then, if we introduce a big change in the same system, we will have to create a new "image branch", for instance ubuntu-20.04-qt6.
There may be a few "disruptive changes" in dev images that are not completely under our control: pip installs... Pip systematically installs the latest version of things without checking for compatibility issues. We tried to pin a number of package versions, but as many dependencies are installed under the hood without us even knowing about them, we cannot exclude the possibility of such an incompatible update. Another situation is when a developer needs a new version of a package because he needs new features, or a fix, in his toolbox. This doesn't happen every day and can be discussed then, but it sometimes happens (we had this situation for scipy on Ubuntu 16.04 in the last release). Thus having a version on dev images, at least those used for releases, makes sense.
Ok, this means that we will have to clearly identify the BBE that are used to create and publish release images, and avoid sharing their dev images with other BBE used for a different "release branch".
I propose the following:
Image IDs should be stored in the images themselves, because during setup we are inside the container, with the host filesystem not mounted, and we don't have access to the image file and its .json metadata, which currently holds the ID.
This means that the ID cannot be the MD5 (we cannot write inside the image an MD5 which cannot be computed before the image is ready). We should rather use a UUID.
In the image we can store it in a file, /casa/image_id for instance.
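A minimal sketch of that proposal, assuming the ID is written as plain text at build time. Only the /casa/image_id path comes from the comment above; the function names and the `root` parameter are hypothetical and are not part of casa-distro.

```python
import os
import uuid


def write_image_id(root="/"):
    """Generate a UUID and store it in <root>/casa/image_id (hypothetical helper)."""
    image_id = str(uuid.uuid4())
    casa_dir = os.path.join(root, "casa")
    os.makedirs(casa_dir, exist_ok=True)
    with open(os.path.join(casa_dir, "image_id"), "w") as f:
        f.write(image_id + "\n")
    return image_id


def read_image_id(root="/"):
    """Read the ID back from inside the container, without the host-side .json."""
    with open(os.path.join(root, "casa", "image_id")) as f:
        return f.read().strip()
```

Unlike an MD5 of the image file, the UUID does not depend on the image contents, so it can be written before the image is finalized.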
I think using a UUID could be too complex for managing compatibility. To identify images, we could use a few explicit (i.e. given at image creation) metadata items:
image_name: casa (or brainvisa).
image_type: system, run, dev or user.
image_version: 5.0 for next release images. All dev and run images are compatible if and only if they have the same image_version. I do not really know how to define compatibility with system images. The same system image could be used for different versions of various images.
With this, the ID of an image could be {image_name}-{image_type}-{image_version}-{image_build}. We should keep the metadata of the parent images (including the system image).
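As an illustration of the naming scheme and the compatibility rule proposed above, here is a small sketch. The field names come from the ID template in the comment; the `ImageInfo` class and `are_compatible` function are hypothetical, and compatibility with system images is deliberately left undefined, as in the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageInfo:
    image_name: str     # e.g. "casa"
    image_type: str     # "system", "run", "dev" or "user"
    image_version: str  # e.g. "5.0"
    image_build: str    # build identifier (its format is still under discussion)

    @property
    def image_id(self):
        # {image_name}-{image_type}-{image_version}-{image_build}
        return "-".join(
            [self.image_name, self.image_type, self.image_version, self.image_build]
        )


def are_compatible(a, b):
    """run and dev images are compatible if and only if image_version matches."""
    base_types = {"run", "dev"}
    if a.image_type not in base_types or b.image_type not in base_types:
        raise ValueError("compatibility is only defined here for run and dev images")
    return a.image_version == b.image_version
```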
I was proposing a UUID because, on the contrary, it seemed simpler to me... This can be discussed, of course, but:
image_version as a branch is not a really well-defined notion. For a user image, it is, of course, but for a run or dev image it's not so clear.
image_build is more difficult to manage. Several developers (such as you and me) may build new images at approximately the same time, and we would generate the same build number even if the contents differ. To be unique, this number would need a shared locking / logging system: have a network connection, have write permission to a server to log builds "in progress" which have already reserved a build number, etc. This is precisely the reason I suggested a UUID. The only drawback for me is the lack of human-friendliness of such UUIDs. This is also why I suggested converting these numbers into a build number only at the time of uploading an image to the server, and using it only in the filename. This would be a release (upload) number rather than a build number, actually.
But all this does not really change the story...
Yes, image_version for run or dev images is just a convention. But it will be necessary to have something human-readable if we have several versions to offer to users (on the download web page).
To limit race conditions between image creators, we could have a mixed solution. At build time, image_build is a UUID as you proposed. Upon upload, image_build becomes a number (a locking system may still be required to avoid race conditions on uploads). I propose to rename the local image (from UUID to number), to modify its metadata accordingly and, optionally, to replace the old image name (with UUID) with a symlink to avoid breaking local test environments that already use the uploaded image.
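The rename-and-symlink step could be sketched as follows. This assumes the UUID appears literally in the local file name; the function name and the file naming are hypothetical, and the metadata update is left out.

```python
import os


def rename_uploaded_image(image_path, uuid_str, build_number):
    """Rename the local image from UUID to number and leave a symlink behind."""
    directory, filename = os.path.split(image_path)
    if uuid_str not in filename:
        raise ValueError("image file name does not contain the expected UUID")
    new_filename = filename.replace(uuid_str, str(build_number))
    new_path = os.path.join(directory, new_filename)
    if os.path.lexists(new_path):
        raise FileExistsError(new_path)
    os.rename(image_path, new_path)
    # Relative symlink: the old (UUID) name keeps working for local
    # environments, even if the directory is moved as a whole.
    os.symlink(new_filename, image_path)
    return new_path
```

Environments that still reference the UUID file name then resolve to the renamed image through the symlink.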
This seems OK. I'm writing the file locking code for the server (it will require python to be installed on the server, but I guess it's not a strong constraint and actually it's obviously already installed). This also answers the question I was about to ask: how do we manage images with versions/numbers locally (on the users' side)? Now it's clearer: we download images with a number in the filename, so they may coexist with older ones, and we have a means of cleaning them up by removing images which are not used in any environment.
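For illustration only, here is one way server-side locking could reserve an upload number. This is not the code referred to above: it is a sketch assuming a Unix server and a counter stored in a plain text file, and both `reserve_build_number` and `counter_path` are hypothetical names.

```python
import fcntl
import os


def reserve_build_number(counter_path):
    """Atomically increment and return the counter stored in counter_path."""
    # Open read/write, creating the file if needed, without truncating it.
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Exclusive lock: blocks until concurrent uploaders release it.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            content = f.read().strip()
            number = int(content) + 1 if content else 1
            f.seek(0)
            f.truncate()
            f.write(str(number))
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return number
```

Two uploads running at the same time would then get distinct numbers, because the second one waits on the lock until the first has written its value.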
> Upon upload, image_build become a number
@denisri I’d suggest adding the numeric build count in a new metadata field while keeping the UUID, instead of replacing the UUID with the build number. Having one field that changes value (and type) seems confusing to me...
Just my 2 cents.
I've not finished this part, but yes, I will not remove the UUID for a build count. Right now the build count is only in the filename and is not used for anything except finding a filename not already used for a new upload. We can keep it in a metadata field, yes (I don't know yet if it will actually be used, but it doesn't matter). This will require changing the metadata when uploading, because it's only at this moment that we can assign the build count (which is an upload count actually).
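A sketch of what that metadata change at upload time might look like, assuming the metadata is the image's .json file. The `build_number` key is a hypothetical name; existing fields, including the UUID, are left untouched.

```python
import json


def record_upload_count(metadata_path, build_number):
    """Add the upload count as a new field, without replacing the UUID."""
    with open(metadata_path) as f:
        metadata = json.load(f)
    # New, separate field: no existing value changes meaning or type.
    metadata["build_number"] = int(build_number)
    with open(metadata_path, "w") as f:
        json.dump(metadata, f, indent=4)
    return metadata
```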
I was thinking of using the build count to sort images and build the download table without presenting the UUID (to date, we are not displaying the file name). But creation_time can be used instead, so you can get rid of the build count. If we want to hide the UUID, we can present something else (file name, creation time or table line number).
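Sorting on creation_time for the download table could be sketched like this. It assumes creation_time is stored as an ISO 8601 string; the `filename` key and the function name are hypothetical.

```python
from datetime import datetime


def download_table_rows(images):
    """Return (line number, file name, creation time) rows, newest first."""
    ordered = sorted(
        images,
        key=lambda meta: datetime.fromisoformat(meta["creation_time"]),
        reverse=True,
    )
    # The UUID is never shown: rows are identified by their line number.
    return [
        (line, meta["filename"], meta["creation_time"])
        for line, meta in enumerate(ordered, start=1)
    ]
```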
I totally agree we should not display the UUID. We can use the filename or the build count, yes (this is more or less the same now).
Now that base images have a version, I think we can remove the container system from the image name. For instance, casa-run-ubuntu-18.04-1.0-1.sif would become casa-run-1.0-1.sif.
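The renaming in that example can be expressed as a small helper. This sketch only handles the ubuntu-X.Y pattern shown in the example for run and dev images; the function name and the regular expression are assumptions, not casa-distro code.

```python
import re

# Matches e.g. "casa-run-ubuntu-18.04-1.0-1.sif" (pattern from the example only).
_SYSTEM_IN_NAME = re.compile(r"^(casa-(?:run|dev))-ubuntu-\d+\.\d+-(.+)$")


def strip_system_from_name(filename):
    """Drop the system part from an image file name, if present."""
    match = _SYSTEM_IN_NAME.match(filename)
    if match is None:
        return filename  # already in the new form, or an unknown pattern
    return "{0}-{1}".format(match.group(1), match.group(2))
```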
OK. Developers may be interested in knowing which system they are developing on but this is in the metadata, and can be displayed in the download page (so that they know before downloading an image).
There will be an incompatibility in casa-distro when we apply this latest change, because image names, filenames, and selection criteria will change.
I think that, in casa-distro 3, there is no more image selection by pattern for users/developers. They set up their environments by first downloading an image, and then the image file name is explicitly stored in casa_distro.json.
However, there is still an automatic image selection for image creation or publication. These will have to be changed and local image names will have to be updated. This was already the case for create_user_image, which I had to correct this morning because of the version in the run image file name. But the change can be used immediately after sources are updated.
Finally, we would have to update published image names or to rebuild and publish images.
There is still image selection for developers in the casa_distro command (which still exists!), in pull_image, list, list_images, run, mrun, setup_dev, or a few more commands. They are still useful to update developers' environments and manage multiple environments. Not to mention casa_distro_admin. I'm updating them.
I have pushed modifications (571e4eb645c79c60b23d7ba64294df0c7e680740 - 11460ef2603b4d734dfa931932281c3ee8523206), let's see if it works...
I'm closing this issue for now because it's "mainly" addressed. If things are missing we'll reopen or open a new issue later.
The casa-run and casa-dev images are not versioned, so we do not currently have a good way of knowing exactly the images that were used to make a release.
For example, when we make BrainVISA 5.0.1 to fix some important bug, we will want to use the same images that were used for 5.0.0, in order to avoid introducing new bugs.
We could take several steps to fix that:
/neurospin/brainvisa, and be careful not to update the images