eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams
http://eclipse.org/che
Eclipse Public License 2.0
6.99k stars 1.19k forks source link

Use digests instead of tags to reference images #16047

Open l0rd opened 4 years ago

l0rd commented 4 years ago

Is your enhancement related to a problem?

Images tags can reference an image today and a different one tomorrow. That makes our plugin and devfile registries builds not repeatable. We have the same kind of problem with the references to git repos. In this case it's for containers.

Using the digests would have the following benefits:

Describe the solution you'd like

Replacing tags with digests in:

ibuziuk commented 4 years ago

@l0rd could you please put the priority and team for this task. I suspect this is P1 since this feature is needed for the product

l0rd commented 4 years ago

@ibuziuk done. By the way this is not needed for the product. It's an improvement that will make Che more reliable. I have also affected the label kind/epic because it will need multiple teams to work on and probably a few iterations.

nickboldt commented 4 years ago

properties files

What properties files do we have in Che that refer to images?

che dockerfiles

Can you give an example of this?

In CRW we only use tags converted to digests in three places:

These are updated at build time from the latest tag via these scripts:

Perhaps the best solution here is to implement the same thing in upstream, so every container build of registries and every csv release uses specific digests, but the code in GH still contains the tag references?

That way you have BOTH the snapshot with the specific, permanent reference to a specific image, but also the code with the moving tag in case you want to rebuild it against a newer nightly/latest.

RickJWagner commented 4 years ago

A user is asking about image upgrades. (i.e. some workspace image gets a CVE upgrade.) To make use of the improved image:

A) The user should upgrade the entire product, via che(crw)ctl server:upgrade B) The user should manually hack plugin, devfile, operator metadata C) We will provide some documented way of doing this D) Other

(Note: Docker build processes above don't seem workable on RHEL 7)

@nickboldt @l0rd

l0rd commented 4 years ago

@nickboldt 4 images are specified in che.properties, every Dockerfile in our repos have a FROM <base-image> where the <base-image> should be specified using a digest.

Perhaps the best solution here is to implement the same thing in upstream, so every container build of registries and every csv release uses specific digests, but the code in GH still contains the tag references?

That approach doesn't solve the main problem here: we want repeatable builds when we do a release. When our source code doesn't change we want the result of the build to be identical. The example of issue that we want to fix is when che.openshift.io got broken after an update because we were referring quarkus nightly image as nightly and that image had "silently" changed.

That way you have BOTH the snapshot with the specific, permanent reference to a specific image, but also the code with the moving tag in case you want to rebuild it against a newer nightly/latest.

That's exactly what we want to avoid.

ericwill commented 4 years ago

From my understanding, we should convert the usage of tags to digests in:

Going forward then we should only accept contributions with digests used, no tags allowed. There may even be a PR check that could be triggered which will use a regex to automatically validate that no tags have been used in the PR being submitted.

l0rd commented 4 years ago

@ericwill that's fine. What about the use of next version of theia? Are we going to replace next here with a digest or should we consider it as an exception?

l0rd commented 4 years ago

@RickJWagner it depends on the image. If it's an update of a server image (operator, che server, registries etc...) those are updated without any manual intervention. If it's an update of a workspace image (theia, machine-exec, sidecars etc...) then existing workspaces that include that image need to be manually migrated. Existing workspaces only. New workspaces will automatically include the new patched image.

l0rd commented 4 years ago

@RickJWagner but in any case we need to release a new version of the operator first. Otherwise no update will be triggered.

ericwill commented 4 years ago

@ericwill that's fine. What about the use of next version of theia? Are we going to replace next here with a digest or should we consider it as an exception?

I think it makes sense to handle this as an exception, since we are the ones controlling the release of che-theia, and the releases of the registries are sync'ed with the releases of che-theia.

amisevsk commented 4 years ago

Another option here, somewhat in line with Nick's suggestion, is to fix images via digest for releases and leave tags in the master branch. Once we release e.g. 7.15.0 for the registries, we run the write_image_digests.sh script and that release will be a reproducible build, without the manual work of maintaining 'freshness' of the digests.

ericwill commented 4 years ago

Maybe I'm missing the point here, but wouldn't that still leave us with the problem of images "silently" changing between releases? For example: if a change is pushed to a :nightly image over the course of a sprint, then automatically updating that image at release time means a potentially breaking change is stuffed into the next release.

IMO we should heavily automate testing of these images, and have dependabot style automation for keeping them up-to-date. That way we aren't blindly updating images in a script, yet at the same time reducing our time spent maintaining these images. The added benefit here is that there is a clear trail of issues/PRs showing what was changed, so in case something breaks it's easier to find the breaking commit.

The flow would be:

  1. Dependabot checks images for updates, opens issues/PRs for updates.
  2. Strong automated testing of the image is triggered via CI.
  3. A quick manual verification/smoke test is conducted before merging.
amisevsk commented 4 years ago

Part of the issue here is that if we can implement

Strong automated testing of the image is triggered via CI.

that's all the issue boils down to. If we intend automated jobs to catch breaking changes, then it doesn't really matter all that much, does it?

The same image is published at release time in both cases, except that case b) will require manual verification of every change and be a significant manual burden until automated tests are in a stable place.

I agree that FROM images should be fixed by digest, I'm referring more to fixing digests for every image in the registries themselves.

ericwill commented 4 years ago

Okay I think we're talking about the same thing. My main concern are the 14 base images in the devfile registry. This is where "silent changes" can really cause problems for us.

The other images in the plugin registry aren't a high priority right now, because most of them are already using "digest style" tags in the format <IMAGE_NAME>:<VERSION>-<COMMIT ID>, so it's impossible for silent changes to happen as those images are only ever pushed once.

What I propose as an immediate action:

  1. Convert the base images in the devfile registry to digests
  2. Introduce dependabot automation for said images, so we get free PR's and can test each change easily. There aren't many images here anyway so it should be manageable.
  3. Improve automated tests for the plugin/devfile registries (already in progress)
  4. Compile a list of plugins not using the "digest style" tag format and convert them.

We could probably (just a thought) handle the plugin registry via attrition actually, and just do digests going forward as plugins become updated (those have to be done manually anyway, for now).

amisevsk commented 4 years ago

That makes sense :+1:, sorry I misunderstood the context of the discussion.

che-bot commented 3 years ago

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

ericwill commented 3 years ago

Plugin registry is done.

ericwill commented 3 years ago

Devfile registry is also done.