GoogleContainerTools / distroless

🥑 Language focused docker images, minus the operating system.
Apache License 2.0

Version pinning support? #686

Open springroll12 opened 3 years ago

springroll12 commented 3 years ago

I am a bit confused about the version tagging scheme for distroless images. My current understanding is that the tags (e.g. base-debian10) are updated each time a new version is published. This seems to indicate that version pinning is not possible. Are there other tags (e.g. git commit-id) that can be used to pin?

Maybe I've missed something, but it would be very helpful to be able to specify which version of a distroless tag (e.g. base-debian10-20210220 instead of base-debian10) is used to build.

If pinning is not possible, what is the recommended best-practice for upgrading distroless images? Should we be taking a snapshot, copying it to our own registry, retagging it and using that? Or is the idea that new versions of distroless tags are automatically rolled out to your images the next time they are built (this seems dangerous!)?

loosebazooka commented 3 years ago

You can pin to a specific hash. This is just the first link I found, but it has details on it: https://support.circleci.com/hc/en-us/articles/115015742147-Pinning-a-Docker-Image-to-a-Specific-Version
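For reference, a digest-pinned image reference has the form `repository@sha256:<hex>`. The Python sketch below rewrites a Dockerfile `FROM` line into that form. The helper name `pin_from_line` and the all-zero digest are placeholders invented for illustration; they are not real distroless values.

```python
import re

# A sha256 digest is "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def pin_from_line(line: str, digest: str) -> str:
    """Rewrite 'FROM image[:tag] ...' into 'FROM image@digest ...'."""
    if not DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    parts = line.split()
    if not parts or parts[0].upper() != "FROM":
        raise ValueError(f"not a FROM line: {line!r}")
    # Skip flags such as --platform=... that may precede the image.
    idx = 1
    while idx < len(parts) and parts[idx].startswith("--"):
        idx += 1
    if idx >= len(parts):
        raise ValueError("FROM line has no image")
    # Drop any existing digest, then any tag. A colon marks a tag only
    # if it appears after the last slash; otherwise it is a registry port.
    image = parts[idx].split("@", 1)[0]
    if image.rfind(":") > image.rfind("/"):
        image = image[: image.rfind(":")]
    parts[idx] = f"{image}@{digest}"
    return " ".join(parts)


# Placeholder digest for illustration only; substitute the real one.
EXAMPLE_DIGEST = "sha256:" + "0" * 64
print(pin_from_line("FROM gcr.io/distroless/base-debian10:latest", EXAMPLE_DIGEST))
```

The tag is dropped rather than kept alongside the digest, so the resulting line cannot suggest that the tag still influences what is pulled.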

springroll12 commented 3 years ago

Sure. Can we agree this is not ideal though? It's difficult to discover what configuration a particular hash corresponds to, and discovering new versions is painful at best.

It would be nice if I could see a git tag in this repo that corresponds to the image I'm actually running.

jonjohnsonjr commented 3 years ago

At least for gcr.io/distroless/base-debian10, it seems like they are tagged by commit. But this isn't the case for all images? Why?

briandealwis commented 3 years ago

It seems only base-*, static*, and cc-* are pinned and they're pinned to the debian9-amd64 image: https://github.com/GoogleContainerTools/distroless/blob/1e4a8bb3ad03f71b572cbcb3bbc25f3fd8d0ff14/BUILD#L36-L38

The tagging by $COMMIT_SHA should be done in the cloudbuild_docker.sh, methinks.

chanseokoh commented 3 years ago

IMHO, I doubt tagging images with git commit SHAs will help people easily find which images have which versions of packages that they want to use. If we add git commit SHAs to every image that we publish whenever a PR is merged, the user will just see a wall of git commit SHAs, which is no different from the current state where they just see the same wall of image SHAs. (I mean, of course it's easy to check package versions if you choose one image with a git SHA, but it doesn't help the user find which image to use.) For the other direction, where the user starts with the git history, it's still cumbersome to look up which git commit the user should pick to find an image with the package versions they want.

briandealwis commented 3 years ago

@springroll12 You can't really pin to a tag as tags are mutable and may change; even if we did incorporate dates into the tag, there's no real guarantee that they won't accidentally change. @loosebazooka's suggestion of pinning to a particular digest, and periodically updating it, is the only workable solution.

@chanseokoh good point on the wall of shas. OCI defines some labels/annotations that could be used for this purpose instead of adding a $COMMIT_SHA tag.

jonjohnsonjr commented 3 years ago

I think the wall of SHAs is a pretty decent UX, honestly. You can just pick an arbitrary commit for distroless and easily discover which image corresponds to that commit. Given that distroless doesn't have releases (and thus no corresponding tags), this is the most obvious way to organize images, IMO.

I think annotations are also a good idea, because it's often the case that you know an image digest but might have lost the tag. Being able to map from the image back to the distroless commit is wonderful. If we're going to start annotating images, I'd also include org.opencontainers.image.url or org.opencontainers.image.documentation or org.opencontainers.image.source that points back to this repo.

(Similar argument for labels, btw, because docker throws away the manifest... so having the commit in the config file is useful as well...)
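A rough Python sketch of what writing the commit into both places could look like, using plain dictionaries as stand-ins for the manifest and the image config. The `org.opencontainers.image.source` key is named in the comment above; using `org.opencontainers.image.revision` to hold the commit is an assumption (it is a predefined OCI annotation key, but the thread does not settle on one). The function names are hypothetical, and the commit value is the one that appears in the BUILD link earlier in this thread.

```python
REPO_URL = "https://github.com/GoogleContainerTools/distroless"


def provenance_metadata(commit: str) -> dict:
    """Key/value pairs pointing an image back to the repo and commit."""
    return {
        "org.opencontainers.image.source": REPO_URL,
        # Assumption: the commit is stored under the 'revision' key.
        "org.opencontainers.image.revision": commit,
    }


def annotate(manifest: dict, config: dict, commit: str) -> None:
    """Write the metadata as manifest annotations and as config labels."""
    meta = provenance_metadata(commit)
    # Annotations live on the manifest, which docker does not keep around.
    manifest.setdefault("annotations", {}).update(meta)
    # Labels live in the image config, so they survive a docker pull.
    config.setdefault("config", {}).setdefault("Labels", {}).update(meta)


manifest, config = {}, {}
annotate(manifest, config, "1e4a8bb3ad03f71b572cbcb3bbc25f3fd8d0ff14")
print(manifest["annotations"])
print(config["config"]["Labels"])
```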

springroll12 commented 3 years ago

@briandealwis are you talking about git tags or docker tags? I realize pinning to a particular digest is the only way forward at this point, but that is why I created this issue. Plenty of other base images provide versioning for their generated images, why not distroless?

@chanseokoh I agree. My preference would be definitive release tags (like other image systems use). Git tags, while mutable, are better than nothing in my view. The issue with docker image digests is that it's difficult to tie them back to the code that built the image, so git tags or git commit SHAs are slightly better in my view.

springroll12 commented 3 years ago

@jonjohnsonjr "given that distroless doesn't have releases" .. I think this is the crux of it. Why doesn't distroless have releases? It would be much more transparent.

jonjohnsonjr commented 3 years ago

> Why doesn't distroless have releases?

That's a great question, and I think the answer is partially cultural and partially technical.

Culturally, Google doesn't really have releases internally. Everything just "lives at HEAD" in the monorepo. Of course, this is much less tenable in open source. We have no clue if external dependents of distroless get broken by a change because there isn't really a monorepo, so it would be nice if there were releases that folks could rely on. That said, I'd say that this would actually be not good, because people might pin to releases, and given the constant influx of security fixes that go into distroless, I think pinning to specific releases is probably not a great idea. Of course, this is possible with docker by pinning the digest, and that makes sense in a lot of cases, but not always...

From a technical perspective, distroless is just syncing changes from debian every day at 8:30 using this script. It doesn't really make sense to have separate releases within distroless, because it's mostly just a projection of debian packages into a container image.

If you look at the most recent sync (https://github.com/GoogleContainerTools/distroless/pull/687/files), you can see that the only version information that really exists is debian9 vs debian10, the DEBIAN_SNAPSHOT, and the DEBIAN_SECURITY_SNAPSHOT.

These are more or less "living" versions, and I don't think distroless could reasonably have a "release" that is any more meaningful than just the commit, unless we have a release for every snapshot update, which would involve a (potentially) daily release cadence. Maybe that's what you want?

It might also make sense to just tag each image with the DEBIAN_SNAPSHOT and DEBIAN_SECURITY_SNAPSHOT values, but I'd defer that decision to @chanseokoh

springroll12 commented 3 years ago

RE: Pinning to releases being "not good"... I disagree.

As much as I admire the "100% rolling" culture at google, this is not tenable for most organizations. In highly regulated environments like healthcare or finance it is crucial to have visibility into exactly what is deployed at all times. Having a base image that can update underneath you between CI builds is a recipe for disaster, especially for small teams. While it's nice to assume that every team has their CI tests prepared for such scenarios, the reality is that many teams do not have the resources to test daily changes to the base container for X microservices.

It sounds to me like distroless does daily releases, so would it make sense to tag these with a timestamp? That way versions can be pinned and organizations can make their own assessments about when to upgrade (daily, weekly, etc). This would allow security fixes to be applied using existing workflows.

jonjohnsonjr commented 3 years ago

> RE: Pinning to releases being "not good"... I disagree.

> As much as I admire the "100% rolling" culture at google, this is not tenable for most organizations. In highly regulated environments like healthcare or finance it is crucial to have visibility into exactly what is deployed at all times.

I think we mostly agree here, but it really depends on the context. In most cases, I'm going to strongly advocate for users pinning their dependencies for all the reasons I'm sure you already understand. In fact, I've spent a long time pushing back on customer and community requests for "immutable tags" in GCR, because it needlessly introduces a need to trust the registry not to lie to you. We already have digests in the API, which give you cryptographic guarantees of immutability instead of requiring you to trust me never to change something. For knative/serving, I helped implement the tag -> digest resolution at deploy time to paper over the terrible default behavior in kubernetes. You can also find me yelling at people all over the internet about how the lackadaisical attitude around this is a huge problem.
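The cryptographic guarantee comes from digests being content hashes: an image digest is the SHA-256 of the manifest bytes, so a client can recompute it instead of trusting the registry. A minimal Python sketch, with made-up manifest bodies standing in for real registry content:

```python
import hashlib


def digest_of(manifest_bytes: bytes) -> str:
    """Compute the sha256 digest string for raw manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def verify(manifest_bytes: bytes, pinned_digest: str) -> bool:
    """Check that the bytes actually received match the pinned digest."""
    return digest_of(manifest_bytes) == pinned_digest


# Made-up stand-ins for a manifest and a silently changed copy of it.
original = b'{"schemaVersion": 2, "layers": ["a"]}'
tampered = b'{"schemaVersion": 2, "layers": ["b"]}'

pinned = digest_of(original)
print(verify(original, pinned))
print(verify(tampered, pinned))
```

A tag offers no equivalent check, because nothing in the tag name depends on the content it currently points to.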

However... for the casual consumer of distroless, I think it's better to just live at HEAD and hope for the best, because you're going to get security patches and bug fixes for free. I agree that it would be nice if there were a bit more transparency around the versioning of distroless, but I worry a bit that publishing tags would encourage folks to pin to those tags, even though they would essentially never get updated.

Of course, just living at HEAD is a terrible strategy for production, so I would expect teams to snapshot their dependencies by digest and have gating mechanisms for upgrading to new versions of public content.
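One way to picture such a gating mechanism is a lock of pinned digests that only moves forward when every check passes. The sketch below is an illustration under assumptions, not a description of any existing tool: the `Lock` mapping, `gated_upgrade`, and the check callables are all hypothetical names.

```python
from typing import Callable, Dict, Iterable

# Hypothetical lock: image repository -> currently pinned digest.
Lock = Dict[str, str]
Check = Callable[[str, str], bool]


def gated_upgrade(lock: Lock, image: str, candidate: str,
                  checks: Iterable[Check]) -> Lock:
    """Return a new lock with the candidate digest only if all checks pass."""
    if all(check(image, candidate) for check in checks):
        updated = dict(lock)  # copy, so the old lock stays available
        updated[image] = candidate
        return updated
    return lock


# Placeholder digests and a trivial check, for illustration only.
OLD = "sha256:" + "0" * 64
NEW = "sha256:" + "1" * 64
lock = {"gcr.io/distroless/base-debian10": OLD}


def integration_tests_pass(image: str, digest: str) -> bool:
    # Stand-in for running a real test suite against the candidate image.
    return True


lock = gated_upgrade(lock, "gcr.io/distroless/base-debian10", NEW,
                     [integration_tests_pass])
print(lock)
```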

So, an obvious improvement here (for transparency) would be to include at least the commit in distroless labels and/or annotations, as brian suggested. The commit would allow you to discover the DEBIAN_SNAPSHOT and DEBIAN_SECURITY_SNAPSHOT by inspecting the repo. It would be reasonable to me to include those as labels/annotations as well, but not strictly necessary (just more convenient). I am somewhat in favor of adding commit tags to each image, because that makes certain workflows really easy to automate. Adding the debian snapshots as tags seems a little less appealing to me, since you could just browse the commit history instead and use the commit, but I'd be interested in hearing other opinions.

springroll12 commented 3 years ago

All valid points. At this point it seems like the path is set and I'm tilting at windmills, but let me try to respond anyway.

I guess it is true that if the labels/annotations were present it would be possible to locate the git-sha, but this is not particularly great DX. In my view it's much easier to open the Dockerfile and look at the FROM line than to inspect a container/image for labels. I can't argue that immutability is not useful, but I think we need to balance that against discoverability.

> Of course, just living at HEAD is a terrible strategy for production, so I would expect teams to snapshot their dependencies by digest and have gating mechanisms for upgrading to new versions of public content.

Exactly. As a team member who is responsible for many facets of development, the time it takes to discover which image to upgrade to for N microservices is non-trivial. Why not provide tags/releases with definitive changelogs that make it easy to discover which image (digest?) to upgrade to? (Maybe these could be stolen from debian?) This leads to fewer snapshot upgrades as well, since teams might not know when a critical release has occurred which should force an upgrade.

> for the casual consumer of distroless, I think it's better to just live at HEAD and hope for the best, because you're going to get security patches and bug fixes for free.

These mechanisms are not mutually exclusive. Definitive version tags can coexist with rolling versions (see latest).

jonjohnsonjr commented 3 years ago

> All valid points. At this point it seems like the path is set and I'm tilting at windmills, but let me try to respond anyway.

I'm not trying to convince you otherwise, I'm just describing the limitations we have w.r.t. debian's releases. We seem to agree on the ideal state of things, but I don't think that ideal state is achievable without some help. I don't have the familiarity with debian to really make this much better, unfortunately. I know there are point releases for debian, but I'm not sure how those actually map to the snapshots distroless currently uses. It may be the case that we want something like a HEAD tag that corresponds to the current latest tag for distroless, and have latest actually map to the most recent point release.

I guess the reason we don't have releases or versions is that it's not completely obvious what we should do about it. Also, while distroless currently relies on debian releases, it's unclear if that will always be the case, so exposing debian versions in the distroless versioning scheme might not be a great idea. On the other hand, inventing and maintaining a separate versioning scheme for distroless seems hard to get right and would require some careful thought.

> I guess it is true that if the labels/annotations were present it would be possible to locate the git-sha, but this is not particularly great DX. In my view it's much easier to open the Dockerfile and look at the FROM line than to inspect a container/image for labels. I can't argue that immutability is not useful, but I think we need to balance that against discoverability.

This is a sore point for me, but in a lot of contexts there is no FROM line and there is no Dockerfile, e.g. when using rules_docker, which is what distroless uses. Of course I'd be fine with all three of these (tag + annotation + label) because it would make my life easier 😄 it's just a matter of doing the work and getting consensus.

Above I did mention that I would be in favor of tagging every image with the commit. It looks like the commit tags currently point to the amd64 images only and not the corresponding multi-platform image (unfortunately). This seems like something that could be fixed.

> Exactly. As a team member who is responsible for many facets of development, the time it takes to discover which image to upgrade to for N microservices is non-trivial.

@ImJasonH and I are trying to improve that a bit over in OCI world, if you can spare the time to provide some feedback or support: https://github.com/opencontainers/image-spec/pull/822

> Why not provide tags/releases with definitive changelogs that make it easy to discover which image (digest?) to upgrade to? (Maybe these could be stolen from debian?) This leads to fewer snapshot upgrades as well, since teams might not know when a critical release has occurred which should force an upgrade.

This would certainly be nice to have. I think it's a bit easier to automate the distroless image builds than it would be to automate GitHub releases. I also feel like having GitHub releases would be somewhat confusing -- do these map to changes to how distroless works, or to the debian upstream versions? It seems to me that debian has just a {MAJOR}.{MINOR} versioning scheme, so maybe we could do something silly like:

{DEBIAN_MAJOR}.{DEBIAN_MINOR}.{DISTROLESS_PATCH}

Where anytime distroless gets modified without changing the underlying package checksums, we bump the patch version to signal a change?
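As a concrete reading of that scheme, the Python sketch below derives the next version from the current one. Resetting the patch number to 0 when the debian version changes is an assumption (the comment does not say what happens then), and `Version` and `next_version` are hypothetical names.

```python
from typing import NamedTuple


class Version(NamedTuple):
    debian_major: int
    debian_minor: int
    distroless_patch: int

    def __str__(self) -> str:
        return f"{self.debian_major}.{self.debian_minor}.{self.distroless_patch}"


def next_version(current: Version, debian_major: int, debian_minor: int,
                 distroless_changed: bool) -> Version:
    """Follow debian's MAJOR.MINOR; bump the patch for distroless-only changes."""
    if (debian_major, debian_minor) != (current.debian_major, current.debian_minor):
        # Assumption: a new debian release restarts the patch counter.
        return Version(debian_major, debian_minor, 0)
    if distroless_changed:
        return current._replace(distroless_patch=current.distroless_patch + 1)
    return current


v = Version(10, 8, 0)
v = next_version(v, 10, 8, distroless_changed=True)
print(v)
```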

These mechanisms are not mutually exclusive. Definitive version tags can coexist with rolling versions (see latest).

Definitely, but in this case I believe the only rolling version we really have is latest. These aren't tags, but the major versions available here are debian9 and debian10 -- they're just embedded in the repo name instead of as tags -- e.g. gcr.io/distroless/static is just short for gcr.io/distroless/static-debian9. I'm not familiar enough with debian or distroless to figure out how we could incorporate the minor versions.

I guess to get really concrete about it, this is what I'd personally (not speaking from any place of authority or making any promises) like to see:

  1. All images pushed by distroless are tagged with the git commit.
  2. All images pushed by distroless embed the version information about the debian packages as labels, annotations, or both (maybe git commit, maybe SNAPSHOTS, maybe debian version -- lots of tradeoffs here).
  3. We create a new tag (HEAD) that lives at the debian HEAD and gets updated potentially every day -- this is what latest currently does, I believe.
  4. We move latest to live at the latest debian release, currently 10.8 and 9.13
  5. For extra credit: we have GitHub releases that correspond to debian releases + a patch version for changes to distroless.
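Items 1, 3, and 4 of this list could be sketched as a function that decides which tags a freshly built image receives. This is only an illustration of the proposal, not how the distroless build works; `tags_for_build` and its boolean parameters are hypothetical.

```python
from typing import List


def tags_for_build(commit: str, is_newest_build: bool,
                   is_latest_point_release: bool) -> List[str]:
    """Return the tags a build would get under the proposal above."""
    tags = [commit]  # item 1: every image is tagged with the git commit
    if is_newest_build:
        tags.append("HEAD")  # item 3: rolling tag, may move every day
    if is_latest_point_release:
        tags.append("latest")  # item 4: moves only with debian releases
    return tags


# The commit value is the one from the BUILD link earlier in this thread.
print(tags_for_build("1e4a8bb3ad03f71b572cbcb3bbc25f3fd8d0ff14",
                     is_newest_build=True, is_latest_point_release=False))
```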

Instead of changing the meaning of latest, maybe we'd want to keep that as HEAD and create a new stable tag (or something) that updates with minor releases.

Part of me wants to reorganize the repos a little bit as well to make this make more sense, but that might be too much of a breaking change. Ideally gcr.io/distroless/static:debian-9 would be the latest point release of stretch, and :debian-10 would be the latest point release of buster.

Now... that's a lot of changes to make, and a lot of code to write, and a lot of machinery to maintain. I don't have the debian expertise to actually make this happen, so I'm not going to implement it myself. I think if someone were to write up a proposal and were willing to do the work, it might happen, but I don't know anyone who is willing to do this work at the moment.

dlorenc commented 3 years ago

I'm in favor of tagging/labeling/annotating everything with commit shas back to this repo. I think the complaint here is really about the way debian handles versioning.

I do not want to try to build a versioning system on top of the existing Debian versioning system, which isn't designed to work this way. Distroless attempts to follow Debian in a predictable way - it does not attempt to build a distro of itself on top of debian.

jmoyano-koa commented 1 year ago

+1 to @springroll12's idea of tagging every image with a version tag + timestamp/snapshot/git versions/sequential number. This allows humans and bots to identify pinned versions (if you use semver). Right now, it's impossible to pin to a previous version, as there is no tag on old versions. e.g. gcr.io/distroless/static:debian-TIMESTAMP

I think the key point is to be able to have a history and to be able to find the debian-WHATEVER versions.

The same applies to nodejs or any other language-related image. You expect to have multiple tags for an image: tags for major version, minor version, patch version, and even the underlying OS (debian9, debian10) if needed. But the goal is to have a way to filter a kind of image and to look up previous versions of the distroless image, not the specific base image used to build it. So, as an example, tags for a nodejs image:

node:<node patch/minor/major version> => node:16.12.5
node:<node patch/minor/major version>- => node:16-debian9
node:<node patch/minor/major version>-- => node:16.12-debian9-TIMESTAMP

Again, the specific underlying OS is not important unless there is more than one version, like debian9 and debian10. In the same way, it is not important to know at this level if curl is version X or Y.

This will allow any user and even dependabot/renovatebot (I think) to identify previous images, or stick to minor or patch versions when it's necessary. I don't know if this has any effect on SHA pinning. I guess it does, but I don't know how.

loosebazooka commented 1 year ago

It's not an unreasonable request. We're currently doing a bunch of work to restructure the build here. This can only happen after that is done.

jmoyano-koa commented 1 year ago

OK, let us know if we can do anything to help with that. Kind regards,

omBratteng commented 1 year ago

@loosebazooka I fully support being able to version pin images. It makes it easier for humans to understand what updates Dependabot suggests. That said, I do recommend using image hashes, as tags can be overwritten.

jkytomak commented 1 year ago

Related to this, I have a problem: Node 20.3.* has this issue, which makes it unusable for us: https://github.com/nodejs/docker-node/issues/1912 but I would still like to upgrade to node 20.2.0. How can I find a nonroot-amd64 node 20.2.0 image from here: https://console.cloud.google.com/gcr/images/distroless/global/nodejs20 I can probably get the version from the build date, but what about nonroot-amd64? Is there really no way to do this currently?

joebowbeer commented 5 months ago

Most of the discussion above is about distroless base (debian) images, but like @jkytomak I am interested in distroless NodeJS images, and I expected Node's minor and patch version tags to also be present in the distroless image registry.

We have been "broken" by changes released in NodeJS minor version upgrades on more than one occasion, and so we only want to apply patch upgrades automatically.

Some additional tagging should also assist tools like renovate, where users who are using image hashes can configure their allowed renovate updates using semver specifiers.
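To illustrate the patch-only policy, here is a Python sketch that picks the newest tag within the current major.minor line. It assumes the registry publishes plain `MAJOR.MINOR.PATCH` tags, which distroless does not currently do; the tag list and the helper names `parse` and `best_patch_upgrade` are made up for the example.

```python
import re
from typing import Iterable, Optional, Tuple

SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Optional[Tuple[int, int, int]]:
    """Parse 'MAJOR.MINOR.PATCH'; return None for tags like 'latest'."""
    m = SEMVER_RE.match(tag)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def best_patch_upgrade(current: str, available: Iterable[str]) -> Optional[str]:
    """Return the highest tag sharing current's major.minor, if any is newer."""
    cur = parse(current)
    if cur is None:
        raise ValueError(f"current tag is not semver: {current!r}")
    best = None
    for tag in available:
        version = parse(tag)
        if version is None or version[:2] != cur[:2] or version <= cur:
            continue  # not semver, different minor line, or not newer
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None


# Hypothetical tag list, for illustration only.
tags = ["latest", "20.2.0", "20.2.1", "20.2.10", "20.3.0"]
print(best_patch_upgrade("20.2.0", tags))
```

Versions are compared as integer tuples rather than strings, so 20.2.10 correctly sorts above 20.2.1.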

Luk-z commented 2 months ago

Same problem using gcr.io/distroless/nodejs20-debian12:latest. Node 20.16 has a possible memory leak when using fetch, and we had a huge memory leak after deploying to production. After some investigation we realized that the only relevant change was the "latest" distroless. We needed to know which distroless image was used in the previous build (2024/07/25). The log of that build was not available because it was too old. We spent most of the time (2+ hours) figuring out:

(we tried gcloud, but it was not helpful)

loosebazooka commented 2 months ago

yeah, no doubt these are valid concerns, we just haven't gotten to it yet.