docker / roadmap

Welcome to the Public Roadmap for All Things Docker! We welcome your ideas.
https://github.com/orgs/docker/projects/51
Creative Commons Zero v1.0 Universal

Set base image annotations #243

Open imjasonh opened 3 years ago

imjasonh commented 3 years ago

Tell us about your request

Set OCI standard annotations on images describing information about the image's base image.

https://github.com/opencontainers/image-spec/blob/main/annotations.md

  • org.opencontainers.image.base.digest Digest of the image this image is based on (string)
    • This SHOULD be the immediate image sharing zero-indexed layers with the image, such as from a Dockerfile FROM statement.
    • This SHOULD NOT reference any other images used to generate the contents of the image (e.g., multi-stage Dockerfile builds).
  • org.opencontainers.image.base.name Image reference of the image this image is based on (string)
    • This SHOULD be image references in the format defined by distribution/distribution.
    • This SHOULD be a fully qualified reference name, without any assumed default registry. (e.g., registry.example.com/my-org/my-image:tag instead of my-org/my-image:tag).
    • This SHOULD be the immediate image sharing zero-indexed layers with the image, such as from a Dockerfile FROM statement.
    • This SHOULD NOT reference any other images used to generate the contents of the image (e.g., multi-stage Dockerfile builds).
    • If the image.base.name annotation is specified, the image.base.digest annotation SHOULD be the digest of the manifest referenced by the image.ref.name annotation.
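As a rough illustration of what a builder could attach to the final image's manifest, here is a minimal Python sketch using the two annotation keys quoted above. The function names are hypothetical. The registry-host heuristic in is_fully_qualified is an assumption loosely modelled on how Docker reference parsing detects a registry: the first path component contains '.' or ':', or is 'localhost'. It is not part of the spec or of any Docker tool, and only sha256 digests are accepted here.

```python
import re
from typing import Dict

# Annotation keys from the OCI image-spec annotations document.
BASE_DIGEST = "org.opencontainers.image.base.digest"
BASE_NAME = "org.opencontainers.image.base.name"

# Assumption: only sha256 digests ("sha256:" + 64 lowercase hex chars) are accepted.
_SHA256_DIGEST = re.compile(r"^sha256:[0-9a-f]{64}$")


def is_fully_qualified(ref: str) -> bool:
    """Heuristic: treat the reference as fully qualified if its first path
    component looks like a registry host ('.' or ':' in it, or 'localhost')."""
    first, sep, _ = ref.partition("/")
    if not sep:
        return False  # e.g. "busybox:latest" has no registry component at all
    return "." in first or ":" in first or first == "localhost"


def base_image_annotations(base_name: str, base_digest: str) -> Dict[str, str]:
    """Build the two OCI base-image annotations for the final image's manifest."""
    if not is_fully_qualified(base_name):
        raise ValueError(f"base name should be fully qualified: {base_name!r}")
    if not _SHA256_DIGEST.match(base_digest):
        raise ValueError(f"not a sha256 digest: {base_digest!r}")
    return {BASE_NAME: base_name, BASE_DIGEST: base_digest}
```

Passing "docker.io/library/busybox:latest" with a full sha256 digest returns both keys. Passing "busybox:latest" is rejected, because it relies on an assumed default registry.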

(lots more discussion and motivation in https://github.com/opencontainers/image-spec/pull/822/)

Which service(s) is this request for?

docker build

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Setting these annotations makes it easier for other tools to determine information about the base image, such as whether it contains vulnerabilities, whether there are updates to fix those vulnerabilities, etc. More details at https://articles.imjasonh.com/oci-base-image-annotations
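Once an image carries these annotations, the "does my base have updates?" check a tool could run is small. In this sketch, resolve is a hypothetical callable you would supply. It should return the digest a reference currently points to in its registry, for example as reported by skopeo inspect. It is not an existing API, and the function name is illustrative.

```python
from typing import Callable, Mapping, Optional

BASE_DIGEST = "org.opencontainers.image.base.digest"
BASE_NAME = "org.opencontainers.image.base.name"


def base_image_outdated(
    annotations: Mapping[str, str], resolve: Callable[[str], str]
) -> Optional[bool]:
    """True if base.name now resolves to a digest other than base.digest,
    False if it is unchanged, None if the base annotations are missing."""
    name = annotations.get(BASE_NAME)
    pinned = annotations.get(BASE_DIGEST)
    if name is None or pinned is None:
        return None  # nothing recorded, so nothing to compare against
    # resolve() is supplied by the caller (hypothetical registry lookup).
    return resolve(name) != pinned
```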

Are you currently working around the issue?

There is currently no reliable mechanism to determine whether an image's base image has updates.

Additional context

This does not cover annotating information about other images that may have been involved in producing the image in question, such as during a multi-stage build (FROM golang AS builder) -- only the final image that contributed base image layers.

This should be considered safe to annotate in the general case, but it's possible someone would want to disable this behavior. I'm not familiar with the preferred mechanism to enable/disable behavior like this in Docker, e.g., environment variable vs flag to docker build.

JamieMagee commented 2 years ago

@imjasonh I started looking into this and docker has, independently, come up with their own solution for this: https://github.com/moby/buildkit/blob/master/docs/build-repro.md

This hasn't made it into a docker release yet, but you can pull in a newer version of buildkit with support. For example, I have a Dockerfile:

FROM busybox

I set up my environment to use a more recent version of buildkit:

docker buildx create --use --driver-opt image=moby/buildkit:master

I build the image, and push it to Docker Hub

docker buildx build --output type=image,name=docker.io/jamiemagee/busybox-test,push=true .

Then I inspect the raw image config using skopeo, and extract and decode the moby.buildkit.buildinfo.v1 key

skopeo inspect docker://docker.io/jamiemagee/busybox-test:latest --config --raw | jq -r  '."moby.buildkit.buildinfo.v1"' | base64 --decode | jq

which gives me

{
  "sources": [
    {
      "type": "docker-image",
      "ref": "docker.io/library/busybox:latest",
      "pin": "sha256:e7157b6d7ebbe2cce5eaa8cfe8aa4fa82d173999b9f90a9ec42e57323546c353"
    }
  ]
}
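Tools that want to consume this from code rather than from a shell pipeline can do the same extraction in Python. The sketch assumes the image config has already been fetched and parsed, for example from the output of skopeo inspect --config --raw. It also assumes the buildinfo sits base64-encoded under the top-level moby.buildkit.buildinfo.v1 key, as the jq filter above implies. The helper names are illustrative.

```python
import base64
import json
from typing import Any, Dict, Mapping

BUILDINFO_KEY = "moby.buildkit.buildinfo.v1"


def decode_buildinfo(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Same steps as: jq -r '."moby.buildkit.buildinfo.v1"' | base64 --decode | jq"""
    encoded = config[BUILDINFO_KEY]
    return json.loads(base64.b64decode(encoded))


def pinned_sources(config: Mapping[str, Any]) -> Dict[str, str]:
    """Map each docker-image source ref to the digest it was pinned to at build time."""
    info = decode_buildinfo(config)
    return {
        src["ref"]: src["pin"]
        for src in info.get("sources", [])
        if src.get("type") == "docker-image"
    }
```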

I also tested it with a multistage Dockerfile:

FROM golang:1.17.3-alpine as builder

FROM alpine:3.14

COPY --from=builder /usr/local/go/bin/go /usr/bin/go

This time loading it into my local docker daemon (--load) instead of pushing to a registry:

docker buildx build --load --tag docker.io/jamiemagee/busybox-demo .

And inspecting using skopeo again

skopeo inspect docker-daemon:docker.io/jamiemagee/busybox-demo:latest --config --raw | jq -r  '."moby.buildkit.buildinfo.v1"' | base64 --decode | jq

which gives

{
  "sources": [
    {
      "type": "docker-image",
      "ref": "docker.io/library/alpine:3.14",
      "pin": "sha256:635f0aa53d99017b38d1a0aa5b2082f7812b03e3cdb299103fe77b5c8a07f1d2"
    },
    {
      "type": "docker-image",
      "ref": "docker.io/library/golang:1.17.3-alpine",
      "pin": "sha256:55da409cc0fe11df63a7d6962fbefd1321fedc305d9969da636876893e289e2d"
    }
  ]
}

A big thank you to @tonistiigi for pointing me in the right direction here.

imjasonh commented 2 years ago

Thanks @JamieMagee, that's really interesting; I didn't know about this work in docker buildx!

During the discussion about the semantics of the OCI annotation, we definitely discussed (at length!) how to express multi-stage build inputs and many many many other types of build inputs, and eventually decided that base images were a reasonable first step that had clear semantics across all build tools and scenarios (i.e., base images share layers with the final output image).
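The "shares layers" criterion is mechanical enough to check directly: the base image's layer digests must be the leading layers of the final image, in order. This is a minimal sketch that assumes you already have each manifest's ordered list of layer digests. identify_base is a hypothetical helper that picks, among several candidate images, the one whose layers form the longest such prefix.

```python
from typing import Mapping, Optional, Sequence


def shares_base_layers(image_layers: Sequence[str], base_layers: Sequence[str]) -> bool:
    """True if base_layers are exactly the leading (zero-indexed) layers of image_layers."""
    if not base_layers:
        return False  # an empty layer list (e.g. a FROM scratch base) shares nothing
    if len(base_layers) > len(image_layers):
        return False
    return list(image_layers[: len(base_layers)]) == list(base_layers)


def identify_base(
    image_layers: Sequence[str], candidates: Mapping[str, Sequence[str]]
) -> Optional[str]:
    """Return the candidate ref whose layers form the longest prefix of image_layers."""
    best: Optional[str] = None
    best_len = 0
    for ref, layers in candidates.items():
        if shares_base_layers(image_layers, layers) and len(layers) > best_len:
            best, best_len = ref, len(layers)
    return best
```

A candidate that overlaps the final image only partially fails the check. For example, it might have the same first layer but different later layers. Only a complete prefix counts.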

Many images aren't built with Dockerfiles today, so having specific semantics for Dockerfile's multi-stage builds didn't seem appropriate. Since buildkit can assume Dockerfiles are involved, it makes sense to have those semantics taken into account in the buildkit-specific annotations.

Your multi-stage example is interesting too, because the annotation doesn't actually tell me which of those two docker-images shares layers with the output image; it basically just signals "these images were involved with the building of this image ...somehow". This is still helpful, but it doesn't tell me whether my image shares layers with any of those. That's fine; it's still useful to know that if golang:1.17.3-alpine no longer points to sha256:55da... I should rebuild. It just doesn't easily tell me which of those images can trigger a rebase.
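The rebuild signal described here can be sketched as follows. The sketch assumes pins is the ref-to-pin mapping taken from the buildinfo sources list. resolve is a hypothetical callable that returns the digest a ref points to today. Any stale source suggests a rebuild. What the buildinfo alone cannot say is whether the stale source is the one whose layers a rebase would replace.

```python
from typing import Callable, List, Mapping


def stale_sources(pins: Mapping[str, str], resolve: Callable[[str], str]) -> List[str]:
    """Return, sorted, the refs whose current digest differs from the pin recorded at build time."""
    # resolve() is a caller-supplied registry lookup (hypothetical).
    return sorted(ref for ref, pin in pins.items() if resolve(ref) != pin)
```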

Looking ahead, two questions:

  1. What does the timeline look like for these buildkit-specific annotations to reach images built with Dockerfiles and regular docker build?
  2. Is there any interest in also repackaging this information into the OCI annotation format for base images specifically? I think that would be the best of both worlds.

tonistiigi commented 2 years ago

> Many images aren't built with Dockerfiles today, so having specific semantics for Dockerfile's multi-stage builds didn't seem appropriate.

This feature is implemented at LLB level and there is nothing Dockerfile specific about it. All buildkit frontends support it.

> What does the timeline look like for these buildkit-specific annotations to reach images built with Dockerfiles and regular docker build?

Buildkit has been the default builder for Docker Desktop for some time now. The Linux default will change with the next feature release of the Engine. The CLI is switching to buildx as well, for a consistent UX: https://github.com/docker/cli/pull/3314

imjasonh commented 2 years ago

> This feature is implemented at LLB level and there is nothing Dockerfile specific about it. All buildkit frontends support it.

You're right, I spoke imprecisely. Many images aren't built with Docker's semantics at all, whether that's with the specific Dockerfile format or another frontend leveraging the internal LLB.

The concept of a "multi-stage build" doesn't exist at all in these tools (Buildpacks, ko, Jib, Bazel); there are only layers on top of a base image, so it didn't make sense to include those semantics in the OCI annotations. I'd be open to discussing adding more annotations, though; they can be really useful for detecting an image in need of a rebuild.

It sounds like buildkit is already doing the work to identify base images in order to put that information into the config. Would it make sense to also have buildkit put that information into annotations (when/if it produces OCI images that support image annotations)? That seems like a relatively easy win for standardization across the ecosystem.
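This sketch shows what that repackaging might look like. It assumes the builder knows which source the final stage was built FROM. That source is passed in here as base_ref, because the buildinfo shown earlier in the thread does not record it. The function is hypothetical. This thread also does not establish whether a recorded pin is the digest of a single-platform manifest or of a multi-platform index, while the OCI text asks for the manifest digest.

```python
from typing import Any, Dict, Mapping

BASE_DIGEST = "org.opencontainers.image.base.digest"
BASE_NAME = "org.opencontainers.image.base.name"


def annotations_from_buildinfo(buildinfo: Mapping[str, Any], base_ref: str) -> Dict[str, str]:
    """Hypothetical repackaging of one buildinfo source into OCI base-image annotations.

    base_ref must be supplied by the builder: the buildinfo lists every image the
    build used, but not which one the final stage's layers came from.
    """
    for src in buildinfo.get("sources", []):
        if src.get("type") == "docker-image" and src.get("ref") == base_ref:
            return {BASE_NAME: src["ref"], BASE_DIGEST: src["pin"]}
    raise KeyError(f"{base_ref!r} is not a docker-image source in this buildinfo")
```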