[Feature Request]: generating SBOMs for container images while building them

developer-guy commented 2 years ago

I'm a huge fan of the bake command. I recently opened a similar issue about the builds, which you can see here.

Nowadays, SBOM (Software Bill Of Materials) is a trending topic, so we thought that maybe we could support SBOM generation as a separate target within the docker-bake.hcl. There are many alternative tools for generating SBOMs.

So, we can pick one of these to generate SBOMs while building container images.

cc: @luhring @nishakm @puerco @Dentrax @imjasonh 🥳🙋🏻‍♂️

luhring commented 2 years ago

I love this. 😍

Curious what the requirements should be (e.g. support for multiple SBOM formats).

(I'm a maintainer on Syft) Let me know if I can help out in any way!

nishakm commented 2 years ago

I have a PoC that does exactly this: https://github.com/vmware-samples/containers-with-sboms. It would be super cool if buildx could integrate SBOM generation every time a filesystem snapshot is created.

justincormack commented 2 years ago

I moved this issue to our roadmap repo to get broader feedback.

coderpatros commented 2 years ago

I 100% think that the most accurate SBOMs are generated at build time, with close native integration with build systems.

But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.

If this is implemented it will need to be clearly defined what it can and cannot do.

Don't want to be a party pooper. I'm a big advocate of SBOMs. I just don't want to see another rushed useless implementation.

And to be honest, I'd rather see a useful SBOM for the Docker tooling itself first. There's already some good tools like Syft and Tern for container image SBOM generation.

hectorj2f commented 2 years ago

But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.

I totally agree with this ☝🏻 . It will miss dependencies and information.

I'd rather see a useful SBOM for the Docker tooling itself first.

Yes, that is an option, or having both SBOM files: one for the build (source code repo) and one for the runtime (container).

nishakm commented 2 years ago

But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.

Very true! One thing Tern does is parse the created_by data to figure out what the intent of the builder was. It's not very good with figuring out full shell scripts though, especially if the shell scripts use build arguments. In this case, I wonder if we can get closer to a more accurate SBOM if some of that data is provided by the user.
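To make the created_by idea concrete, here is a minimal sketch of classifying image history entries. It is not Tern's actual implementation. It assumes created_by strings take the shapes shown in the comments (a `/bin/sh -c` wrapper, a `#(nop)` marker for non-RUN steps, and an optional trailing `# buildkit` comment); the function names are hypothetical.

```python
import re

# Assumed marker placed in front of non-RUN steps by the classic builder,
# e.g. '/bin/sh -c #(nop)  CMD ["bash"]'.
NOP_MARKER = "#(nop)"

# Assumed shell wrapper, possibly preceded by "RUN" or build-arg bookkeeping,
# e.g. 'RUN /bin/sh -c apt-get install -y curl # buildkit'.
_SHELL_WRAPPER = re.compile(r"/bin/sh -c\s*")
_BUILDKIT_SUFFIX = re.compile(r"\s*#\s*buildkit\s*$")


def classify_created_by(created_by: str) -> dict:
    """Return {"kind": "run" | "instruction", "command": str} for one entry."""
    text = _BUILDKIT_SUFFIX.sub("", created_by.strip())
    match = _SHELL_WRAPPER.search(text)
    if match is None:
        # No shell wrapper: treat it as a plain Dockerfile instruction.
        return {"kind": "instruction", "command": text}
    rest = text[match.end():]
    if rest.startswith(NOP_MARKER):
        return {"kind": "instruction", "command": rest[len(NOP_MARKER):].strip()}
    return {"kind": "run", "command": rest}


def has_unresolved_variables(command: str) -> bool:
    """True if the command still contains $NAME or ${NAME style references."""
    return re.search(r"\$\{?[A-Za-z_]", command) is not None
```

Entries where `has_unresolved_variables` returns true are the ones where the builder's intent cannot be recovered from the image alone, which is where user-provided data would help.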

luhring commented 2 years ago

I wonder if we can get closer to a more accurate SBOM if some of that data is provided by the user.

💯 To me, this will be a necessity. As we're mentioning, there are numerous cases where analyzing only the image will give you an incomplete picture of what software is present, even with the best analysis available. If the goal is "completeness" in the image artifact's SBOM, user input of information that was victim to lossy transformations will be critical. We're working on this in Syft — and I'm sure other SBOM tools can/will handle this as well. 👍
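As a rough illustration of combining image analysis with user input, the sketch below merges two package lists and records where each entry came from. It is not how Syft or any other tool does this; the dictionary layout (`name`, `version`, `sources`) is a hypothetical simplification, not a real SBOM format.

```python
def merge_packages(scanned: list, declared: list) -> list:
    """Merge scanner findings with user-declared packages.

    Each input item is a dict with "name" and "version" keys (an assumed,
    simplified shape). Duplicates are collapsed and tagged with every
    source that reported them.
    """
    merged = {}
    for source, packages in (("scanned", scanned), ("declared", declared)):
        for pkg in packages:
            key = (pkg["name"], pkg["version"])
            entry = merged.setdefault(
                key, {"name": pkg["name"], "version": pkg["version"], "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    # Sort for stable, reproducible output.
    return sorted(merged.values(), key=lambda e: (e["name"], e["version"]))
```

Packages that appear only under "declared" are exactly the ones a scan of the finished image missed, for example a statically linked library.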

justincormack commented 2 years ago

@coderpatros by "useful SBOM for the Docker tooling itself" do you mean inputs that Docker already knows about, like base layers? I totally agree that there is a difficult mix of things in Docker builds, potentially arbitrary shell scripts and network access, and so we are going to have to use a mix of methods.

People building tools, one question I have is what hooks would be useful to you? If we have to plumb data through (input SBOMs from base, input SBOMs from added software, analysed parts) what kind of hooks would make this easier for your tools?

coderpatros commented 2 years ago

@justincormack I mean an SBOM that describes, as an example, the Docker CLI.

tianon commented 2 years ago

This is very interesting 😄

I think anything that happens in docker build by default would make me a little wary (given the potential overhead of deep calculation/inspection of things like packages inside the image). However, I think there'd be a ton of value in optionally including more of the Dockerfile/build context data somehow.

Some of the data that's really difficult to get after the fact that Docker itself is uniquely suited to provide are exact image IDs/digests or even locations/names for base images and information about the other build stages that helped create the final image. For example, the specific openjdk tag/digest I used to build my-application.jar is very relevant information for that final my-application.jar artifact.

There are a lot of blurry lines here depending on how deep a user might want the metadata to go. The degree of data is probably going to change the "calculation/information gathering overhead" pretty significantly, and for users building closed-source solutions it could be too much information, leaking things they didn't want to leak, like details about their source code, internal container registry, or worse.

(I guess what I'm trying to get at there is that all aspects of this probably need to be opt-in?)

For my own use cases, I don't think I'd want this to happen during docker build itself unless it was very, very fast (so that it's not in the critical path for build/push).

To illustrate a bit better, a full clean build of all the variants of https://hub.docker.com/_/python already takes several hours per architecture, even on a reasonably fast machine, so having the SBOM calculated out-of-band could be pretty dramatic.

imjasonh commented 2 years ago

Some of the data that's really difficult to get after the fact that Docker itself is uniquely suited to provide are exact image IDs/digests or even locations/names for base images and information about the other build stages that helped create the final image. For example, the specific openjdk tag/digest I used to build my-application.jar is very relevant information for that final my-application.jar artifact.

See https://github.com/docker/roadmap/issues/243 for a concrete proposal toward this goal.

nishakm commented 2 years ago

People building tools, one question I have is what hooks would be useful to you? If we have to plumb data through (input SBOMs from base, input SBOMs from added software, analysed parts) what kind of hooks would make this easier for your tools?

A few things come to mind for me:

A record of the base image in the "created_by" field or a dedicated field in the config.
A record of build argument values (although this would close the option of passing secrets via the build arguments, I personally believe this is a good thing)
Some option in docker build for a SBOM generation tool to access the mountpoint
Some ability to include or reuse SBOMs created externally
Some ability to record the state of intermediate containers during a multi-stage docker build (this is a pet peeve of mine as this is the bit that most container builders tell me is impossible to do using docker now)

To @imjasonh's point of recording the base image: To make it easier for tools to parse this information, it would be nice to record the base images all the way to scratch. For example, there are multiple base images that have contributed to the final golang image.
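One possible shape for such a record, sketched under the assumption that each image could carry a pointer to the record of the image it was built FROM. The class and field names are hypothetical and do not correspond to any existing image config field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BaseImageRecord:
    """Hypothetical per-image record: what it is and what it was built FROM."""
    reference: str                              # e.g. a repository:tag string
    digest: str                                 # content digest of the image
    parent: Optional["BaseImageRecord"] = None  # None means built FROM scratch


def lineage(record: BaseImageRecord) -> List[str]:
    """Walk the parent pointers and list every base, ending at scratch."""
    chain = []
    current: Optional[BaseImageRecord] = record
    while current is not None:
        chain.append(f"{current.reference}@{current.digest}")
        current = current.parent
    chain.append("scratch")
    return chain
```

With records like these, a tool could read the full chain from the final image instead of guessing it from layer contents.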

As for the shell script parsing, some environment variable substitution would help greatly. Tern currently tries to do this with some success.
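A minimal sketch of that substitution step, assuming the build argument values were recorded somewhere a tool can read them. This is not Tern's code; `substitute_build_args` is a hypothetical helper, and it only handles the plain `$NAME` and `${NAME}` forms.

```python
import re
from typing import Dict, List, Tuple

# Matches ${NAME} or $NAME. Forms such as ${NAME:-default} or $(command)
# are deliberately left untouched by this sketch.
_VARIABLE = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def substitute_build_args(command: str, build_args: Dict[str, str]) -> Tuple[str, List[str]]:
    """Replace known variables and report the names that stayed unresolved."""
    unresolved: List[str] = []

    def replace(match: "re.Match") -> str:
        name = match.group(1) or match.group(2)
        if name in build_args:
            return build_args[name]
        unresolved.append(name)
        return match.group(0)  # keep the original text for unknown names

    return _VARIABLE.sub(replace, command), unresolved
```

Returning the unresolved names alongside the rewritten command lets a tool flag exactly which parts of the SBOM are still guesses.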

chris-crone commented 2 years ago

With Docker Desktop 4.7.0 (released yesterday), we have shipped an experimental docker sbom CLI command. The command scans and then outputs the SBOM of a container image using the Syft project. You can find its source code here.

As discussed in our blog post, this is just the first step. The goal is to work with partners and the community to add SBOM generation directly into docker build through BuildKit integrations. We have opened an issue on the BuildKit repo to get help and input.

Please give the docker sbom command a try and give us feedback on it on its repo!
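For anyone scripting this, a small sketch of calling the command from Python. It assumes the experimental `docker sbom` plugin is installed and that it accepts a `--format` flag with the value `spdx-json`; check `docker sbom --help` on your version, since the command is experimental. The helper names are hypothetical.

```python
import subprocess
from typing import List


def docker_sbom_argv(image: str, sbom_format: str = "spdx-json") -> List[str]:
    """Build the argument list; --format and its value are assumptions."""
    return ["docker", "sbom", "--format", sbom_format, image]


def generate_sbom(image: str, sbom_format: str = "spdx-json") -> str:
    """Run docker sbom and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        docker_sbom_argv(image, sbom_format),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Keeping the argument construction separate from the subprocess call makes it easy to log or inspect the exact command before running it.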

We'd also love anyone who is interested in collaborating on this work to engage on the BuildKit repo or on the Docker Community Slack.

docker / roadmap

[Feature Request]: generating SBOMs for container images while building them #274