developer-guy closed this issue.
I love this.
Curious what the requirements should be (e.g. support for multiple SBOM formats).
(I'm a maintainer on Syft) Let me know if I can help out in any way!
I have a PoC that does exactly this: https://github.com/vmware-samples/containers-with-sboms. It would be super cool if buildx could integrate SBOM generation every time a filesystem snapshot is created.
I moved this issue to our roadmap repo to get broader feedback.
I 100% think that the most accurate SBOMs are generated at build time, with close native integration with build systems.
But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.
If this is implemented it will need to be clearly defined what it can and cannot do.
I don't want to be a party pooper. I'm a big advocate of SBOMs. I just don't want to see another rushed, useless implementation.
And to be honest, I'd rather see a useful SBOM for the Docker tooling itself first. There are already some good tools, like Syft and Tern, for container image SBOM generation.
> But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.
I totally agree with this. It will miss dependencies and information.
> I'd rather see a useful SBOM for the Docker tooling itself first.
Yes, that is an option, or having two SBOM files: one for the build (source code repo) and one for runtime (the container).
> But the entire build system of the assembled software, in this case, can actually be a combination of anything. And parts of it can be completely opaque to Docker tooling when building the container.
Very true! One thing Tern does is parse the created_by data to figure out what the intent of the builder was. It's not very good at figuring out full shell scripts, though, especially if the shell scripts use build arguments. In this case, I wonder if we can get closer to a more accurate SBOM if some of that data is provided by the user.
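The created_by idea can be illustrated with a minimal Python sketch. It is not Tern's implementation; it assumes the image config JSON is already loaded as a dict, that its `history` array (as defined by the OCI image spec) uses the classic builder's `/bin/sh -c` prefix and `#(nop)` marker for metadata-only instructions, and that a regex over a few OS package managers is a good enough heuristic. `run_commands`, `guessed_packages`, and `SAMPLE_CONFIG` are hypothetical names, and the sample entries are illustrative, not taken from a real image.

```python
from __future__ import annotations

import re

# Assumption: classic-builder format; BuildKit records created_by differently.
SHELL_PREFIX = "/bin/sh -c "
NOP_MARKER = "#(nop)"

# Hypothetical heuristic: only a few OS package managers are recognised.
INSTALL_RE = re.compile(
    r"\b(?:apt-get|apt|apk|yum|dnf)\s+(?:install|add)\b(?P<args>[^&;|]*)"
)


def run_commands(config: dict) -> list[str]:
    """Return the shell command of every RUN-like history entry."""
    cmds = []
    for entry in config.get("history", []):
        created_by = entry.get("created_by", "")
        if not created_by.startswith(SHELL_PREFIX):
            continue
        body = created_by[len(SHELL_PREFIX):].strip()
        if body.startswith(NOP_MARKER):
            continue  # ENV/CMD/ADD metadata entries: no shell command to parse
        cmds.append(body)
    return cmds


def guessed_packages(cmd: str) -> list[str]:
    """Guess package names passed to an install command, skipping flags."""
    pkgs = []
    for match in INSTALL_RE.finditer(cmd):
        pkgs.extend(tok for tok in match.group("args").split() if not tok.startswith("-"))
    return pkgs


# Illustrative history entries (hypothetical, not from a real image).
SAMPLE_CONFIG = {
    "history": [
        {"created_by": "/bin/sh -c #(nop) ADD file:abc in / "},
        {"created_by": '/bin/sh -c #(nop)  CMD ["bash"]'},
        {"created_by": "/bin/sh -c apt-get update && apt-get install -y curl ca-certificates"},
        {"created_by": "/bin/sh -c pip install -r requirements.txt"},
    ]
}

if __name__ == "__main__":
    for cmd in run_commands(SAMPLE_CONFIG):
        print(cmd, "->", guessed_packages(cmd))
```

The pip line yields no packages, which is exactly the gap discussed in this thread: anything installed outside the recognised package managers stays invisible to this kind of parsing unless the user supplies it.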
> I wonder if we can get closer to a more accurate SBOM if some of that data is provided by the user.
To me, this will be a necessity. As we're mentioning, there are numerous cases where analyzing only the image will give you an incomplete picture of what software is present, even with the best analysis available. If the goal is "completeness" in the image artifact's SBOM, user input of information that fell victim to lossy transformations will be critical. We're working on this in Syft, and I'm sure other SBOM tools can/will handle this as well.
@coderpatros by "useful SBOM for the Docker tooling itself" do you mean inputs that Docker already knows about, like base layers? I totally agree that there is a difficult mix of things in Docker builds, potentially arbitrary shell scripts and network access, and so we are going to have to use a mix of methods.
People building tools, one question I have is what hooks would be useful to you? If we have to plumb data through (input SBOMs from base, input SBOMs from added software, analysed parts) what kind of hooks would make this easier for your tools?
@justincormack I mean an SBOM that describes, as an example, the Docker CLI.
This is very interesting :smile:
I think anything that happens in docker build by default would make me a little wary (given the potential overhead of deep calculation/inspection of things like packages inside the image); however, I think there'd be a ton of value in optionally including more of the Dockerfile/build context data somehow.
Some of the data that's really difficult to get after the fact that Docker itself is uniquely suited to provide are exact image IDs/digests or even locations/names for base images and information about the other build stages that helped create the final image. For example, the specific openjdk tag/digest I used to build my-application.jar is very relevant information for that final my-application.jar artifact.
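The kind of per-stage record described above can be sketched as a small data structure. This is a minimal, hypothetical Python sketch: `StageRecord`, `contributing_stages`, and `STAGES` are illustrative names, not an existing Docker, BuildKit, or SBOM-format type, and the tags and digests are placeholders rather than real values. It assumes the builder knows each stage's FROM reference, the digest it actually resolved, and which earlier stages it copied from via COPY --from.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StageRecord:
    """Hypothetical per-stage build record (not an existing Docker/BuildKit type)."""
    name: str
    base_ref: str                      # FROM reference as written, e.g. "openjdk:TAG"
    base_digest: str                   # digest the builder actually resolved
    copies_from: tuple[str, ...] = ()  # earlier stages referenced via COPY --from


def contributing_stages(final: str, stages: dict[str, StageRecord]) -> list[StageRecord]:
    """Walk COPY --from edges back from `final`; each stage is returned once, final first."""
    seen: set[str] = set()
    order: list[StageRecord] = []
    pending = [final]
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        record = stages[name]
        order.append(record)
        pending.extend(record.copies_from)
    return order


# Placeholder stages: "build" produces the jar, "lint" is never copied from.
STAGES = {
    "build": StageRecord("build", "openjdk:TAG", "sha256:BUILD_DIGEST"),
    "lint": StageRecord("lint", "alpine:TAG", "sha256:LINT_DIGEST"),
    "final": StageRecord("final", "runtime-base:TAG", "sha256:FINAL_DIGEST", copies_from=("build",)),
}

if __name__ == "__main__":
    for stage in contributing_stages("final", STAGES):
        print(stage.name, stage.base_ref, stage.base_digest)
```

Following the COPY --from edges is what lets the openjdk builder stage appear in the final image's record even though none of its layers ship in that image, while the unused lint stage is left out. Stages that start FROM another stage are not modelled in this sketch.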
There are a lot of blurry lines here depending on how deep a user might want the metadata to go, and the degree of detail is probably going to change the "calculation/information gathering overhead" pretty significantly. For users building closed-source solutions, it could also mean too much information, leaking things they didn't want to, like details about their source code, their internal container registry, or worse.
(I guess what I'm trying to get at there is that all aspects of this probably need to be opt-in?)
For my own use cases, I don't think I'd want this to happen during docker build itself unless it was very, very fast (so that it's not in the critical path for build/push).
To illustrate a bit better, a full clean build of all the variants of https://hub.docker.com/_/python already takes several hours per architecture, even on a reasonably fast machine, so calculating the SBOM out-of-band rather than inline could make a pretty dramatic difference.
> Some of the data that's really difficult to get after the fact that Docker itself is uniquely suited to provide are exact image IDs/digests or even locations/names for base images and information about the other build stages that helped create the final image. For example, the specific openjdk tag/digest I used to build my-application.jar is very relevant information for that final my-application.jar artifact.
See https://github.com/docker/roadmap/issues/243 for a concrete proposal toward this goal.
> People building tools, one question I have is what hooks would be useful to you? If we have to plumb data through (input SBOMs from base, input SBOMs from added software, analysed parts) what kind of hooks would make this easier for your tools?
A few things come to mind for me:
docker build for an SBOM generation tool to access the mountpoint
To @imjasonh's point of recording the base image: to make it easier for tools to parse this information, it would be nice to record the base images all the way to scratch. For example, there are multiple base images that have contributed to the final golang image.
As for the shell script parsing, some environment variable substitution would help greatly. Tern currently tries to do this with some success.
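Environment variable substitution of this kind could look roughly like the following minimal Python sketch. It assumes the image config JSON is available as a dict, with ENV values stored in `config.Env` as `KEY=value` strings (the Docker/OCI image config layout). `env_from_config`, `expand`, and the `user_args` parameter are hypothetical names; `user_args` stands in for user-supplied values such as build arguments, which the image config does not record. Python's `string.Template.safe_substitute` only understands `$VAR` and `${VAR}`, so shell-specific expansions are deliberately left alone rather than guessed at. The URL and variable names in the sample are made up for illustration.

```python
from __future__ import annotations

from string import Template


def env_from_config(image_config: dict) -> dict[str, str]:
    """Turn the image config's "Env" list of "KEY=value" strings into a dict."""
    env: dict[str, str] = {}
    for item in image_config.get("config", {}).get("Env") or []:
        key, _, value = item.partition("=")
        env[key] = value
    return env


def expand(cmd: str, env: dict[str, str], user_args: dict[str, str] | None = None) -> str:
    """Substitute $VAR and ${VAR} in a RUN command.

    Hypothetical user_args override image ENV values. Unknown names and
    shell-only forms such as ${VAR:-default} are left untouched; note that
    Template also turns "$$" into "$", which differs from shell semantics.
    """
    values = {**env, **(user_args or {})}
    return Template(cmd).safe_substitute(values)


# Illustrative config fragment and command (hypothetical URL and variables).
SAMPLE_CONFIG = {"config": {"Env": ["PATH=/usr/local/bin:/usr/bin", "TOOL_VERSION=1.2.3"]}}
SAMPLE_CMD = "curl -fsSLO https://example.com/tool-${TOOL_VERSION}-${ARCH}.tar.gz"

if __name__ == "__main__":
    env = env_from_config(SAMPLE_CONFIG)
    print(expand(SAMPLE_CMD, env))                     # ${ARCH} stays unexpanded
    print(expand(SAMPLE_CMD, env, {"ARCH": "amd64"}))  # a user-supplied value fills the gap
```

Leaving unknown variables in place, instead of replacing them with empty strings, keeps it visible to a downstream SBOM tool that part of the command could not be resolved from the image alone.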
With Docker Desktop 4.7.0 (released yesterday), we have shipped an experimental docker sbom CLI command. The command scans a container image and outputs its SBOM using the Syft project. You can find its source code here.
As discussed in our blog post, this is just the first step. The goal is to work with partners and the community to add SBOM generation directly into docker build through BuildKit integrations. We have opened an issue on the BuildKit repo to get help and input.
Please give the docker sbom command a try and share your feedback on its repo!
We'd also love anyone who is interested in collaborating on this work to engage on the BuildKit repo or on the Docker Community Slack.
I'm a huge fan of the bake command; I recently opened a similar issue for builds, which you can see here. Nowadays, SBOMs (Software Bill of Materials) are a trending topic, so we thought we could support SBOM generation as a separate target within the docker-bake.hcl file. There are many alternatives for generating SBOMs, so we can pick one of them to generate SBOMs while building container images.
cc: @luhring @nishakm @puerco @Dentrax @imjasonh