chainguard-dev / melange

build APKs from source code
Apache License 2.0

SBOM: Integrate opensbom-generator/parsers/go #137

Open (puerco opened this issue 2 years ago)

puerco commented 2 years ago

Add support for build-time SBOM generation by integrating the Go parser from opensbom-generator.


imjasonh commented 2 years ago

I think there's an opportunity to do even better here. AIUI the opensbom-generator for Go reads go.mod to generate the SBOM. That means any dependency listed in go.mod, including test-only dependencies and dependencies needed only for certain build tags, might be erroneously included.

Since we're building the binary ourselves, we could run go version -m on it to determine exactly which dependencies were compiled in. We could even have a built-in pipeline for this (- uses: go-build) that captures all of go build && go version -m [binary] | sbomthing > my.sbom.json.
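A minimal sketch of the parsing half of the hypothetical sbomthing step, assuming the tab-separated layout that go version -m prints: one dep line per module, optionally followed by a => line when that module was replaced. Module and parseGoVersionM are illustrative names, not part of melange or ko.

```go
package main

import (
	"bufio"
	"strings"
)

// Module is one dependency recovered from `go version -m` output
// (hypothetical type for this sketch).
type Module struct {
	Path    string
	Version string
	Sum     string // may be empty, e.g. for local replacements
}

// parseGoVersionM extracts the modules that were actually compiled into a
// binary from the text printed by `go version -m <binary>`. When a "dep"
// line is followed by a "=>" line, the replacement is recorded instead,
// because that is the code that ended up in the binary.
func parseGoVersionM(out string) []Module {
	var deps []Module
	prevWasDep := false // true if the previous line was a "dep" entry
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Detail lines are indented with one tab; columns are tab-separated.
		// The first line ("<file>: <go version>") yields a single field.
		f := strings.Split(strings.TrimPrefix(sc.Text(), "\t"), "\t")
		isDep := f[0] == "dep" && len(f) >= 3
		isReplace := f[0] == "=>" && len(f) >= 3 && prevWasDep
		if isDep || isReplace {
			m := Module{Path: f[1], Version: f[2]}
			if len(f) >= 4 {
				m.Sum = f[3]
			}
			if isDep {
				deps = append(deps, m)
			} else {
				// Overwrite the dep we just appended with its replacement.
				deps[len(deps)-1] = m
			}
		}
		prevWasDep = isDep
	}
	return deps
}
```

Since Go 1.18 the same data is also available in-process via debug/buildinfo.ReadFile, which returns a *debug.BuildInfo whose Deps field already holds this list, so shelling out and parsing text may not be necessary at all.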

A lot of the code for the sbomthing placeholder above already lives in ko, and I'd be happy to help refactor it into a separate location, or make it more easily consumable by melange or any other tool.

luhring commented 2 years ago

@imjasonh I like that approach! Since go version -m ... doesn't provide all the per-module SBOM information we'd want, I'm assuming the ... | sbomthing step would let us recover some of it, given the correct list of modules? I'm thinking of data like licenses, for example.

imjasonh commented 2 years ago

Yeah, ideally that would be the responsibility of whatever consumes the output of go version -m.
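As an illustration of what that consuming side might emit, here is an assumption-heavy sketch: it fills a handful of SPDX 2.3 package fields per module, derives a pkg:golang purl from the module path and version, and deliberately leaves licenseConcluded as NOASSERTION rather than guessing. It is not a complete or validated SPDX document (no document-level creationInfo or namespace, no relationships, no purl percent-encoding), and toPackages and marshalPackages are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Module mirrors the (path, version, sum) triple recovered from
// `go version -m` (hypothetical type for this sketch).
type Module struct {
	Path, Version, Sum string
}

// spdxPackage holds a small subset of SPDX 2.3 package fields.
type spdxPackage struct {
	Name             string        `json:"name"`
	SPDXID           string        `json:"SPDXID"`
	VersionInfo      string        `json:"versionInfo"`
	DownloadLocation string        `json:"downloadLocation"`
	LicenseConcluded string        `json:"licenseConcluded"`
	ExternalRefs     []externalRef `json:"externalRefs"`
}

type externalRef struct {
	Category string `json:"referenceCategory"`
	Type     string `json:"referenceType"`
	Locator  string `json:"referenceLocator"`
}

// toPackages turns each compiled-in module into one SPDX-style package entry.
func toPackages(deps []Module) []spdxPackage {
	pkgs := make([]spdxPackage, 0, len(deps))
	for i, d := range deps {
		pkgs = append(pkgs, spdxPackage{
			Name:             d.Path,
			SPDXID:           fmt.Sprintf("SPDXRef-Package-%d", i),
			VersionInfo:      d.Version,
			DownloadLocation: "NOASSERTION",
			// License data is left to a separate, explicitly heuristic step.
			LicenseConcluded: "NOASSERTION",
			ExternalRefs: []externalRef{{
				Category: "PACKAGE-MANAGER",
				Type:     "purl",
				Locator:  "pkg:golang/" + d.Path + "@" + d.Version,
			}},
		})
	}
	return pkgs
}

// marshalPackages renders the package list as indented JSON.
func marshalPackages(deps []Module) (string, error) {
	b, err := json.MarshalIndent(toPackages(deps), "", "  ")
	return string(b), err
}
```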

Licenses are tricky because Go has taken a strong stand that they won't be the source of truth on licenses, and there are a number of projects (some of which Go uses) that attempt to detect a module's license but don't/can't guarantee they're perfectly accurate. It's a guess with legal implications 🙃.

A lot of the packages I've seen for detecting licenses do it in a pretty dependency-heavy way, which I've fought to keep out of ko and would fight to keep out of ko's dependencies.
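To make the "it's a guess" point concrete, here is a deliberately tiny, standard-library-only heuristic (guessLicense is a hypothetical name). It trusts an explicit SPDX-License-Identifier line if one is present, otherwise matches a couple of hand-picked phrases, and otherwise returns NOASSERTION. This is not how any particular detection library works; real classifiers compare against full license texts, and even they can't guarantee accuracy.

```go
package main

import "strings"

// licenseHints is a hand-picked, illustrative list: every phrase in a hint
// must appear in the normalized text for that hint to match.
var licenseHints = []struct {
	id      string
	phrases []string
}{
	// A file that merely mentions Apache 2.0 (e.g. a NOTICE) also matches.
	{"Apache-2.0", []string{"apache license", "version 2.0"}},
	// This phrase also opens other permissive licenses such as MIT-0.
	{"MIT", []string{"permission is hereby granted, free of charge"}},
}

// guessLicense returns an SPDX identifier guessed from a license or source
// file's text, or "NOASSERTION" when nothing matches. It is only a guess.
func guessLicense(text string) string {
	// 1. Trust an explicit SPDX-License-Identifier tag if one is present.
	for _, line := range strings.Split(text, "\n") {
		if _, after, ok := strings.Cut(line, "SPDX-License-Identifier:"); ok {
			if id := strings.TrimSpace(after); id != "" {
				return id
			}
		}
	}
	// 2. Otherwise normalize case and whitespace and look for known phrases.
	norm := strings.ToLower(strings.Join(strings.Fields(text), " "))
	for _, h := range licenseHints {
		matched := true
		for _, p := range h.phrases {
			if !strings.Contains(norm, p) {
				matched = false
				break
			}
		}
		if matched {
			return h.id
		}
	}
	// 3. Refuse to guess further.
	return "NOASSERTION"
}
```

Even this toy version shows the problem: both hints can match text that isn't actually under that license, which is why a NOASSERTION fallback is safer than a confident wrong answer.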