anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Option to set `PackageSupplier` in root of SPDX document generated by CLI #3098

Open pczuj opened 1 month ago

pczuj commented 1 month ago

What would you like to be added: A CLI option to set the PackageSupplier of the root entry in the generated SPDX document.

Why is this needed: We currently do this with a sed command executed after generation, which is not ideal for several reasons. The primary problem is that it raises a lot of questions for anyone reading the configuration. The secondary problem is that it creates a non-portable and fragile point of failure.

Additional context: PackageName and PackageVersion can already be set using --source-name and --source-version. Possibly a --source-supplier option could be created.
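For illustration, the post-processing step described above could be sketched in Python instead of sed. This is only a rough sketch, not how Syft works internally: it assumes SPDX 2.x JSON output, and that the root package is the one the document DESCRIBES (via a `relationships` entry from `SPDXRef-DOCUMENT` or the `documentDescribes` list). The function name `set_root_supplier` and the sample data are hypothetical.

```python
import json


def set_root_supplier(doc: dict, supplier: str) -> dict:
    """Set `supplier` on every package that the SPDX document DESCRIBES."""
    # Collect root element IDs from DESCRIBES relationships (assumed layout).
    root_ids = {
        rel.get("relatedSpdxElement")
        for rel in doc.get("relationships", [])
        if rel.get("spdxElementId") == "SPDXRef-DOCUMENT"
        and rel.get("relationshipType") == "DESCRIBES"
    }
    # Some SPDX 2.x JSON documents list root elements here instead.
    root_ids.update(doc.get("documentDescribes", []))

    for pkg in doc.get("packages", []):
        if pkg.get("SPDXID") in root_ids:
            pkg["supplier"] = supplier
    return doc


# Hypothetical, minimal in-memory document standing in for real output.
example = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [
        {"SPDXID": "SPDXRef-DocumentRoot-Directory-app", "name": "app"},
        {"SPDXID": "SPDXRef-Package-lib", "name": "lib"},
    ],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relatedSpdxElement": "SPDXRef-DocumentRoot-Directory-app",
            "relationshipType": "DESCRIBES",
        }
    ],
}

updated = set_root_supplier(example, "Organization: Example Corp")
print(json.dumps(updated["packages"], indent=2))
```

Even in this form it remains an extra step outside the generator, which is the fragility a built-in option would avoid.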

kzantow commented 1 month ago

Thanks for filing this. I believe you are using a directory scan, correct?

What type of project(s) are you scanning?

I'm trying to determine if there are some other things Syft could do here. For example: if you are doing a directory scan and the directory contains a package.json and a package-lock.json, Syft is including the package-lock.json dependencies but not necessarily the top-level project, and none of this information is elevated to the top-level component when SPDX or CycloneDX is output. From my perspective this could be considered what you scanned, which would make sense to populate some of the source information including the supplier if present. The same is true for a pom.xml, perhaps, and probably many other project source file types. Would this help in any way if some sort of project file information was elevated into the document's root package, including a supplier field?

But also stepping back a bit further, this request seems to me to be at least a part of a more general ask to "provide package information for the top-level package". I occasionally wonder why we don't seem to have a top-level package in Syft's own data model, while the other main SBOM formats do. When scanning a container, the container as the top-level element makes a lot of sense, and this matches the capabilities in both SPDX and CycloneDX to represent the scan this way. But when scanning a directory or, say, a go binary file, the main module seems to be something that could represent the source.

Sorry for the long-winded comment here, but I'm just trying to figure out if there are other things we should potentially do instead of surfacing just a --source-supplier, and then possibly another package field, and another.

pczuj commented 1 month ago

Hi @kzantow, thanks for the response and your effort!

> I believe you are using a directory scan, correct?

Yes

> What type of project(s) are you scanning?

We're scanning a Java application built on top of Tomcat. The dependencies are *.jar, pom.xml, *.exe and *.dll files.

We do have a root pom.xml that defines the application, and through its parent pom.xml chain it has project metadata like <organization>. If we're able to specify the pom.xml, that should work for us; however, being able to simply provide --source-supplier seems like a big simplification to us.
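To make the pom.xml idea concrete, here is a minimal Python sketch of reading `<organization><name>` from a single POM. It is an assumption-laden illustration, not Syft behavior: the helper `organization_name` and the sample POM are hypothetical, and the sketch does not resolve the parent pom.xml chain, which is where our real metadata lives.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard Maven POM 4.0.0 namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def organization_name(pom_xml: str) -> Optional[str]:
    """Return <organization><name> from one POM, or None if it is absent.

    Parent POMs are NOT followed; inherited values would need extra work.
    """
    root = ET.fromstring(pom_xml)
    node = root.find("m:organization/m:name", POM_NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Hypothetical sample POM used only for this illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0.0</version>
  <organization>
    <name>Example Corp</name>
  </organization>
</project>"""

print(organization_name(SAMPLE_POM))  # prints "Example Corp"
```

Resolving inherited values properly would mean walking `<parent>` references, which is part of why a plain --source-supplier looks simpler from our side.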

Right now we're executing syft using exec-maven-plugin (configured inside pom.xml), so if we needed to generate some other metadata file, it would increase complexity for us.