anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
5.98k stars 551 forks source link

Invoke known tools to gather build-time dependency information #1562

Open kzantow opened 1 year ago

kzantow commented 1 year ago

What would you like to be added: Add the ability to shell-out to known tools such as go and mvn in order to capture more accurate build-time dependency information.

Why is this needed: To improve the build-time dependency support in Syft.

Additional context: From a working document:

Creating higher quality SBOMs in Syft at build-time

At build time, static analysis of dependencies implemented today is limited. Improving static analysis metrics can be done by simulating what build systems do. This is subject to drift and additional maintenance to keep up with behaviors of the build systems.

One approach to resolving this issue is to call out to build systems to get that information instead. This introduces additional (optional) dependencies.

Syft as a build-time SBOM generator tool

Syft can be seen as a “build-time” SBOM generator tool, and can start thinking about utilizing build tooling. Calling out to build tools can be such as the following, we will use 2 examples.

Golang

Instead of reading and trying to parse and resolve the go.mod file, the go mod graph command can be used to get a fully resolved dependency tree.

Java

Maven has the mvn dependency:tree command which shows the fully-resolved dependency graph.

NPM

Npm has npm ls --all

Python

Python has pipdeptree

Considerations

kzantow commented 1 year ago

cc: @lumjjb

wagoodman commented 1 year ago

This is most likely needed on order to achieve https://github.com/anchore/syft/issues/1674 and https://github.com/anchore/syft/issues/572 in a meaningful way.

This functionality should be opt-in, that is, by default syft should remain a static analysis tool. Executing other commands on the system should still be not allowed by default (again, unless the user opts in).

Considerations:

setchy commented 1 year ago

would the same be true for npm, too?

noqcks commented 12 months ago

Instead of shelling out to cli tools, would you consider building parsers directly inside syft? It wouldn't require one to depend on the presence of local tooling, and I could envision that the tools might have different ouputs depending on the installed version of the tooling.

Snyk, for example, has built a bunch of parsers for various ecosystems in js https://github.com/snyk/dotnet-deps-parser

I just implemented something similar in cdxgen for .NET [ref] and npm [ref] to determine direct/indirect deps in build files. Wondering if the same could work here but written in golang.

Add the ability to shell-out to known tools such as go and mvn in order to capture more accurate build-time dependency information.

Can you expand on what specifically would be more accurate. In my mind I can only imagine direct/indirect deps. But is there more?

I also noticed that syft doesn't generate a dependencies section for CycloneDX for different language specific files (go.mod, package-lock.json). Would the outcome of this issue be that this section would be filled?

kzantow commented 12 months ago

@noqcks we do already have lots of parsers for different ecosystems. This change, at least initially, would be an opt-in behavior to shell out to the tools. This would allow things like Go - which has a flat list of dependencies in the go.mod - to get the dependency graph and properly output it in different formats.

noqcks commented 12 months ago

I suppose what I meant is only shelling out to tools where necessary (in the case that go mod graph is truly the only way to see the dependency tree for go projects), and writing all other dep graph parsers directly into syft where possible.

I'd like to work on getting real dependency graph for javascript projects inside syft, and wanted to write this dep parser inside syft instead of relying on an external npm cli.

Wanted to clarify whether this would be an appropriate avenue to pursue before I started the work.

wagoodman commented 5 months ago

Since this is a potentially large item that would affect multiple ecosystems I think a detailed plan is needed to move forward with this (how would this work within a single cataloger, what abstractions do we want to introduce (if any), would abstractions be generalizable to other ecosystem catalogers (if so, how), etc)