anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Adapt new and existing package metadata as SPDX relationships #476

Open wagoodman opened 3 years ago

wagoodman commented 3 years ago

SPDX has the concept of relationships that can be applied to packages, files, or other artifacts. This issue aims to explore which existing metadata can be expressed via SPDX relationships, and whether the catalogers should collect additional metadata that can be expressed the same way.

Syft already has an internal concept of package-to-package relationships. What isn't clear is whether this should be expanded generally or isolated to the SPDX presenter (which would be a new approach, since all data typically gets expressed via the JSON model first).

spiffcs commented 3 years ago

I've been researching this since Friday afternoon and this morning. The first question I have concerns this repository: https://github.com/spdx/tools-golang

The library is hosted by the spdx organization and defines roughly the same model we do in presenter/packagers/model/spdx22.

If we want to express SPDX relationships for 2.2 correctly, I think the first discussion should be about whether we move to their model or keep our presenter model.

The tough part with this proposal is that their model is not designed as a presenter (no JSON tags).
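To illustrate the concern about missing JSON tags, here is a minimal Go sketch. Both struct types and the `toPresenter` and `mustJSON` helpers are hypothetical names invented for this example; they are not the tools-golang or syft types. The tagged struct uses the relationship key names from the SPDX 2.2 JSON format (`spdxElementId`, `relatedSpdxElement`, `relationshipType`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untaggedRelationship is a hypothetical stand-in for a model struct that
// carries no JSON tags; encoding/json falls back to the Go field names.
type untaggedRelationship struct {
	RefA         string
	RefB         string
	Relationship string
}

// presenterRelationship is a hypothetical presenter-side struct whose tags
// produce the key names used for relationships in SPDX 2.2 JSON.
type presenterRelationship struct {
	SPDXElementID      string `json:"spdxElementId"`
	RelatedSPDXElement string `json:"relatedSpdxElement"`
	RelationshipType   string `json:"relationshipType"`
}

// toPresenter adapts the untagged model into the tagged presenter shape.
func toPresenter(r untaggedRelationship) presenterRelationship {
	return presenterRelationship{
		SPDXElementID:      r.RefA,
		RelatedSPDXElement: r.RefB,
		RelationshipType:   r.Relationship,
	}
}

// mustJSON marshals v and panics on error to keep the sketch short.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	r := untaggedRelationship{
		RefA:         "SPDXRef-DOCUMENT",
		RefB:         "SPDXRef-Package-a",
		Relationship: "DESCRIBES",
	}
	fmt.Println(mustJSON(r))              // keys are the Go field names
	fmt.Println(mustJSON(toPresenter(r))) // keys follow the SPDX JSON naming
}
```

Without tags, the marshalled keys are simply the exported Go field names, so using an untagged model directly as a presenter would need either an adapter like the one above or custom marshalling.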

They take a pretty interesting approach to marshalling/unmarshalling.

Check out their parsing code here.

@wagoodman if you have time today can we talk a bit about the complexity of juggling our JSON model against the spdx tool model and then wrapping that into a "correct" presenter?

kzantow commented 3 years ago

One thing of note: I think CycloneDX also has a way of specifying dependencies using a bom-ref: https://cyclonedx.org/use-cases/#dependency-graph so we would quite possibly want this handled somehow in our own model. It is possible, though, that this is just a reference within the document itself; it's a bit unclear to me.
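As a rough sketch of how a shared internal model could feed that format: the CycloneDX JSON dependency graph is a `dependencies` array whose entries have a `ref` and a `dependsOn` list, both holding `bom-ref` values of components in the same BOM. The `edge` type and `toDependencies` function below are hypothetical names for illustration, not syft's model.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// edge is a hypothetical internal package-to-package relationship,
// identified by the bom-ref values of the two packages.
type edge struct {
	From string
	To   string
}

// dependency mirrors one entry of a CycloneDX JSON "dependencies" array.
type dependency struct {
	Ref       string   `json:"ref"`
	DependsOn []string `json:"dependsOn,omitempty"`
}

// toDependencies groups edges by their source and returns one entry per
// source, sorted so that the output is deterministic.
func toDependencies(edges []edge) []dependency {
	grouped := map[string][]string{}
	for _, e := range edges {
		grouped[e.From] = append(grouped[e.From], e.To)
	}

	refs := make([]string, 0, len(grouped))
	for ref := range grouped {
		refs = append(refs, ref)
	}
	sort.Strings(refs)

	deps := make([]dependency, 0, len(refs))
	for _, ref := range refs {
		targets := grouped[ref]
		sort.Strings(targets)
		deps = append(deps, dependency{Ref: ref, DependsOn: targets})
	}
	return deps
}

func main() {
	deps := toDependencies([]edge{
		{From: "pkg-a", To: "pkg-c"},
		{From: "pkg-a", To: "pkg-b"},
		{From: "pkg-b", To: "pkg-c"},
	})
	out, err := json.MarshalIndent(deps, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```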

spiffcs commented 3 years ago

Added https://github.com/anchore/syft/pull/507 as a starting point to build out the initial ROOT --> Package relationships. This PR assumes that all packages discovered by the cataloger are directly related to the scanned image/directory.
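The assumption described above can be sketched roughly as follows. This is not the code from the PR: the `relationship` type, the `rootRelationships` function, the element IDs, and the choice of `DESCRIBES` as the relationship type are all assumptions made for illustration.

```go
package main

import "fmt"

// relationship is a hypothetical edge between two SPDX element IDs.
type relationship struct {
	From string
	To   string
	Type string
}

// rootRelationships links every cataloged package directly to the root
// element (the scanned image or directory), one relationship per package.
func rootRelationships(rootID string, packageIDs []string, relType string) []relationship {
	rels := make([]relationship, 0, len(packageIDs))
	for _, id := range packageIDs {
		rels = append(rels, relationship{From: rootID, To: id, Type: relType})
	}
	return rels
}

func main() {
	// The IDs and the relationship type here are illustrative assumptions.
	pkgs := []string{"SPDXRef-Package-bash", "SPDXRef-Package-musl"}
	for _, r := range rootRelationships("SPDXRef-DOCUMENT", pkgs, "DESCRIBES") {
		fmt.Printf("%s %s %s\n", r.From, r.Type, r.To)
	}
}
```

Because every package hangs directly off the root, the result is a flat star rather than a dependency tree; package-to-package edges would have to come from the catalogers.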

spiffcs commented 3 years ago

https://github.com/anchore/syft/pull/507 has been updated to populate Files and to include edges between Packages and Files in the Relationships field.

spiffcs commented 3 years ago

Before we dig further into relationships it's probably worth tackling some of the prioritized bugs we have surrounding SPDX. I pulled in https://github.com/anchore/syft/issues/460 to make some progress in cleaning up our license section.

spiffcs commented 3 years ago

@luhring The next step for making this better is to start digging into the architecture changes we talked about on Monday.

https://github.com/anchore/syft/issues/516

We'll work on refining this issue so we have a clear path to get syft's command API in the right place.

wagoodman commented 3 years ago

I've broken out the package relationships work into https://github.com/anchore/syft/issues/572 so that package catalogers raise this information before presenters/formats (such as SPDX) can leverage it.