Open kzantow opened 3 months ago
Related to #572
We want to be able to describe the topology and limitations of any dependency graph that an SBOM is producing. This isn't a property of the SBOM as a whole, or of a language or packaging ecosystem; it is determined one package at a time, based on the evidence we found and what we know about the kinds of files that make up that evidence (e.g. package.json and package-lock.json provide different answers here, and the answer changes again when a populated node_modules dir exists from a previously run npm install command).
I feel that on a per-package basis we're looking for the following description:
So how should we start expressing these topologies? I have an early/incomplete thought about a new field on pkg.Package called dependency with the following subfields:
nodes: with possible values...
- unknown: no distinction is made about whether we're able to find any package dependencies
- direct-only: partial set of nodes, only describing direct dependencies
- all: all direct and indirect nodes are described
edges: with possible values...
- unknown: no distinction is made about whether we're able to find any information about how dependencies are related to one another
- flat: nodes have relationships between both direct and indirect dependencies; cannot distinguish between direct and indirect dependencies
- all: nodes have relationships between themselves and only direct dependencies
One question that comes to mind: what about cases where we can partition nodes into direct/indirect dependencies but it is still a flat list (like go.mod)? We can only say all/flat, but it's still valuable to know which of these nodes are indirect. Does this mean we should add additional dependency information onto the edge itself? (in which case this is a non-point)
While I'm not sold on the specifics of the field, I think I'm becoming more convinced that describing the node and edge qualities separately is more valuable than attempting to combine them into a single enum field.
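To make the nodes/edges split concrete, here is a minimal Go sketch of what such a field could look like. All type and function names (NodeCompleteness, EdgeCompleteness, Dependency, claimFor, canTellDirectFromIndirect) are hypothetical and not part of Syft today; only the enum values and the go.mod all/flat case come from the discussion above.

```go
package main

import "fmt"

// NodeCompleteness and EdgeCompleteness are hypothetical types; their values
// mirror the ones proposed in this thread.
type NodeCompleteness string
type EdgeCompleteness string

const (
	NodesUnknown    NodeCompleteness = "unknown"
	NodesDirectOnly NodeCompleteness = "direct-only"
	NodesAll        NodeCompleteness = "all"
)

const (
	EdgesUnknown EdgeCompleteness = "unknown"
	EdgesFlat    EdgeCompleteness = "flat"
	EdgesAll     EdgeCompleteness = "all"
)

// Dependency is a stand-in for the proposed field on pkg.Package.
type Dependency struct {
	Nodes NodeCompleteness `json:"nodes"`
	Edges EdgeCompleteness `json:"edges"`
}

// claimFor returns the claim a cataloger might make for a kind of evidence
// file. Only the go.mod entry is taken from the discussion; every other
// evidence kind falls back to unknown in this sketch.
func claimFor(evidence string) Dependency {
	switch evidence {
	case "go.mod":
		return Dependency{Nodes: NodesAll, Edges: EdgesFlat}
	default:
		return Dependency{Nodes: NodesUnknown, Edges: EdgesUnknown}
	}
}

// canTellDirectFromIndirect reports whether the edges alone let a consumer
// separate direct from indirect dependencies. Per the definitions above this
// holds only for "all"; "flat" explicitly cannot make that distinction.
func canTellDirectFromIndirect(d Dependency) bool {
	return d.Edges == EdgesAll
}

func main() {
	d := claimFor("go.mod")
	fmt.Printf("nodes=%s edges=%s distinguishes-indirect=%t\n",
		d.Nodes, d.Edges, canTellDirectFromIndirect(d))
}
```

The go.mod case shows the gap raised in the question above: the claim is all/flat, so the field by itself cannot say which nodes are indirect even when the evidence file records that.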
Another consideration is that there are nodes in the graph that cross ecosystems, combining nodes making up dependency graphs in one ecosystem with another dependency graph for another ecosystem. One example of this is with binary packages: these may relate to any number of other ecosystems based on file ownership overlap and dynamic imports (and soon dlopen descriptions) from that binary. So it may not be as simple as having an ecosystem cataloger make a claim on a package about its node/edge/capability conclusion... this may additionally be a post-cataloging analysis that further annotates these qualities based on the final graph captured.
Thoughts to be continued in another post soon...
From a discussion with the team on this one, we nudged this in a different direction. The conclusive point of discussion was: when asking a single package node for information about dependencies, it shouldn't attempt to answer anything outside its immediate dependencies. That is, asking a node to describe the graph isn't really correct. We should instead limit the answer to only the immediate part of the graph that the node is privy to.
This somewhat eliminates the need to describe edges in such depth. The current suggestion from the team is to have a single dependencies field with the following possible enum values:
- unknown: no distinction is made about whether we're able to find any package dependencies
- complete-direct-only: the full set of direct dependencies is enumerated
- complete-transitive: the full set of direct and indirect dependencies (mixed) is enumerated
- incomplete: a partial set of dependencies is enumerated (with no distinction about whether they are direct or indirect)
Furthermore, to reopen a conversation from #572, we should be qualifying edges as known direct dependencies vs. known transitive (indirect) dependencies. In the common case of direct dependencies, we should continue to use the dependency-of relationship type. However, we should not use this relationship type when describing dependencies that are NOT direct dependencies; another type should be created for this purpose.
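As a rough illustration of the team's suggestion, the single enum and the direct/indirect edge split could be sketched in Go as follows. The types here are local stand-ins, not Syft's actual API, and the name indirect-dependency-of is purely a placeholder, since the thread only says that another relationship type should be created.

```go
package main

import "fmt"

// DependencyCompleteness is a hypothetical type for the proposed single
// "dependencies" field; the values are the ones listed above.
type DependencyCompleteness string

const (
	DependenciesUnknown            DependencyCompleteness = "unknown"
	DependenciesCompleteDirectOnly DependencyCompleteness = "complete-direct-only"
	DependenciesCompleteTransitive DependencyCompleteness = "complete-transitive"
	DependenciesIncomplete         DependencyCompleteness = "incomplete"
)

// RelationshipType is a local stand-in for a relationship type.
type RelationshipType string

const (
	// DependencyOf is the relationship type named in the thread for direct
	// dependencies.
	DependencyOf RelationshipType = "dependency-of"
	// IndirectDependencyOf is a placeholder name (an assumption); the thread
	// does not settle what the new type should be called.
	IndirectDependencyOf RelationshipType = "indirect-dependency-of"
)

// Relationship is a simplified edge: From is the dependency, To is the
// package that depends on it.
type Relationship struct {
	From string
	To   string
	Type RelationshipType
}

// relate builds an edge, reserving dependency-of for known direct
// dependencies and using the placeholder type for known indirect ones.
func relate(dependency, dependent string, direct bool) Relationship {
	t := IndirectDependencyOf
	if direct {
		t = DependencyOf
	}
	return Relationship{From: dependency, To: dependent, Type: t}
}

func main() {
	fmt.Println(relate("lib-a", "app", true))
	fmt.Println(relate("lib-b", "app", false))
}
```

This sketch only covers edges whose directness is known; how to type an edge from a complete-transitive (mixed) or incomplete set is left open, as it is in the discussion.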
I'm not sure why I hadn't looked this up before, but I should also note the related SPDX 3 field: https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Vocabularies/RelationshipCompleteness/. This is defined on a one-to-many relationship element and isn't exactly the same thing as we were talking about but is very closely related, I think.
What would you like to be added: Relationship depth information, when Syft is unable to provide a full transitive dependency graph.
Why is this needed: One of the data elements mentioned in the NTIA minimum requirements is the depth of relationships. If Syft is able to build an accurate SBOM with a full transitive-dependency graph, that would be ideal, but different scenarios prevent this information from being included or from accurately depicting the transitive graph. Some examples are Python requirements.txt and Go binary mod information, which only provide a flat list of dependencies. Another is binaries that are only directly identified, without dependent component information. One solution is to provide an "unknown" indicator that Syft was unable to determine a full transitive dependency graph, or that Syft stopped after 5 levels deep resolving online parent references. These can be returned as "unknowns" from catalogers where appropriate, to be associated with the file(s) where package graph information originated.
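The depth-limit case above could look roughly like this in Go. This is only a sketch under stated assumptions: the Unknown type, the resolveParents function, and the in-memory parents map are all hypothetical stand-ins, and the real shape will depend on how the known-unknowns work lands.

```go
package main

import "fmt"

// maxParentDepth mirrors the "5 levels deep" limit mentioned above.
const maxParentDepth = 5

// Unknown is a hypothetical record tying a gap in knowledge to the file
// where the package graph information originated.
type Unknown struct {
	File   string
	Reason string
}

// resolveParents follows parent references starting at start and gives up
// after maxParentDepth levels. parents maps a reference to its parent; a
// missing key means there is no further parent. When the limit is reached
// and more parents remain, an Unknown is returned for the originating file.
func resolveParents(file, start string, parents map[string]string) ([]string, []Unknown) {
	var chain []string
	current := start
	for depth := 0; depth < maxParentDepth; depth++ {
		parent, ok := parents[current]
		if !ok {
			return chain, nil // reached the root within the limit
		}
		chain = append(chain, parent)
		current = parent
	}
	if _, more := parents[current]; more {
		reason := fmt.Sprintf("stopped resolving parent references after %d levels", maxParentDepth)
		return chain, []Unknown{{File: file, Reason: reason}}
	}
	return chain, nil
}

func main() {
	parents := map[string]string{
		"a": "b", "b": "c", "c": "d", "d": "e", "e": "f", "f": "g",
	}
	chain, unknowns := resolveParents("pom.xml", "a", parents)
	fmt.Println(chain, unknowns)
}
```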
Additional context: This is likely to be dependent on the PR for known unknowns getting merged.
This is a part of #632