Closed hectorj2f closed 2 years ago
Hi @hectorj2f! This makes sense, and it's something we definitely want to add soon.
There's a related effort to refactor how we ingest and output SBOM formats that we're working on currently, and we'll be able to leverage this work to extend our support of various formats. cc: @wagoodman
Related issues:
@luhring @wagoodman We'd like to explore the possibilities here to understand what needs to be done (and a potential ETA), even if we need to help. Otherwise we'd like to know if converting the CycloneDX format into a Syft-friendly CycloneDX format would be a possibility here. Thanks in advance.
@hectorj2f Let me at the least highlight the inputs to the vulnerability matching process and then we can figure how that will map into various SBOM formats (e.g. CycloneDX, SPDX, etc).
Today we require a few fields for each package:
artifacts[].name
artifacts[].version
artifacts[].type
The type is important as it is what is used to determine which matcher object to use ([1] [2]). Next up is the name and version; these are table stakes for any vulnerability matching process really.
Next there is more nuance needed to center which ecosystem the matcher should be searching within. For OS packages (RPMs, DEBs, and Alpine APKs) we need to know the Linux distribution name and version (e.g. rhel:8); this narrows down the search to a subset of vulnerabilities from the whole database (and minimizes false positives). This is encapsulated in the Syft JSON output as the distro.name, distro.version, and distro.idLike fields (which correspond to /etc/os-release fields).
With all of the above fields in place you can get basic matching working. However, the results will be lossy if the below fields are not also included.
The OS packages also have optional information about upstream build dependencies for a package ("source" packages: [apk] artifacts[].metadata.originPackage, [dpkg] artifacts[].metadata.source, [rpm] artifacts[].metadata.sourceRpm). This information is used to search for vulnerabilities in upstream packages that could also affect the downstream package installed on the system.
Lastly there are some language-specific properties that are important. Today this is restricted to only some Java fields, essentially artifacts[].metadata.artifactID and artifacts[].metadata.groupID.
Above are all of the fields that are used as input into the vulnerability matching process. An input SBOM that has a subset of these fields may not produce complete results.
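To make the field list above concrete, here is a minimal sketch of a completeness check over a Syft-JSON-shaped document. The field paths are the ones quoted above; the function name, the package type strings ("rpm", "deb", "apk"), and the wording of the notes are assumptions for illustration and not grype's actual logic. The Java fields are left out to keep it short.

```python
# Field paths follow the Syft JSON paths quoted in this thread. The type
# strings and the function itself are illustrative assumptions, not grype code.
OS_PACKAGE_TYPES = {"rpm", "deb", "apk"}
UPSTREAM_FIELD = {"apk": "originPackage", "deb": "source", "rpm": "sourceRpm"}


def missing_match_inputs(sbom):
    """Return human-readable notes about matcher inputs the document lacks."""
    notes = []
    distro = sbom.get("distro") or {}
    for i, pkg in enumerate(sbom.get("artifacts", [])):
        label = pkg.get("name") or f"artifacts[{i}]"
        # name, version, and type are needed for any matching at all
        for field in ("name", "version", "type"):
            if not pkg.get(field):
                notes.append(f"{label}: missing required field '{field}'")
        pkg_type = pkg.get("type")
        metadata = pkg.get("metadata") or {}
        if pkg_type in OS_PACKAGE_TYPES:
            # OS packages need the distro to narrow the vulnerability search
            if not (distro.get("name") and distro.get("version")):
                notes.append(f"{label}: OS package without distro name/version")
            # the upstream ("source") package is optional but improves results
            if not metadata.get(UPSTREAM_FIELD[pkg_type]):
                notes.append(f"{label}: no upstream (source) package recorded")
    return notes
```

An empty list would mean every package carries the inputs described above; anything else points at where results may be incomplete.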
As @luhring was mentioning, we're planning to add support for ingesting SPDX 2.2 XML/JSON and CycloneDX 1.2 XML documents with these format upgrades. We're in the middle of upgrading syft to include decoding capabilities for common SBOM formats that we can leverage in grype. Progress for implementing encoding can be tracked with anchore/syft#395 . I'm still working out details on decoding, or rather, expanding encoding and providing decoders that map as much information as is possible into each target format that is "kosher" to that format. I had a draft PR (that I closed) that implemented SPDX JSON encoding/decoding that was lossless relative to Syft JSON (used for grype input)... but it wasn't "kosher" relative to the SPDX JSON target format --encoding went "out of spec" and decoding leveraged these fields, which is not desirable.
This is all pretty high priority right now; we're implementing changes now that will "unlock" more of this work in the coming weeks. Hopefully this gets you going in the meantime! (Also shout out if you have more questions on this, happy to answer!)
That aligns with my experience when we tried to generate our own syft json (from buildpack manifests): 1) determining what syft type to use 2) creating valid CPEs from partial information 3) specifying a placeholder distro 4) skipping the optional source fields
CycloneDX providing the purl and/or CPE would alleviate 1 and 2 (I'm hoping the purl's scheme:/type would be mappable to syft type).
3 and 4 might be inferred from a few places in the document (metadata, nested components, dependencies, compositions ...) but I suppose that's where the "kosher" considerations come into play, along with different BoM generators making different in-spec choices.
As CPE is deprecated in CycloneDX, I was also wondering if the store adapter would need a GetByPURL method / if that would be feasible?
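As a rough illustration of the purl-to-type idea, the sketch below pulls the type segment out of a purl and looks it up in a table. The Syft type strings on the right-hand side are assumptions and would need checking against the Syft version in use; both function names are hypothetical.

```python
# Illustrative mapping only: the Syft type strings on the right are assumptions.
PURL_TYPE_TO_SYFT_TYPE = {
    "deb": "deb",
    "rpm": "rpm",
    "npm": "npm",
    "pypi": "python",
    "gem": "gem",
    "maven": "java-archive",
    "golang": "go-module",
}


def purl_type(purl):
    """Extract the type segment from a purl such as pkg:deb/debian/curl@7.74.0."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a package URL: {purl!r}")
    # tolerate the pkg:/type and pkg://type spellings as well
    return rest.lstrip("/").split("/", 1)[0].lower()


def syft_type_for(purl):
    """Return the assumed Syft package type, or None when there is no mapping."""
    return PURL_TYPE_TO_SYFT_TYPE.get(purl_type(purl))
```

Returning None for an unmapped type (rather than raising) leaves room for the caller to fall back to CPE-only matching.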
Thanks @wagoodman, we really appreciate the details you shared with us. We are gonna look at them and try to come up with more questions. In the meantime, I think @xtreme-conor-nosal (one of our engineers) had a question, if you could help him :).
@xtreme-conor-nosal Based on my understanding and looking at @wagoodman's draft PR https://github.com/anchore/syft/pull/578/files#diff-e7c9d669021517e45a99d7b97c892cd108728b70121931f94931a6f45882e2c6R29, I guess the answer would be affirmative.
How could I have forgotten about providing CPEs on each package (artifacts[].cpes) for matching against NVD? Thanks @xtreme-conor-nosal for that addition.
I think that GetByPURL on the store adapter is a great idea! There is a rough 1:1 match of package types to matcher ecosystems, so I don't think that will be an issue. I think the larger problem would be matching against NVD, which typically needs an accurate product and vendor to get relevant matches. That is, since NVD is indexed by CPEs we still need a good CPE as input to start the matching process, which hints at generating CPEs from a given pURL. This might be a good enhancement in syft that can be exported for use in grype by the new proposed GetByPURL method.
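To show why CPE generation from a pURL is the hard part, here is a deliberately naive sketch. It assumes the purl namespace can stand in for the CPE vendor (falling back to the package name when there is no namespace) and ignores CPE escaping rules entirely; the function name is hypothetical and this is not how syft generates CPEs.

```python
def candidate_cpe(purl):
    """Build one CPE 2.3 candidate from a purl, guessing vendor from namespace."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a package URL: {purl!r}")
    # drop the subpath (#...) and qualifiers (?...), then any leading slashes
    body = purl[4:].split("#", 1)[0].split("?", 1)[0].lstrip("/")
    path, _, version = body.partition("@")
    parts = path.split("/")
    if len(parts) < 2 or not version:
        return None  # no name or no version: nothing useful to match on
    name = parts[-1]
    # ASSUMPTION: the last namespace segment approximates the vendor
    vendor = parts[-2] if len(parts) > 2 else name
    fields = ["cpe", "2.3", "a", vendor.lower(), name.lower(), version]
    return ":".join(fields + ["*"] * 7)
```

For pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1 this produces a vendor of org.apache.logging.log4j and a product of log4j-core, whereas NVD lists Log4j under vendor apache and product log4j, so a real implementation would likely need to emit several candidates or consult a curated mapping.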
There is a compromise here between getting the feature in (enabling input from SBOMs in different formats) and the quality of the vulnerability matching: there is a good chance that matching quality will predictably be lower for other formats until we better understand how the information can be encoded in the other SBOM formats (SPDX and CycloneDX). In order to get some forward progress here, I think it makes sense to log warnings in cases where we know the matching could be lower. That is, we shouldn't require on-par matching quality to that of the Syft JSON input in order for this issue to be complete.
@wagoodman yes, it makes sense to me. I am happy to get involved on a WG to investigate what we need from these formats to get the same or better accuracy.
Notes from an offline conversation with @wagoodman
In order to support import of alternative SBOM formats, we will need a particular set of data for Grype to perform ideal matching: package name, package version, package vendor, OS distribution, and CPEs (I may have missed something here). However, it is likely that imported SBOMs will be missing some of this information and we'd still like for Grype to be as useful as possible. Some ideas for handling this are:
package name & package version: this is pretty much required and we will simply return an error if this isn't provided for anything
package vendor: being included somehow could contribute to CPE generation, if no CPEs are provided
CPEs if missing:
OS distribution:
In all of these cases, we should probably document how matching happens in the event of missing data and how a vendor could provide this information so Grype would have improved matching.
vendor should map to the author/publisher/supplier fields of a component (if provided)
For the distribution, is a version required, or is "distro family" a useful-enough hint? (e.g. if component purls are present and have a pkg:deb/debian prefix as syft currently outputs)
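A minimal sketch of that inference, assuming OS-package purls carry the distro family as their namespace (as in pkg:deb/debian). The optional distro qualifier handled here (e.g. distro=debian-11) is an assumption about what a generator might emit, and the function name is hypothetical.

```python
from urllib.parse import parse_qs

OS_PURL_TYPES = {"deb", "rpm", "apk"}


def distro_hint(purl):
    """Return (family, version) inferred from an OS-package purl; either may be None."""
    if not purl.startswith("pkg:"):
        return (None, None)
    body = purl[4:].split("#", 1)[0]
    path, _, query = body.partition("?")
    parts = path.lstrip("/").split("/")
    # expect type/namespace/name; anything else carries no distro information
    if len(parts) < 3 or parts[0].lower() not in OS_PURL_TYPES:
        return (None, None)
    family = parts[1].lower()
    # ASSUMPTION: a "distro" qualifier shaped like "<family>-<version>" may be present
    distro = parse_qs(query).get("distro", [None])[0]
    version = None
    if distro and distro.startswith(family + "-"):
        version = distro[len(family) + 1:]
    return (family, version)
```

When only the family comes back, a matcher could still narrow the search to that distribution and log a warning that the version was unknown.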
I like the approach of doing the best with what we have, and log warnings if appropriate.
What would you like to be added: We are using grype but our generated SBOM files are not always generated by Syft. We'd like to understand what is needed to accept this standard format.
Why is this needed: To accept commonly used formats for BOM generation.
Additional context:
We saw in the README that this is part of the future plans. However, the proliferation of tools creating SBOM files in the standard formats is a blocker to our continued use of grype.