anchore / grype

A vulnerability scanner for container images and filesystems
Apache License 2.0

Add support for using VEX statements to filter/enrich match results #1365

Open: wagoodman opened this issue 1 year ago

wagoodman commented 1 year ago

Given a set of VEX statements, which represent status assessments of a vulnerability matched against a product, it would be ideal to filter grype results down to the useful or novel ones (removing results that a VEX statement marks as not_affected). The primary motivation is to reduce the size of the result set where possible, helping the user focus on results that have a practical impact rather than spending time remediating non-issues.

One question might be: where should these VEX documents come from? There is a bit of a spectrum here, and I think that motivates a possible implementation path:

Enable grype to...

  1. take a single document that may have one or more VEX statements and filter the results. This is a good first step, since we could blindly take the statements and apply them to all artifacts regardless of the scan target. (edit: implemented in https://github.com/anchore/grype/pull/1397)
  2. take multiple VEX documents and filter the results. This adds the challenge of determining which documents apply to the scanned artifact. There are also secondary problems: given a directory with a large set of documents, is there a cheap and easy way to narrow it down to the correct set of files (do we need an index, or to build one?)
  3. pull down remote VEX documents given an explicit reference (e.g. a git repo URL) and filter the results using only the applicable documents. The reference should not be vague, so the added challenge is less about discovery and more about authentication and caching.
  4. discover VEX documents for a given input reference (e.g. alpine:latest). This brings a lot of added challenges, but ultimately it would be the most "magical" option, requiring the least user input to leverage VEX documents. This added automagic-ness should not come at the expense of security, and ideally it would require no additional configuration (though that is not a requirement). There are a lot of directions this could go in, so I'll leave this speculation for later.

The nice thing about this path is that we defer decisions about where these documents come from while working out the logistics of aligning with the existing OpenVEX spec.

For item 1, the input could look something like this:

$ grype myimage:tag --vex ./path/to/vexdoc.json

The same flag could work for items 2 and 3: the argument might be a directory in which we look for VEX documents, or a remote resource such as git@github.com:myorg/myvexrepo.git. I'm only softly suggesting this to help set an initial direction; consider none of it set in stone.

Side note: I think we should focus initial conversations and efforts on item 1 for now, but I wanted to at least get a vision going for later.

One question I have about this feature: could there be multiple modes for using a VEX document? The suggestion at the top of this issue treats it primarily as a filter, and I considered proposing --filter or something similar as a first step. However, VEX documents could also serve as a source of vulnerabilities, via statements whose status field is affected. That makes "filter" potentially the wrong verb given possible future usage, so I fell back to naming "what" is being input ("vex") rather than an operation on the CLI. (dev note: this is where we add new flags)

I'm assuming that either mode (filtering and adding) would be useful depending on the use case, and that the two are not mutually exclusive. I tend to add config items instead of CLI flags/args when there are "knobs" like these for different use cases and a sensible default behavior. That said, adding vex.filter_not_affected and vex.add_affected configurables (GRYPE_VEX_FILTER_NOT_AFFECTED and GRYPE_VEX_ADD_AFFECTED env vars) would be nice, with both defaulting to true (dev note: here's where we bind new config elements into the application config).

When it comes to JSON output, grype rarely drops match results when filters are applied; instead, it partitions them into two separate sections of the JSON document: matches and ignoredMatches. When we filter out results based on VEX statements, I think we should put these matches into the ignoredMatches section, allowing the user to audit the full set of results found.
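The partitioning idea might look roughly like this Go sketch: a match whose vulnerability/package pair has a `not_affected` VEX status is moved to an ignored list along with the reason, rather than being dropped. The `match`, `ignoredMatch`, `vexKey`, and `partition` names, and keying on vulnerability ID plus package purl, are simplifications invented for illustration, not grype's own types.

```go
package main

import "fmt"

// match is a pared-down stand-in for a grype match (hypothetical type).
type match struct {
	VulnID string
	PURL   string
}

// ignoredMatch keeps the original match plus why it was set aside, so the
// full result set stays auditable.
type ignoredMatch struct {
	match
	Reason string
}

// vexKey identifies a VEX assessment of one vulnerability in one package.
type vexKey struct {
	VulnID string
	PURL   string
}

// partition splits matches into kept and ignored using VEX statuses; only
// "not_affected" is treated as a reason to ignore here.
func partition(ms []match, statuses map[vexKey]string) (kept []match, ignored []ignoredMatch) {
	for _, m := range ms {
		if status := statuses[vexKey{m.VulnID, m.PURL}]; status == "not_affected" {
			ignored = append(ignored, ignoredMatch{match: m, Reason: "vex:" + status})
			continue
		}
		kept = append(kept, m)
	}
	return kept, ignored
}

func main() {
	crypto := "pkg:apk/alpine/libcrypto1.0@1.0.2k-r0"
	ssl := "pkg:apk/alpine/libssl1.0@1.0.2k-r0"
	ms := []match{
		{VulnID: "CVE-2023-0466", PURL: crypto},
		{VulnID: "CVE-2023-0466", PURL: ssl},
	}
	// A VEX statement covering only libcrypto1.0.
	statuses := map[vexKey]string{
		{VulnID: "CVE-2023-0466", PURL: crypto}: "not_affected",
	}
	kept, ignored := partition(ms, statuses)
	fmt.Println("matches:", len(kept), "ignoredMatches:", len(ignored))
}
```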

For each record, we capture "how" the match was made in the .matchDetails field of the match object. For example, a match made against the alpine:3.2 image might look like this:

  {
   "vulnerability": {
    "id": "CVE-2023-0466",
    "dataSource": "https://nvd.nist.gov/vuln/detail/CVE-2023-0466",
    "namespace": "nvd:cpe",
    "severity": "Medium",
    ...
   },
   "relatedVulnerabilities": [],
   "matchDetails": [
    {
     "type": "cpe-match",
     "matcher": "apk-matcher",
     "searchedBy": {
      "namespace": "nvd:cpe",
      "cpes": [
       "cpe:2.3:a:openssl:openssl:1.0.2k-r0:*:*:*:*:*:*:*"
      ],
      "Package": {
       "name": "openssl",
       "version": "1.0.2k-r0"
      }
     },
     "found": {
      "vulnerabilityID": "CVE-2023-0466",
      "versionConstraint": ">= 1.0.2, < 1.0.2zh || >= 1.1.1, < 1.1.1u || >= 3.0.0, < 3.0.9 || >= 3.1.0, < 3.1.1 (unknown)",
      "cpes": [
       "cpe:2.3:a:openssl:openssl:*:*:*:*:*:*:*:*"
      ]
     }
    }
   ],
   "artifact": {
    "id": "11081b02f0e7cc1f",
    "name": "libcrypto1.0",
    "version": "1.0.2k-r0",
    "type": "apk",
    ...
   }
  }

Here, matchDetails shows what we searched by (searchedBy, derived from the package details) and which elements contributed to finding the match (found). I think the matchDetails field should be amended to account for matches we add purely from VEX statements, so we can show our work on how the match was made, as we do for all of our other matchers.

Similarly, when we ignore a match based on a VEX statement, we should record why it was ignored. Today we do this in the IgnoredMatch object, which is a superset of the Match object that additionally captures the ignore rules that applied to the match. Looking at how we express ignore rules, a question comes to mind: should we fold VEX concepts into these ignore rules, add something else, or change how this works fundamentally?
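One possible answer to the ignore-rule question, sketched in Go under stated assumptions: an ignore rule gains optional VEX fields, and an ignored match records every rule that applied to it. The `ignoreRule` fields `VexStatus` and `VexJustification`, and the `appliedRules` helper, are illustrative names rather than a description of grype's implementation; `vulnerable_code_not_in_execute_path` is one of the justification values defined by OpenVEX.

```go
package main

import "fmt"

// ignoreRule is a simplified ignore rule in which empty fields match anything.
// VexStatus and VexJustification are a hypothetical VEX extension.
type ignoreRule struct {
	Vulnerability    string
	VexStatus        string
	VexJustification string
}

// match is a pared-down stand-in for a grype match.
type match struct {
	VulnID string
	Pkg    string
}

// ignoredMatch is a superset of match that also lists the rules that applied.
type ignoredMatch struct {
	match
	AppliedIgnoreRules []ignoreRule
}

// appliedRules returns every rule that matches m, given the VEX status and
// justification (possibly empty) that a VEX statement assigned to m.
func appliedRules(m match, vexStatus, vexJustification string, rules []ignoreRule) []ignoreRule {
	var applied []ignoreRule
	for _, r := range rules {
		if r.Vulnerability != "" && r.Vulnerability != m.VulnID {
			continue
		}
		if r.VexStatus != "" && r.VexStatus != vexStatus {
			continue
		}
		if r.VexJustification != "" && r.VexJustification != vexJustification {
			continue
		}
		applied = append(applied, r)
	}
	return applied
}

func main() {
	rules := []ignoreRule{
		{VexStatus: "not_affected"},                          // ignore anything VEX marks not_affected
		{Vulnerability: "CVE-2023-0466", VexStatus: "fixed"}, // does not apply below
	}
	m := match{VulnID: "CVE-2023-0466", Pkg: "libcrypto1.0"}
	applied := appliedRules(m, "not_affected", "vulnerable_code_not_in_execute_path", rules)
	if len(applied) > 0 {
		fmt.Printf("%+v\n", ignoredMatch{match: m, AppliedIgnoreRules: applied})
	}
}
```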

OK, I have more thoughts and questions about how the UI might be updated, whether we should refactor the workflow to apply filtering logic earlier in processing, and related topics, but this has gotten verbose, so let me stop here for now and open up the floor.

CC: @luhring @jspeed-meyers @puerco

dlorenc commented 1 year ago

This sounds awesome, and I agree with the phased approach (items 1 to 4 above).

I have a lot of ideas for items 3 and 4, but we can cross those bridges when we get there. From the Wolfi side, we're happy to act as guinea pigs and help design the download/discovery/caching/magic parts to make sure it works well with Grype and users get the magical experience without having to sacrifice security or control.

luhring commented 1 year ago

I love all of this. 😍

A couple of small thoughts:

  1. According to the spec, in addition to not_affected, we'd also want fixed to be filtered out from Grype's results. (See this spec link for the differentiation if it's helpful.)
  2. The notion of adding to Grype's results via the affected status is absolutely correct. In terms of the development plan, we may want to come back to this right after solving the filtering use case end-to-end, rather than tackling it at the same time. I think we'd want to handle affected before moving on to items 3 and 4 above (e.g. magical discovery), but there may be enough to figure out with affected that it makes sense to complete a depth-first implementation without it first and then come right back to it.
  3. I just want to "+1" the importance of the .ignoredMatches and .matchDetails consideration points. And I like your initial suggestions @wagoodman.
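Combining the first two points, the four OpenVEX statuses could map onto scan-time actions roughly as in this Go sketch: `not_affected` and `fixed` move a match to ignoredMatches, `affected` adds a match when the proposed add_affected setting is on, and `under_investigation` leaves results unchanged. This is one reading of the discussion, not settled behavior, and the `action` type and `actionFor` function are hypothetical.

```go
package main

import "fmt"

// action is what a scanner might do with a match given a VEX status.
type action int

const (
	keep   action = iota // leave the result as-is
	ignore               // move the match to ignoredMatches
	add                  // report a match even if no matcher found one
)

func (a action) String() string {
	return [...]string{"keep", "ignore", "add"}[a]
}

// actionFor maps an OpenVEX status to an action; addAffected stands in for
// the proposed vex.add_affected setting (hypothetical).
func actionFor(status string, addAffected bool) action {
	switch status {
	case "not_affected", "fixed":
		return ignore
	case "affected":
		if addAffected {
			return add
		}
		return keep
	default: // "under_investigation" or an unknown value: no change
		return keep
	}
}

func main() {
	for _, s := range []string{"not_affected", "fixed", "affected", "under_investigation"} {
		fmt.Printf("%-20s -> %s\n", s, actionFor(s, true))
	}
}
```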

This is easily the Grype feature I'm most excited about.

puerco commented 11 months ago

This is great. I've been diving into the grype code over the past week, and I think I have a good grasp of what @wagoodman and @luhring are describing here. I'll write an initial patch to propose item 1 above (take a single document...).

puerco commented 11 months ago

OK, I opened https://github.com/anchore/grype/pull/1397, which implements item 1 🚀

sej7278 commented 8 months ago

Is this only for container images? With pkg:rpm/kernel@version as the product purl (from vexctl), I get this:

* unable to find matches against VEX sources: unable to find matches against VEX documents: checking matches against VEX data: reading product identifiers from context: source type not supported for VEX

which seems to come from https://github.com/anchore/grype/blob/main/grype/vex/openvex/implementation.go#L70

sethmlarson commented 7 months ago

This is awesome; I'm excited that VEX is being integrated directly into scanning tools. This greatly helps fight the systematic false positives that will only get worse as more automated tooling tries its best to automate something that isn't automatable.

I want to create and publish SBOMs for CPython, but in a way that lets our team mark vulnerabilities in bundled dependencies (like OpenSSL, of which we use only a small subset of features) as not affecting CPython, so as not to cause alarm or increase demands on volunteers to make unnecessary security releases.

This is the architecture I'm imagining for CPython's SBOM and VEX: all SBOM documents would reference a VEX document (potentially stored publicly on GitHub), so we can make statements about vulnerabilities in dependencies after release without requiring everyone to update their SBOMs.

[Screenshot from 2023-11-22: diagram of the proposed SBOM and VEX architecture]

CycloneDX currently has support for specifying VEX documents (via externalReferences of type vulnerability-assertion). I wasn't able to find a similar mechanism immediately when looking at SPDX.
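For the SBOM-reference route, discovering VEX documents from a CycloneDX JSON SBOM could start with collecting the URLs of `externalReferences` entries whose type is `vulnerability-assertion`. This Go sketch checks both the BOM-level list and the one on `metadata.component`, and decodes nothing else; the `vexURLs` function name and the example.com URLs are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type externalRef struct {
	URL  string `json:"url"`
	Type string `json:"type"`
}

// cdxBOM decodes only the parts of a CycloneDX JSON BOM needed here.
type cdxBOM struct {
	ExternalReferences []externalRef `json:"externalReferences"`
	Metadata           struct {
		Component struct {
			ExternalReferences []externalRef `json:"externalReferences"`
		} `json:"component"`
	} `json:"metadata"`
}

// vexURLs returns the URLs of vulnerability-assertion references, BOM-level
// entries first, then those on metadata.component.
func vexURLs(sbom []byte) ([]string, error) {
	var bom cdxBOM
	if err := json.Unmarshal(sbom, &bom); err != nil {
		return nil, err
	}
	var refs []externalRef
	refs = append(refs, bom.ExternalReferences...)
	refs = append(refs, bom.Metadata.Component.ExternalReferences...)
	var urls []string
	for _, r := range refs {
		if r.Type == "vulnerability-assertion" {
			urls = append(urls, r.URL)
		}
	}
	return urls, nil
}

func main() {
	sbom := []byte(`{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "metadata": {"component": {"name": "cpython", "externalReferences": [
    {"type": "vulnerability-assertion", "url": "https://example.com/cpython.vex.json"}
  ]}},
  "externalReferences": [{"type": "website", "url": "https://www.python.org"}]
}`)
	urls, err := vexURLs(sbom)
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)
}
```

Fetching and trusting the referenced documents is the harder part (authentication, caching, provenance), which this deliberately leaves out.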

Obviously, right now we could (and will try to) tell everyone to use our VEX statements with our SBOMs, but I suspect a percentage of folks won't do that, and we'll end up having to engage with them piecemeal regardless. It would be great if there were a way for it all to happen auto-magically.

Is this use-case covered in phase 3, and if not, can it be?

puerco commented 7 months ago

@sethmlarson I would love to talk more about it; we are working on exactly this in OpenVEX. One way to magically discover the documents is via the SBOM reference, as you mentioned, but I'd also love to talk about the other methods we are implementing, including support for other well-known data storage locations. Do you want to transfer this as an issue to openvex/spec? We can continue the conversation there!

puerco commented 7 months ago

@sej7278 Yes, for now the only artifact type that can incorporate VEX data is container images. But we are working to support more; can you explain your use case a bit more?

sej7278 commented 7 months ago

@puerco I'm basically interested in RPMs.

That is, an SBOM generated from RPMs and SRPMs using syft, then scanned for vulnerabilities with grype, linking to VEX data to clarify whether each package is actually vulnerable.

A general improvement on scans based on arbitrary version strings from binaries is the endgame.

szh commented 2 weeks ago

I'm confused as to the status of this feature. I'm trying to use it for container images, and there is already a --vex command line option, but it doesn't seem to be working (see #1836). Is it supposed to be implemented already or not?

szh commented 1 week ago

Never mind; the problem I'm having seems to be caused by a matching issue. I'll work on a fix and submit a PR soon.