anchore / grype

A vulnerability scanner for container images and filesystems
Apache License 2.0

Prefer direct match information over indirect matches #1931

Open wagoodman opened 3 weeks ago

wagoodman commented 3 weeks ago

In the case where both a direct match and indirect match are made for the same package and vulnerability ID, today we have two matches:

cat sbom-maven.json | grype -o json | jq '.matches[] | select(.artifact.name == "perl-Errno" and .vulnerability.id == "ELSA-2021-1678") | { "name": .artifact.name, "version": .artifact.version, "vuln": .vulnerability.id, "fix": .vulnerability.fix.versions, "match-type": .matchDetails[].type }'
{
  "name": "perl-Errno",
  "version": "0:1.28-417.el8_3",
  "vuln": "ELSA-2021-1678",
  "fix": [
    "0:1.28-419.el8"
  ],
  "match-type": "exact-direct-match"
}
{
  "name": "perl-Errno",
  "version": "0:1.28-417.el8_3",
  "vuln": "ELSA-2021-1678",
  "fix": [
    "4:5.26.3-419.el8"
  ],
  "match-type": "exact-indirect-match"
}

However, this is probably more information than is useful, since the fix information on the direct match is likely the most accurate anyway.

We could merge these similar matches into a single match, preferring the direct match, while still including the match details for the indirect match in the matchDetails array.

This would, in effect, result in this (just to illustrate the approximate change):

cat sbom-maven.json | grype -o json | jq '.matches[] | select(.artifact.name == "perl-Errno" and .vulnerability.id == "ELSA-2021-1678") | { "name": .artifact.name, "version": .artifact.version, "vuln": .vulnerability.id, "fix": .vulnerability.fix.versions, "match-type": [.matchDetails[].type] }'
{
  "name": "perl-Errno",
  "version": "0:1.28-417.el8_3",
  "vuln": "ELSA-2021-1678",
  "fix": [
    "0:1.28-419.el8"
  ],
  "match-type": ["exact-direct-match", "exact-indirect-match"]
}

Note that:

(the SBOM was produced from maven@sha256:1ffe2b51b6762b94590a1149cf0c35a169203d467dc34891be1439ad3b54940e)

dev notes:

zhill commented 3 weeks ago

Creating an array of match types seems confusing, particularly in cases where things like the fix info could be different in the matched records. Doesn't the match logic first do directs, then indirect matching? That seems like a natural place to prune duplicate indirect matches since those are by definition less accurate to the package being matched (e.g. the fix version may not even be something that the actual package can be upgraded to).

I understand that there are likely some strange corner cases around multiple fixes for the same CVE over time, but most OS vendors don't publish vuln records only for a source package if the binary packages haven't been rebuilt on that yet, so I think those kinds of cases are likely to be very rare.
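
The pruning suggested here can be sketched as a post-processing step over matches shaped like grype's JSON output. This is only an illustration, not grype's actual matcher code: the function name `prune_shadowed_indirect` is hypothetical, the match records are trimmed to the fields used, and the second vulnerability ID is a made-up placeholder.

```python
def prune_shadowed_indirect(matches):
    """Drop indirect-only matches when a direct match exists for the same
    artifact, vulnerability ID, and namespace (hypothetical helper)."""

    def key(m):
        v = m["vulnerability"]
        return (m["artifact"]["id"], v["id"], v["namespace"])

    def is_direct(m):
        return any(d["type"] == "exact-direct-match" for d in m["matchDetails"])

    # Keys for which at least one direct match was found.
    direct_keys = {key(m) for m in matches if is_direct(m)}
    # Keep every direct match, and indirect matches with no direct counterpart.
    return [m for m in matches if is_direct(m) or key(m) not in direct_keys]


def make_match(vuln_id, match_type, fix):
    """Build a trimmed match record with only the fields this sketch reads."""
    return {
        "vulnerability": {
            "id": vuln_id,
            "namespace": "oracle:distro:oraclelinux:8",
            "fix": {"versions": [fix], "state": "fixed"},
        },
        "matchDetails": [{"type": match_type, "matcher": "rpm-matcher"}],
        "artifact": {"id": "5bc7f8ad86b036d5", "name": "perl-Errno"},
    }


matches = [
    make_match("ELSA-2021-1678", "exact-direct-match", "0:1.28-419.el8"),
    make_match("ELSA-2021-1678", "exact-indirect-match", "4:5.26.3-419.el8"),
    # Placeholder ID: an indirect match that has no direct counterpart.
    make_match("EXAMPLE-0001", "exact-indirect-match", "4:5.26.3-420.el8"),
]
pruned = prune_shadowed_indirect(matches)
```

With this input, the indirect ELSA-2021-1678 record is dropped, while the indirect-only placeholder record is kept because nothing more accurate is available for it.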

westonsteimel commented 3 weeks ago

Oh, we also need to do the filtering based on direct matches before filtering out matches based on the affected version constraint. Otherwise you'd end up with cases where a package appears vulnerable under the indirect match's version range even though it is not vulnerable under the direct match's range.
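
The ordering concern can be shown with a small, self-contained sketch. Everything here is an assumption made for illustration: the `Candidate` type, the two function names, the use of integer tuples in place of real RPM version comparison, and the version numbers themselves (chosen so that the direct range says "not affected" while the indirect range says "affected").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    artifact: str            # package found in the SBOM
    vuln_id: str
    kind: str                # "direct" or "indirect"
    searched_version: tuple  # version used for the lookup (stand-in for RPM versions)
    fixed_in: tuple          # first version that is no longer affected


def is_affected(c):
    # Stand-in for evaluating a constraint such as "< 0:1.28-419.el8 (rpm)".
    return c.searched_version < c.fixed_in


def prefer_direct(candidates):
    """Drop indirect candidates when a direct one exists for the same key."""
    direct_keys = {(c.artifact, c.vuln_id) for c in candidates if c.kind == "direct"}
    return [
        c for c in candidates
        if c.kind == "direct" or (c.artifact, c.vuln_id) not in direct_keys
    ]


def select_then_filter(candidates):
    """Prefer direct records first, then apply the version constraint."""
    return [c for c in prefer_direct(candidates) if is_affected(c)]


def filter_then_select(candidates):
    """Apply the version constraint first, then prefer direct records."""
    return prefer_direct([c for c in candidates if is_affected(c)])


candidates = [
    # Hypothetical: the binary package already carries the fixed version...
    Candidate("perl-Errno", "ELSA-2021-1678", "direct", (1, 28, 419), (1, 28, 419)),
    # ...but the upstream version recorded for it still falls in the indirect range.
    Candidate("perl-Errno", "ELSA-2021-1678", "indirect", (5, 26, 3, 417), (5, 26, 3, 419)),
]

good = select_then_filter(candidates)
bad = filter_then_select(candidates)
```

Here `select_then_filter` reports nothing, because the direct record is chosen first and its range does not apply. `filter_then_select` discards the direct record during version filtering, so the indirect record survives and the package is reported as vulnerable.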

wagoodman commented 3 weeks ago

Creating an array of match types seems confusing...

Yeah, the jq expression above was really a summarization of the effect. The goal isn't to create an array of match types, but instead to leverage the existing matchDetails[] array. So for example:

[
  {
    "vulnerability": {
      "id": "ELSA-2021-1678",
      "dataSource": "https://linux.oracle.com/errata/ELSA-2021-1678.html",
      "namespace": "oracle:distro:oraclelinux:8",
      "severity": "Medium",
      "fix": {
        "versions": [
          "0:1.28-419.el8"
        ],
        "state": "fixed"
      },
      "advisories": []
    },
    "matchDetails": [
      {
        "type": "exact-direct-match",
        "matcher": "rpm-matcher",
        "searchedBy": {
          "distro": {
            "type": "oraclelinux",
            "version": "8.3"
          },
          "namespace": "oracle:distro:oraclelinux:8",
          "package": {
            "name": "perl-Errno",
            "version": "0:1.28-417.el8_3"
          }
        },
        "found": {
          "versionConstraint": "< 0:1.28-419.el8 (rpm)",
          "vulnerabilityID": "ELSA-2021-1678"
        }
      }
    ],
    "artifact": {
      "id": "5bc7f8ad86b036d5",
      "name": "perl-Errno",
      "version": "0:1.28-417.el8_3",
      "type": "rpm",
      "purl": "pkg:rpm/ol/perl-Errno@1.28-417.el8_3?arch=x86_64&epoch=0&upstream=perl-5.26.3-417.el8_3.src.rpm&distro=ol-8.3",
      "upstreams": [
        {
          "name": "perl",
          "version": "5.26.3-417.el8_3"
        }
      ]
    }
  },
  {
    "vulnerability": {
      "id": "ELSA-2021-1678",
      "dataSource": "https://linux.oracle.com/errata/ELSA-2021-1678.html",
      "namespace": "oracle:distro:oraclelinux:8",
      "severity": "Medium",
      "fix": {
        "versions": [
          "4:5.26.3-419.el8"
        ],
        "state": "fixed"
      },
      "advisories": []
    },
    "matchDetails": [
      {
        "type": "exact-indirect-match",
        "matcher": "rpm-matcher",
        "searchedBy": {
          "distro": {
            "type": "oraclelinux",
            "version": "8.3"
          },
          "namespace": "oracle:distro:oraclelinux:8",
          "package": {
            "name": "perl",
            "version": "5.26.3-417.el8_3"
          }
        },
        "found": {
          "versionConstraint": "< 4:5.26.3-419.el8 (rpm)",
          "vulnerabilityID": "ELSA-2021-1678"
        }
      }
    ],
    "artifact": {
      "id": "5bc7f8ad86b036d5",
      "name": "perl-Errno",
      "version": "0:1.28-417.el8_3",
      "type": "rpm",
      "purl": "pkg:rpm/ol/perl-Errno@1.28-417.el8_3?arch=x86_64&epoch=0&upstream=perl-5.26.3-417.el8_3.src.rpm&distro=ol-8.3",
      "upstreams": [
        {
          "name": "perl",
          "version": "5.26.3-417.el8_3"
        }
      ]
    }
  }
]

Would be summarized to:

[
  {
    "vulnerability": {
      "id": "ELSA-2021-1678",
      "dataSource": "https://linux.oracle.com/errata/ELSA-2021-1678.html",
      "namespace": "oracle:distro:oraclelinux:8",
      "severity": "Medium",
      "fix": {
        "versions": [
          "0:1.28-419.el8"
        ],
        "state": "fixed"
      },
      "advisories": []
    },
    "matchDetails": [
      {
        "type": "exact-direct-match",
        "matcher": "rpm-matcher",
        "searchedBy": {
          "distro": {
            "type": "oraclelinux",
            "version": "8.3"
          },
          "namespace": "oracle:distro:oraclelinux:8",
          "package": {
            "name": "perl-Errno",
            "version": "0:1.28-417.el8_3"
          }
        },
        "found": {
          "versionConstraint": "< 0:1.28-419.el8 (rpm)",
          "vulnerabilityID": "ELSA-2021-1678"
        }
      },
      {
        "type": "exact-indirect-match",
        "matcher": "rpm-matcher",
        "searchedBy": {
          "distro": {
            "type": "oraclelinux",
            "version": "8.3"
          },
          "namespace": "oracle:distro:oraclelinux:8",
          "package": {
            "name": "perl",
            "version": "5.26.3-417.el8_3"
          }
        },
        "found": {
          "versionConstraint": "< 4:5.26.3-419.el8 (rpm)",
          "vulnerabilityID": "ELSA-2021-1678"
        }
      }
    ],
    "artifact": {
      "id": "5bc7f8ad86b036d5",
      "name": "perl-Errno",
      "version": "0:1.28-417.el8_3",
      "type": "rpm",
      "purl": "pkg:rpm/ol/perl-Errno@1.28-417.el8_3?arch=x86_64&epoch=0&upstream=perl-5.26.3-417.el8_3.src.rpm&distro=ol-8.3",
      "upstreams": [
        {
          "name": "perl",
          "version": "5.26.3-417.el8_3"
        }
      ]
    }
  }
]

One match with multiple match details, just as we do today.
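
A minimal sketch of this merge, written against the JSON shape shown above rather than grype's internal types. The function names, the grouping key (artifact ID, vulnerability ID, namespace), and the ranking table are assumptions for illustration; the records are trimmed to the fields the sketch reads.

```python
# Assumed preference order: a lower rank wins when deciding which match's
# vulnerability and fix data survives the merge.
TYPE_RANK = {"exact-direct-match": 0, "exact-indirect-match": 1}


def detail_rank(detail):
    return TYPE_RANK.get(detail["type"], len(TYPE_RANK))


def best_rank(match):
    """Rank a match by its most direct matchDetails entry."""
    return min((detail_rank(d) for d in match["matchDetails"]), default=len(TYPE_RANK))


def merge_matches(matches):
    """Merge matches sharing artifact, vulnerability ID, and namespace,
    keeping the most direct match and all matchDetails entries."""
    merged = {}
    for m in matches:
        v = m["vulnerability"]
        key = (m["artifact"]["id"], v["id"], v["namespace"])
        if key not in merged:
            # Shallow copy so the caller's records are not mutated.
            merged[key] = {**m, "matchDetails": list(m["matchDetails"])}
            continue
        kept = merged[key]
        details = kept["matchDetails"] + list(m["matchDetails"])
        if best_rank(m) < best_rank(kept):
            kept = {**m}  # the more direct match supplies vulnerability/fix data
        # Stable sort: most direct details first, original order otherwise.
        kept["matchDetails"] = sorted(details, key=detail_rank)
        merged[key] = kept
    return list(merged.values())


def make_match(match_type, searched_pkg, fix):
    """Build a trimmed match record with only the fields this sketch reads."""
    return {
        "vulnerability": {
            "id": "ELSA-2021-1678",
            "namespace": "oracle:distro:oraclelinux:8",
            "fix": {"versions": [fix], "state": "fixed"},
        },
        "matchDetails": [
            {"type": match_type, "matcher": "rpm-matcher",
             "searchedBy": {"package": {"name": searched_pkg}}}
        ],
        "artifact": {"id": "5bc7f8ad86b036d5", "name": "perl-Errno"},
    }


direct = make_match("exact-direct-match", "perl-Errno", "0:1.28-419.el8")
indirect = make_match("exact-indirect-match", "perl", "4:5.26.3-419.el8")

# Input order should not matter: the direct match wins either way.
result = merge_matches([indirect, direct])
```

For this input, `result` holds a single match whose fix version comes from the direct record and whose `matchDetails` lists the direct entry followed by the indirect one.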

zhill commented 3 weeks ago

I'm still not sure we have a good justification for showing both indirect and direct matches for the same vulnerability in the same namespace. Is there a case where we need to do that because it helps the user make a different decision?

wagoodman commented 2 weeks ago

When we make a match, we try not to drop it unless we're sure it's wrong or the user has an ignore rule for it, and neither really applies in this case. The upside of including multiple match details is that they can be used as input into a confidence value for the merged match (not implemented yet): the more ways we reach the same match, the more confident we are in the match as a whole.
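
Since the confidence value is not implemented, the following is only one possible input to it, sketched under stated assumptions: it counts the distinct ways a merged match was reached, treating a "way" as the combination of match type, matcher, and searched package name. The function name and that definition of a "way" are hypothetical, and how such a count would map to an actual score is left open.

```python
def distinct_match_paths(match):
    """Count distinct (type, matcher, searched package) combinations in
    a match's matchDetails (hypothetical helper)."""
    paths = set()
    for detail in match.get("matchDetails", []):
        package = detail.get("searchedBy", {}).get("package", {})
        paths.add((detail.get("type"), detail.get("matcher"), package.get("name")))
    return len(paths)


# Trimmed version of the merged match shown earlier in the thread.
merged_match = {
    "vulnerability": {"id": "ELSA-2021-1678"},
    "matchDetails": [
        {"type": "exact-direct-match", "matcher": "rpm-matcher",
         "searchedBy": {"package": {"name": "perl-Errno"}}},
        {"type": "exact-indirect-match", "matcher": "rpm-matcher",
         "searchedBy": {"package": {"name": "perl"}}},
    ],
}

paths = distinct_match_paths(merged_match)
```

Using a set means a repeated, identical detail entry does not inflate the count; only genuinely different routes to the same match do.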