Add vendor-provided CVSS scores to vulnerability match records where available

nightfurys commented 3 years ago

RHEL and third-party feeds may sometimes contain vendor-specific CVSS scores that are missing in grype (and maybe grype-db). Examples of CVSS scores in RHEL feeds:

{
  "Vulnerability": {
    "CVSS": [
      {
        "base_metrics": {
          "base_score": 4.9,
          "base_severity": "Medium",
          "exploitability_score": 1.2,
          "impact_score": 3.6
        },
        "status": "verified",
        "vector_string": "CVSS:3.0/AV:N/AC:L/PR:H/UI:N/S:U/C:N/I:N/A:H",
        "version": "3.0"
      }
    ],
    "Link": "https://access.redhat.com/security/cve/CVE-2020-2901",
    "Name": "CVE-2020-2901",
    "NamespaceName": "rhel:8",
    "Severity": "Medium"
    ...
  }
}

Check with @nightfurys for examples of third party feeds

wagoodman commented 3 years ago

Today we have two columns in the grype sqlite DB where CVSS is stored:

https://github.com/anchore/grype-db/blob/main/pkg/db/vulnerability_metadata.go#L15-L20

type VulnerabilityMetadata struct {
    ...
    CvssV2       *Cvss    // Common Vulnerability Scoring System V2 values
    CvssV3       *Cvss    // Common Vulnerability Scoring System V3 values
}

type Cvss struct {
    BaseScore           float64 // Ranges from 0 - 10 and defines for qualities intrinsic to a vulnerability
    ExploitabilityScore float64 // Indicator of how easy it may be for an attacker to exploit a vulnerability
    ImpactScore         float64 // Representation of the effects of an exploited vulnerability relative to compromise in confidentiality, integrity, and availability
    Vector              string  // A textual representation of the metric values used to determine the score
}

I think we should adjust each CVSS column in VulnerabilityMetadata to be of type []Cvss and redefine Cvss to:

type Cvss struct {
    Tags    map[string]string
    Metrics CvssMetrics
    Vector  string  // A textual representation of the metric values used to determine the score
}

type CvssMetrics struct {
    BaseScore           float64 // Ranges from 0 - 10 and defines for qualities intrinsic to a vulnerability
    ExploitabilityScore float64 // Indicator of how easy it may be for an attacker to exploit a vulnerability
    ImpactScore         float64 // Representation of the effects of an exploited vulnerability relative to compromise in confidentiality, integrity, and availability
}

In this way we can allow for:

multiple CVSS entries without needing to add columns for each source
arbitrary key-value pairs that identify the source of the CVSS score and other qualities, such as type=vendor or source=rhel, without tying specific keys to the schema.

We can (optionally) drop the v2-v3 distinction and let this be handled by a tag (e.g. version=2). Then we would only need a single CVSS column.
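
A minimal sketch of how the single-column variant might be consumed, assuming the Cvss and CvssMetrics shapes proposed above. The helper names filterByTags and sampleEntries are hypothetical and do not exist in grype-db, and the sample values are borrowed from scores quoted in this thread purely for illustration:

```go
package main

import "fmt"

// CvssMetrics and Cvss mirror the shapes proposed above.
type CvssMetrics struct {
	BaseScore           float64
	ExploitabilityScore float64
	ImpactScore         float64
}

type Cvss struct {
	Tags    map[string]string
	Metrics CvssMetrics
	Vector  string
}

// filterByTags is a hypothetical helper, not an existing grype-db function.
// It keeps the entries whose tags contain every key-value pair in want.
func filterByTags(entries []Cvss, want map[string]string) []Cvss {
	var out []Cvss
	for _, e := range entries {
		matched := true
		for k, v := range want {
			if got, ok := e.Tags[k]; !ok || got != v {
				matched = false
				break
			}
		}
		if matched {
			out = append(out, e)
		}
	}
	return out
}

// sampleEntries reuses score values quoted in this thread as illustration
// only; they do not all describe the same CVE.
func sampleEntries() []Cvss {
	return []Cvss{
		{
			Tags:    map[string]string{"source": "nvd", "version": "2.0"},
			Metrics: CvssMetrics{BaseScore: 7.5, ExploitabilityScore: 10, ImpactScore: 6.4},
			Vector:  "AV:N/AC:L/Au:N/C:P/I:P/A:P",
		},
		{
			Tags:    map[string]string{"source": "rhel", "type": "vendor", "version": "3.0"},
			Metrics: CvssMetrics{BaseScore: 4.9, ExploitabilityScore: 1.2, ImpactScore: 3.6},
			Vector:  "CVSS:3.0/AV:N/AC:L/PR:H/UI:N/S:U/C:N/I:N/A:H",
		},
	}
}

func main() {
	// Select only the vendor-provided RHEL entries from the single column.
	for _, c := range filterByTags(sampleEntries(), map[string]string{"source": "rhel"}) {
		fmt.Println(c.Tags["version"], c.Metrics.BaseScore, c.Vector)
	}
}
```

With this layout the v2/v3 distinction becomes one more tag to filter on, at the cost of the compiler no longer checking key names.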

luhring commented 3 years ago

I like the idea of not having new columns per vendor.

~Should the Vector field be part of the CvssMetrics struct? This doc comment makes me think it would be: "A textual representation of the metric values used to determine the score".~ I could see the other approach, actually. And I think subconsciously, I saw "Metrics" (plural) and assumed a collection. 😄

Re the Tags concept, I see what it's trying to solve. I'd love to find a solution that expresses as strict a schema as possible, and to understand the needs for what data we'll need to store — if possible.

My two cents on the specifics of this comment:

type (such as vendor) — this seems like a Feeds-service-ism. Is the same distinction useful in our domain? If so, why exactly?

source (such as rhel) — I see obvious benefit to this. I'd suggest promoting this to a named field in a data structure. I think it's really important (and will also unlock better use of our output data) to attribute a given CVSS scoring result to its creator.
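
For comparison, a rough sketch of what promoting source to a named field could look like. The Source and Version fields and the groupBySource helper are assumptions made for illustration; nothing here reflects the final grype-db schema:

```go
package main

import "fmt"

type CvssMetrics struct {
	BaseScore           float64
	ExploitabilityScore float64
	ImpactScore         float64
}

// Cvss is a hypothetical variant of the proposed struct in which Source and
// Version are named fields rather than free-form tags.
type Cvss struct {
	Source  string // who produced the score, for example "nvd" or "rhel"
	Version string // CVSS version reported by that source
	Metrics CvssMetrics
	Vector  string
}

// groupBySource is a hypothetical helper that indexes entries by their creator.
func groupBySource(entries []Cvss) map[string][]Cvss {
	out := make(map[string][]Cvss)
	for _, e := range entries {
		out[e.Source] = append(out[e.Source], e)
	}
	return out
}

// sampleEntries uses illustrative values only.
func sampleEntries() []Cvss {
	return []Cvss{
		{Source: "nvd", Version: "3.1", Metrics: CvssMetrics{BaseScore: 9.8}},
		{Source: "rhel", Version: "3.0", Metrics: CvssMetrics{BaseScore: 4.9}},
		{Source: "nvd", Version: "2.0", Metrics: CvssMetrics{BaseScore: 7.5}},
	}
}

func main() {
	grouped := groupBySource(sampleEntries())
	for _, c := range grouped["nvd"] {
		fmt.Println(c.Source, c.Version, c.Metrics.BaseScore)
	}
}
```

A named field makes attribution part of the schema, so a misspelled key fails at compile time instead of silently matching nothing.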

wagoodman commented 3 years ago

I thought about tags being replaced with a struct, but wasn't certain of the keys... specifically, across the various sources that provide cvss, it wasn't clear to me that we could select keys that would universally apply to all sources.

Also, we don't use the cvss scores for matching, only for presentation. What will change about this is that engine will be using this DB directly as well for its needs (we now share a common domain with engine, we are no longer distinct here).

I think of the tags like k8s labels and annotations; they are arbitrary key-values that a consumer can use for its needs, but the definition is flexible enough to not require codifying the keys into the type.

re: type and source, these were provided as an example to illustrate how it could be used. We need to look at sources that provide cvss data to select key-values that make sense for each source.

luhring commented 3 years ago

we don't use the cvss scores for matching, only for presentation

This resonates. I also see us using these scores when we get more into filtering down the road (in Grype, not Engine).

As for the information that would be stored in tags, I think it's worth spending a small amount of time ascertaining the needs here. E.g.: who are the consumers of this data right now? Maybe it's just Engine at the moment. What are Engine's needs?

I like the idea of extensibility. But my take is: guessing at future needs is a hard game to win. I could see it being safe and manageable to start with the fields we know we need now, and making nonbreaking changes down the road when (or if) we discover a need for additional field(s).

I'm also not totally against the "tags" approach — I'm just not quite understanding how that level of generalization is required at this point in time.

wagoodman commented 3 years ago

righto @luhring, I hear your concerns here; let's move forward with determining the final answer while working on this (my example is meant to be suggestive, not final).

alfredodeza commented 3 years ago

For additional clarity in what the differences are in vendor-specific CVSS:

RHEL

   "CVSS": [
    {
     "base_metrics": {
      "base_score": 6.5,
      "base_severity": "Medium",
      "exploitability_score": 2.8,
      "impact_score": 3.6
     },
     "status": "draft",
     "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H",
     "version": "3.1"
    }
   ]

MSRC

  "cvss": {
   "base_score": 8.1,
   "temporal_score": 7.3,
   "vector": "CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H/E:P/RL:O/RC:C"
  },

Debian and Ubuntu

Neither reports its own CVSS; the scores all come from NVD:

    "NVD": {
     "CVSSv2": {
      "Score": 4.6,
      "Vectors": "AV:L/AC:L/Au:N/C:P/I:P/A:P"
     }
    }

VulnDB

Exposes NVD but also vendor_cvss scores. @nightfurys mind clarifying what are the differences between these and if you are looking to capture them all in this case? (probably offline conversation)

NVD

  "cvss_v2": {
   "additional_information": {
    "ac_insuf_info": false,
    "obtain_all_privilege": false,
    "obtain_other_privilege": false,
    "obtain_user_privilege": false,
    "user_interaction_required": false
   },
   "base_metrics": {
    "access_complexity": "LOW",
    "access_vector": "NETWORK",
    "authentication": "NONE",
    "availability_impact": "PARTIAL",
    "base_score": 7.5,
    "confidentiality_impact": "PARTIAL",
    "exploitability_score": 10,
    "impact_score": 6.4,
    "integrity_impact": "PARTIAL"
   },
   "severity": "High",
   "vector_string": "AV:N/AC:L/Au:N/C:P/I:P/A:P",
   "version": "2.0"
  },
  "cvss_v3": {
   "base_metrics": {
    "attack_complexity": "LOW",
    "attack_vector": "NETWORK",
    "availability_impact": "HIGH",
    "base_score": 9.8,
    "base_severity": "Critical",
    "confidentiality_impact": "HIGH",
    "exploitability_score": 3.9,
    "impact_score": 5.9,
    "integrity_impact": "HIGH",
    "privileges_required": "NONE",
    "scope": "UNCHANGED",
    "user_interaction": "NONE"
   },
   "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
   "version": "3.1"
  },
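
To make the shape differences concrete, here is a hedged sketch that decodes the RHEL and MSRC fragments above into one common form. The Score type and the fromRHEL, fromMSRC, and versionFromVector helpers are hypothetical, the fragments are wrapped in braces so they parse as standalone JSON, and the MSRC temporal score is simply dropped:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Score is a hypothetical normalized form; it is not a grype-db type.
type Score struct {
	Source    string
	Version   string
	BaseScore float64
	Vector    string
}

// rhelRecord matches the RHEL fragment: a list with nested base_metrics.
type rhelRecord struct {
	CVSS []struct {
		BaseMetrics struct {
			BaseScore float64 `json:"base_score"`
		} `json:"base_metrics"`
		VectorString string `json:"vector_string"`
		Version      string `json:"version"`
	} `json:"CVSS"`
}

// msrcRecord matches the MSRC fragment: one flat object without a version.
type msrcRecord struct {
	CVSS struct {
		BaseScore float64 `json:"base_score"`
		Vector    string  `json:"vector"`
	} `json:"cvss"`
}

// versionFromVector reads the "CVSS:<version>/" prefix of v3 vectors and
// returns "" when the prefix is absent, as in v2 vectors.
func versionFromVector(vector string) string {
	if !strings.HasPrefix(vector, "CVSS:") {
		return ""
	}
	return strings.SplitN(strings.TrimPrefix(vector, "CVSS:"), "/", 2)[0]
}

func fromRHEL(data []byte) ([]Score, error) {
	var rec rhelRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return nil, err
	}
	var out []Score
	for _, c := range rec.CVSS {
		out = append(out, Score{"rhel", c.Version, c.BaseMetrics.BaseScore, c.VectorString})
	}
	return out, nil
}

func fromMSRC(data []byte) ([]Score, error) {
	var rec msrcRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return nil, err
	}
	if rec.CVSS.Vector == "" {
		return nil, nil // no score present
	}
	c := rec.CVSS
	return []Score{{"msrc", versionFromVector(c.Vector), c.BaseScore, c.Vector}}, nil
}

const rhelSample = `{"CVSS": [{"base_metrics": {"base_score": 6.5},
  "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H", "version": "3.1"}]}`

const msrcSample = `{"cvss": {"base_score": 8.1, "temporal_score": 7.3,
  "vector": "CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H/E:P/RL:O/RC:C"}}`

func main() {
	rhel, err := fromRHEL([]byte(rhelSample))
	if err != nil {
		panic(err)
	}
	msrc, err := fromMSRC([]byte(msrcSample))
	if err != nil {
		panic(err)
	}
	for _, s := range append(rhel, msrc...) {
		fmt.Println(s.Source, s.Version, s.BaseScore, s.Vector)
	}
}
```

Deriving the version from the vector prefix is one option for sources like MSRC that omit an explicit version field; it does not work for v2 vectors, which carry no prefix.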

nightfurys commented 3 years ago

A bit of context regarding the payload format and content available from the feed service. Feed service data is laid out hierarchically, as feed types and feed groups. ancho.re serves the following feed types:

$ curl -ks https://ancho.re/v1/service/feeds
{
  "feeds": [
    {
      "access_tier": 0,
      "description": "Feed record for type vulnerabilities",
      "name": "vulnerabilities"
    },
    {
      "access_tier": 0,
      "description": "Feed record for type github",
      "name": "github"
    },
    {
      "access_tier": 0,
      "description": "Feed record for type nvdv2",
      "name": "nvdv2"
    },
    ...

Each feed type contains one or more groups:

$ curl -ks https://ancho.re/v1/service/feeds/vulnerabilities
{
  "groups": [
    {
      "access_tier": 0,
      "description": "Group record for namespace: alpine:3.10 and feed type: vulnerabilities",
      "name": "alpine:3.10"
    },
    {
      "access_tier": 0,
      "description": "Group record for namespace: alpine:3.11 and feed type: vulnerabilities",
      "name": "alpine:3.11"
    },
    {
      "access_tier": 0,
      "description": "Group record for namespace: alpine:3.12 and feed type: vulnerabilities",
      "name": "alpine:3.12"
    },
    ...

Expect the data format for groups to be consistent within a feed type, i.e. all groups within the vulnerabilities feed type will have the same payload format. That's a feed service contract (not clearly documented anywhere).

Payload schema for vulnerabilities feed type:

{
  "type": "object",
  "properties": {
    "Vulnerability": {
      "type": "object",
      "properties": {
        "CVSS": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "base_metrics": {
                "type": "object",
                "properties": {
                  "base_score": {
                    "type": "number"
                  },
                  "base_severity": {
                    "type": "string"
                  },
                  "exploitability_score": {
                    "type": "number"
                  },
                  "impact_score": {
                    "type": "number"
                  }
                }
              },
              "status": {
                "type": "string"
              },
              "vector_string": {
                "type": "string"
              },
              "version": {
                "type": "string"
              }
            }
          }
        },
        "Description": {
          "type": "string"
        },
        "FixedIn": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "Name": {
                "type": "string"
              },
              "NamespaceName": {
                "type": "string"
              },
              "VendorAdvisory": {
                "type": "object",
                "properties": {
                  "NoAdvisory": {
                    "type": "boolean"
                  },
                  "AdvisorySummary": {
                    "type": "array",
                    "items": {
                      "type": "object",
                      "properties": {
                        "ID": {
                          "type": "string"
                        },
                        "Link": {
                          "type": "string"
                        }
                      }
                    }
                  }
                }
              },
              "Version": {
                "type": "string"
              },
              "VersionFormat": {
                "type": "string"
              }
            }
          }
        },
        "Link": {
          "type": "string"
        },
        "Metadata": {
          "type": "object"
        },
        "Name": {
          "type": "string"
        },
        "NamespaceName": {
          "type": "string"
        },
        "Severity": {
          "type": "string"
        }
      }
    }
  }
}

Note that all the data here is specific to the vendor supplying it, i.e. a payload in the rhel:8 group will reflect CVSS scores from the data source used to construct the rest of the payload. This is often referred to as the vendor score.
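
As a sketch of how a consumer might attribute those scores, the snippet below decodes a vulnerabilities payload and labels each CVSS entry with the vendor taken from the group name. The VendorScore type and vendorScores function are hypothetical, and the assumption that group names follow "<vendor>:<release>" is inferred from examples such as rhel:8 and alpine:3.10 rather than from a documented contract:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// feedRecord covers only the schema fields this sketch needs.
type feedRecord struct {
	Vulnerability struct {
		Name          string `json:"Name"`
		NamespaceName string `json:"NamespaceName"`
		CVSS          []struct {
			BaseMetrics struct {
				BaseScore float64 `json:"base_score"`
			} `json:"base_metrics"`
			Version string `json:"version"`
		} `json:"CVSS"`
	} `json:"Vulnerability"`
}

// VendorScore is a hypothetical result type, not part of grype or grype-db.
type VendorScore struct {
	CVE       string
	Vendor    string
	Version   string
	BaseScore float64
}

// vendorScores attributes every CVSS entry in a payload to the vendor named
// in its group, assuming group names look like "<vendor>:<release>".
func vendorScores(data []byte) ([]VendorScore, error) {
	var rec feedRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return nil, err
	}
	v := rec.Vulnerability
	vendor := strings.SplitN(v.NamespaceName, ":", 2)[0]
	var out []VendorScore
	for _, c := range v.CVSS {
		out = append(out, VendorScore{v.Name, vendor, c.Version, c.BaseMetrics.BaseScore})
	}
	return out, nil
}

// payload is the RHEL example from the top of this issue, minus elided fields.
const payload = `{"Vulnerability": {
  "CVSS": [{"base_metrics": {"base_score": 4.9, "base_severity": "Medium"},
            "status": "verified", "version": "3.0"}],
  "Name": "CVE-2020-2901", "NamespaceName": "rhel:8", "Severity": "Medium"}}`

func main() {
	scores, err := vendorScores([]byte(payload))
	if err != nil {
		panic(err)
	}
	for _, s := range scores {
		fmt.Println(s.CVE, s.Vendor, s.Version, s.BaseScore)
	}
}
```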

nightfurys commented 3 years ago

The data format for the nvdv2 feed type is different from the vulnerabilities feed type. Highlighting only the CVSS score attributes, as a YAML schema for easier readability:

type: object
properties:
  cvss_v2:
    type: object
    properties:
      version:
        type: string
      vector_string:
        type: string
      severity:
        type: string
      base_metrics:
        type: object
        properties:
          access_vector:
            type: string
          access_complexity:
            type: string
          authentication:
            type: string
          confidentiality_impact:
            type: string
          integrity_impact:
            type: string
          availability_impact:
            type: string
          base_score:
            type: number
          exploitability_score:
            type: number
          impact_score:
            type: number
      additional_information:
        type: object
        properties:
          ac_insuf_info:
            type: boolean
          obtain_all_privilege:
            type: boolean
          obtain_user_privilege:
            type: boolean
          obtain_other_privilege:
            type: boolean
          user_interaction_required:
            type: boolean
  cvss_v3:
    type: object
    properties:
      version:
        type: string
      vector_string:
        type: string
      base_metrics:
        type: object
        properties:
          attack_vector:
            type: string
          attack_complexity:
            type: string
          privileges_required:
            type: string
          user_interaction:
            type: string
          scope:
            type: string
          confidentiality_impact:
            type: string
          integrity_impact:
            type: string
          availability_impact:
            type: string
          base_score:
            type: number
          exploitability_score:
            type: number
          impact_score:
            type: number
          base_severity:
            type: string
  ...
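
A hedged sketch of flattening an nvdv2 record into the tagged list proposed earlier in this thread, so that v2 and v3 land in one slice distinguished by a version tag. The flatten function and the source=nvd tag value are assumptions for illustration, and only the score fields of the schema are decoded:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type CvssMetrics struct {
	BaseScore           float64
	ExploitabilityScore float64
	ImpactScore         float64
}

type Cvss struct {
	Tags    map[string]string
	Metrics CvssMetrics
	Vector  string
}

// nvdCvss holds the fields shared by cvss_v2 and cvss_v3 in the schema above.
type nvdCvss struct {
	Version      string `json:"version"`
	VectorString string `json:"vector_string"`
	BaseMetrics  struct {
		BaseScore           float64 `json:"base_score"`
		ExploitabilityScore float64 `json:"exploitability_score"`
		ImpactScore         float64 `json:"impact_score"`
	} `json:"base_metrics"`
}

// nvdRecord uses pointers so that an absent version decodes to nil.
type nvdRecord struct {
	CvssV2 *nvdCvss `json:"cvss_v2"`
	CvssV3 *nvdCvss `json:"cvss_v3"`
}

// flatten is a hypothetical converter: it emits one tagged Cvss entry per
// CVSS version present in the record, v2 first.
func flatten(data []byte) ([]Cvss, error) {
	var rec nvdRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return nil, err
	}
	var out []Cvss
	for _, c := range []*nvdCvss{rec.CvssV2, rec.CvssV3} {
		if c == nil {
			continue
		}
		out = append(out, Cvss{
			Tags: map[string]string{"source": "nvd", "version": c.Version},
			Metrics: CvssMetrics{
				BaseScore:           c.BaseMetrics.BaseScore,
				ExploitabilityScore: c.BaseMetrics.ExploitabilityScore,
				ImpactScore:         c.BaseMetrics.ImpactScore,
			},
			Vector: c.VectorString,
		})
	}
	return out, nil
}

// nvdSample is trimmed from the NVD example quoted earlier in this thread.
const nvdSample = `{
  "cvss_v2": {"base_metrics": {"base_score": 7.5, "exploitability_score": 10, "impact_score": 6.4},
              "vector_string": "AV:N/AC:L/Au:N/C:P/I:P/A:P", "version": "2.0"},
  "cvss_v3": {"base_metrics": {"base_score": 9.8, "exploitability_score": 3.9, "impact_score": 5.9},
              "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H", "version": "3.1"}}`

func main() {
	entries, err := flatten([]byte(nvdSample))
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Tags["version"], e.Metrics.BaseScore, e.Vector)
	}
}
```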

alfredodeza commented 3 years ago

@nightfurys you mention that "third party feeds may sometimes contain vendor specific CVSS scores", but it isn't clear to me which third-party feeds those are. When you say "sometimes", is that a known field in the feed JSON response? Or how can I tell which feed has a custom CVSS score and which doesn't?

wagoodman commented 3 years ago

Completed in https://github.com/anchore/grype/pull/317
