Closed: nightfurys closed this 3 years ago
Today we have two columns in the grype sqlite DB where CVSS is stored:
https://github.com/anchore/grype-db/blob/main/pkg/db/vulnerability_metadata.go#L15-L20
type VulnerabilityMetadata struct {
	...
	CvssV2 *Cvss // Common Vulnerability Scoring System V2 values
	CvssV3 *Cvss // Common Vulnerability Scoring System V3 values
}
type Cvss struct {
	BaseScore           float64 // Ranges from 0 - 10 and defines for qualities intrinsic to a vulnerability
	ExploitabilityScore float64 // Indicator of how easy it may be for an attacker to exploit a vulnerability
	ImpactScore         float64 // Representation of the effects of an exploited vulnerability relative to compromise in confidentiality, integrity, and availability
	Vector              string  // A textual representation of the metric values used to determine the score
}
I think we should adjust each CVSS column in VulnerabilityMetadata to be of type []Cvss and redefine Cvss to:
type Cvss struct {
	Tags    map[string]string
	Metrics CvssMetrics
	Vector  string // A textual representation of the metric values used to determine the score
}
type CvssMetrics struct {
	BaseScore           float64 // Ranges from 0 - 10 and defines for qualities intrinsic to a vulnerability
	ExploitabilityScore float64 // Indicator of how easy it may be for an attacker to exploit a vulnerability
	ImpactScore         float64 // Representation of the effects of an exploited vulnerability relative to compromise in confidentiality, integrity, and availability
}
In this way we can allow for tags such as type=vendor or source=rhel, without tying specific keys to the schema. We can (optionally) drop the v2-v3 distinction and let a tag handle it (e.g. version=2). Then we would only need a single CVSS column.
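A minimal sketch of how a consumer might pick scores out of the proposed tag-based shape. The two types mirror the proposal above; the selectByTags helper, the tag keys, and the sample values are hypothetical and only meant to illustrate the idea, not a settled schema.

```go
package main

import "fmt"

// CvssMetrics and Cvss mirror the shape proposed above.
type CvssMetrics struct {
	BaseScore           float64
	ExploitabilityScore float64
	ImpactScore         float64
}

type Cvss struct {
	Tags    map[string]string
	Metrics CvssMetrics
	Vector  string
}

// selectByTags is a hypothetical helper: it returns the entries whose Tags
// contain every key/value pair in want. An empty want matches everything.
func selectByTags(all []Cvss, want map[string]string) []Cvss {
	var out []Cvss
	for _, c := range all {
		match := true
		for k, v := range want {
			if got, ok := c.Tags[k]; !ok || got != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, c)
		}
	}
	return out
}

// sampleScores returns illustrative data; the tag keys are examples only.
func sampleScores() []Cvss {
	return []Cvss{
		{
			Tags:    map[string]string{"source": "nvd", "version": "2"},
			Metrics: CvssMetrics{BaseScore: 7.5, ExploitabilityScore: 10, ImpactScore: 6.4},
			Vector:  "AV:N/AC:L/Au:N/C:P/I:P/A:P",
		},
		{
			Tags:    map[string]string{"source": "rhel", "type": "vendor", "version": "3"},
			Metrics: CvssMetrics{BaseScore: 6.5, ExploitabilityScore: 2.8, ImpactScore: 3.6},
			Vector:  "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H",
		},
	}
}

func main() {
	for _, c := range selectByTags(sampleScores(), map[string]string{"source": "rhel"}) {
		fmt.Println(c.Vector, c.Metrics.BaseScore)
	}
}
```

With a single []Cvss column, the v2/v3 split becomes just another tag to filter on.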
I like the idea of not having new columns per vendor.
~Should the Vector field be part of the CvssMetrics struct? This doc comment makes me think it would be: "A textual representation of the metric values used to determine the score".~ I could see the other approach, actually. And I think subconsciously, I saw "Metrics" (plural) and assumed a collection. 😄
Re the Tags concept, I see what it's trying to solve. I'd love to find a solution that expresses as strict a schema as possible, and to understand what data we'll need to store, if possible.
My two cents on the specifics of this comment:
type (such as vendor): this seems like a Feeds-service-ism. Is the same distinction useful in our domain? If so, why exactly?
source (such as rhel): I see obvious benefit to this. I'd suggest promoting this to a named field in a data structure. I think it's really important (and will also unlock better use of our output data) to attribute a given CVSS scoring result to its creator.
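A minimal sketch of the stricter alternative suggested here, assuming source (and, for illustration, version) become named fields instead of free-form tags. The field names, the groupBySource helper, and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Cvss is a hypothetical stricter variant of the proposal: the score's
// creator is a named field rather than an entry in a tag map.
type Cvss struct {
	Source              string // who produced the score, e.g. "nvd" or "rhel" (assumed values)
	Version             string // CVSS version, e.g. "2.0" or "3.1"
	BaseScore           float64
	ExploitabilityScore float64
	ImpactScore         float64
	Vector              string
}

// groupBySource attributes each score to its creator.
func groupBySource(all []Cvss) map[string][]Cvss {
	out := make(map[string][]Cvss)
	for _, c := range all {
		out[c.Source] = append(out[c.Source], c)
	}
	return out
}

func main() {
	grouped := groupBySource([]Cvss{
		{Source: "nvd", Version: "2.0", BaseScore: 7.5, Vector: "AV:N/AC:L/Au:N/C:P/I:P/A:P"},
		{Source: "nvd", Version: "3.1", BaseScore: 9.8, Vector: "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"},
		{Source: "rhel", Version: "3.1", BaseScore: 6.5, Vector: "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H"},
	})

	// Sort the keys so the output order is stable.
	names := make([]string, 0, len(grouped))
	for name := range grouped {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d score(s)\n", name, len(grouped[name]))
	}
}
```

The trade-off against tags is that every new attribute needs a schema change, but consumers get compile-time guarantees about which fields exist.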
I thought about tags being replaced with a struct, but wasn't certain of the keys... specifically, across the various sources that provide cvss, it wasn't clear to me that we could select keys that would universally apply to all sources.
Also, we don't use the cvss scores for matching, only for presentation. What will change about this is that engine will be using this DB directly as well for its needs (we now share a common domain with engine, we are no longer distinct here).
I think of the tags like k8s labels and annotations; they are arbitrary key-values that a consumer can use for its needs, but the definition is flexible enough to not require codifying the keys into the type.
Re: type and source, these were provided as examples to illustrate how tags could be used. We need to look at the sources that provide cvss data to select key-values that make sense for each source.
"we don't use the cvss scores for matching, only for presentation"
This resonates. I also see us using these scores when we get more into filtering down the road (in Grype, not Engine).
As for the information that would be stored in tags, I think it's worth spending a small amount of time ascertaining the needs here. E.g.: who are the consumers of this data right now? Maybe it's just Engine at the moment. What are Engine's needs?
I like the idea of extensibility. But my take is: guessing at future needs is a hard game to win. I could see it being safe and manageable to start with the fields we know we need now, and making nonbreaking changes down the road when (or if) we discover a need for additional field(s).
I'm also not totally against the "tags" approach — I'm just not quite understanding how that level of generalization is required at this point in time.
Righto @luhring, I hear your concerns here; let's move forward with determining the final answer while working on this (my example is meant to be suggestive, not final).
For additional clarity on the differences in vendor-specific CVSS:
"CVSS": [
  {
    "base_metrics": {
      "base_score": 6.5,
      "base_severity": "Medium",
      "exploitability_score": 2.8,
      "impact_score": 3.6
    },
    "status": "draft",
    "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H",
    "version": "3.1"
  }
]
"cvss": {
  "base_score": 8.1,
  "temporal_score": 7.3,
  "vector": "CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H/E:P/RL:O/RC:C"
},
Has no CVSS reported; they all come from NVD
"NVD": {
  "CVSSv2": {
    "Score": 4.6,
    "Vectors": "AV:L/AC:L/Au:N/C:P/I:P/A:P"
  }
}
Exposes NVD but also vendor_cvss scores. @nightfurys, mind clarifying what the differences between these are, and whether you are looking to capture them all in this case? (Probably an offline conversation.)
"cvss_v2": {
  "additional_information": {
    "ac_insuf_info": false,
    "obtain_all_privilege": false,
    "obtain_other_privilege": false,
    "obtain_user_privilege": false,
    "user_interaction_required": false
  },
  "base_metrics": {
    "access_complexity": "LOW",
    "access_vector": "NETWORK",
    "authentication": "NONE",
    "availability_impact": "PARTIAL",
    "base_score": 7.5,
    "confidentiality_impact": "PARTIAL",
    "exploitability_score": 10,
    "impact_score": 6.4,
    "integrity_impact": "PARTIAL"
  },
  "severity": "High",
  "vector_string": "AV:N/AC:L/Au:N/C:P/I:P/A:P",
  "version": "2.0"
},
"cvss_v3": {
  "base_metrics": {
    "attack_complexity": "LOW",
    "attack_vector": "NETWORK",
    "availability_impact": "HIGH",
    "base_score": 9.8,
    "base_severity": "Critical",
    "confidentiality_impact": "HIGH",
    "exploitability_score": 3.9,
    "impact_score": 5.9,
    "integrity_impact": "HIGH",
    "privileges_required": "NONE",
    "scope": "UNCHANGED",
    "user_interaction": "NONE"
  },
  "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
  "version": "3.1"
},
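The samples above carry the CVSS version in different places: an explicit version field, a key name such as CVSSv2, or only the vector itself. A minimal sketch of recovering the version from the vector string, relying on the fact that CVSS v3.x vectors start with a CVSS:3.x/ prefix while v2 vectors have no prefix. The versionFromVector name is hypothetical, and treating every unprefixed vector as v2 is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// versionFromVector is a hypothetical helper. It returns the version encoded
// in a "CVSS:<version>/" prefix. Unprefixed, non-empty vectors are assumed to
// be CVSS v2. Empty or malformed input yields "".
func versionFromVector(vector string) string {
	if vector == "" {
		return ""
	}
	if !strings.HasPrefix(vector, "CVSS:") {
		return "2.0" // assumption: unprefixed vectors are CVSS v2
	}
	rest := strings.TrimPrefix(vector, "CVSS:")
	i := strings.Index(rest, "/")
	if i <= 0 {
		return "" // prefix present but no version segment
	}
	return rest[:i]
}

func main() {
	// Vectors copied from the samples in this thread.
	vectors := []string{
		"CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H",
		"CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H/E:P/RL:O/RC:C",
		"AV:L/AC:L/Au:N/C:P/I:P/A:P",
	}
	for _, v := range vectors {
		fmt.Printf("%s -> %q\n", v, versionFromVector(v))
	}
}
```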
Some context on the payload format and content available from the feed service. Feed service data is laid out hierarchically, as feed types and feed groups. ancho.re serves the following feed types:
$ curl -ks https://ancho.re/v1/service/feeds
{
  "feeds": [
    {
      "access_tier": 0,
      "description": "Feed record for type vulnerabilities",
      "name": "vulnerabilities"
    },
    {
      "access_tier": 0,
      "description": "Feed record for type github",
      "name": "github"
    },
    {
      "access_tier": 0,
      "description": "Feed record for type nvdv2",
      "name": "nvdv2"
    },
    ...
Each feed type contains one or more groups:
$ curl -ks https://ancho.re/v1/service/feeds/vulnerabilities
{
  "groups": [
    {
      "access_tier": 0,
      "description": "Group record for namespace: alpine:3.10 and feed type: vulnerabilities",
      "name": "alpine:3.10"
    },
    {
      "access_tier": 0,
      "description": "Group record for namespace: alpine:3.11 and feed type: vulnerabilities",
      "name": "alpine:3.11"
    },
    {
      "access_tier": 0,
      "description": "Group record for namespace: alpine:3.12 and feed type: vulnerabilities",
      "name": "alpine:3.12"
    },
    ...
Expect the data format for groups to be consistent within a feed type, i.e. all groups within the vulnerabilities feed type have the same payload format. That's a feed service contract (not clearly documented anywhere).
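A minimal sketch of decoding the group listing shown above. The struct and function names are hypothetical, and the types model only the fields visible in the sample response; the embedded JSON is a trimmed copy of that response rather than a live call to the service.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// feedGroup and groupListing model only the fields visible in the sample
// response; the real service may return more.
type feedGroup struct {
	AccessTier  int    `json:"access_tier"`
	Description string `json:"description"`
	Name        string `json:"name"`
}

type groupListing struct {
	Groups []feedGroup `json:"groups"`
}

// sampleGroups is a trimmed copy of the response shown in this thread.
const sampleGroups = `{
  "groups": [
    {"access_tier": 0, "description": "Group record for namespace: alpine:3.10 and feed type: vulnerabilities", "name": "alpine:3.10"},
    {"access_tier": 0, "description": "Group record for namespace: alpine:3.11 and feed type: vulnerabilities", "name": "alpine:3.11"}
  ]
}`

// groupNames decodes a group listing and returns the group names in order.
func groupNames(raw []byte) ([]string, error) {
	var listing groupListing
	if err := json.Unmarshal(raw, &listing); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(listing.Groups))
	for _, g := range listing.Groups {
		names = append(names, g.Name)
	}
	return names, nil
}

func main() {
	names, err := groupNames([]byte(sampleGroups))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(names)
}
```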
Payload schema for vulnerabilities feed type:
{
  "type": "object",
  "properties": {
    "Vulnerability": {
      "type": "object",
      "properties": {
        "CVSS": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "base_metrics": {
                "type": "object",
                "properties": {
                  "base_score": {
                    "type": "number"
                  },
                  "base_severity": {
                    "type": "string"
                  },
                  "exploitability_score": {
                    "type": "number"
                  },
                  "impact_score": {
                    "type": "number"
                  }
                }
              },
              "status": {
                "type": "string"
              },
              "vector_string": {
                "type": "string"
              },
              "version": {
                "type": "string"
              }
            }
          }
        },
        "Description": {
          "type": "string"
        },
        "FixedIn": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "Name": {
                "type": "string"
              },
              "NamespaceName": {
                "type": "string"
              },
              "VendorAdvisory": {
                "type": "object",
                "properties": {
                  "NoAdvisory": {
                    "type": "boolean"
                  },
                  "AdvisorySummary": {
                    "type": "array",
                    "items": {
                      "type": "object",
                      "properties": {
                        "ID": {
                          "type": "string"
                        },
                        "Link": {
                          "type": "string"
                        }
                      }
                    }
                  }
                }
              },
              "Version": {
                "type": "string"
              },
              "VersionFormat": {
                "type": "string"
              }
            }
          }
        },
        "Link": {
          "type": "string"
        },
        "Metadata": {
          "type": "object"
        },
        "Name": {
          "type": "string"
        },
        "NamespaceName": {
          "type": "string"
        },
        "Severity": {
          "type": "string"
        }
      }
    }
  }
}
Note that all the data here is specific to the vendor supplying it, i.e. a payload in the rhel:8 group will reflect CVSS scores from the data source used to construct the rest of the payload. This is often referred to as the vendor score.
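A minimal sketch of decoding the CVSS array from a vulnerabilities feed payload, following the schema above. The Go type names are hypothetical, the sample payload reuses the CVSS entry shown earlier in this thread, and the Name value is a placeholder rather than a real record.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// baseMetrics and vendorCvss follow the "CVSS" items in the schema above.
type baseMetrics struct {
	BaseScore           float64 `json:"base_score"`
	BaseSeverity        string  `json:"base_severity"`
	ExploitabilityScore float64 `json:"exploitability_score"`
	ImpactScore         float64 `json:"impact_score"`
}

type vendorCvss struct {
	BaseMetrics  baseMetrics `json:"base_metrics"`
	Status       string      `json:"status"`
	VectorString string      `json:"vector_string"`
	Version      string      `json:"version"`
}

// payload models only the fields this sketch needs.
type payload struct {
	Vulnerability struct {
		CVSS          []vendorCvss `json:"CVSS"`
		Name          string       `json:"Name"`
		NamespaceName string       `json:"NamespaceName"`
	} `json:"Vulnerability"`
}

// samplePayload is illustrative; "CVE-0000-0000" is a placeholder name.
const samplePayload = `{
  "Vulnerability": {
    "Name": "CVE-0000-0000",
    "NamespaceName": "rhel:8",
    "CVSS": [
      {
        "base_metrics": {"base_score": 6.5, "base_severity": "Medium", "exploitability_score": 2.8, "impact_score": 3.6},
        "status": "draft",
        "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H",
        "version": "3.1"
      }
    ]
  }
}`

// decodePayload parses one vulnerabilities feed record.
func decodePayload(raw []byte) (payload, error) {
	var p payload
	err := json.Unmarshal(raw, &p)
	return p, err
}

func main() {
	p, err := decodePayload([]byte(samplePayload))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, c := range p.Vulnerability.CVSS {
		// The namespace identifies which vendor the score is attributed to.
		fmt.Println(p.Vulnerability.NamespaceName, c.Version, c.BaseMetrics.BaseScore)
	}
}
```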
The data format for the nvdv2 feed type is different from the vulnerabilities feed type. Only the cvss score attributes are shown below, as a YAML schema for easier readability:
type: object
properties:
  cvss_v2:
    type: object
    properties:
      version:
        type: string
      vector_string:
        type: string
      severity:
        type: string
      base_metrics:
        type: object
        properties:
          access_vector:
            type: string
          access_complexity:
            type: string
          authentication:
            type: string
          confidentiality_impact:
            type: string
          integrity_impact:
            type: string
          availability_impact:
            type: string
          base_score:
            type: number
          exploitability_score:
            type: number
          impact_score:
            type: number
      additional_information:
        type: object
        properties:
          ac_insuf_info:
            type: boolean
          obtain_all_privilege:
            type: boolean
          obtain_user_privilege:
            type: boolean
          obtain_other_privilege:
            type: boolean
          user_interaction_required:
            type: boolean
  cvss_v3:
    type: object
    properties:
      version:
        type: string
      vector_string:
        type: string
      base_metrics:
        type: object
        properties:
          attack_vector:
            type: string
          attack_complexity:
            type: string
          privileges_required:
            type: string
          user_interaction:
            type: string
          scope:
            type: string
          confidentiality_impact:
            type: string
          integrity_impact:
            type: string
          availability_impact:
            type: string
          base_score:
            type: number
          exploitability_score:
            type: number
          impact_score:
            type: number
          base_severity:
            type: string
  ...
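A minimal sketch of flattening the nvdv2 cvss_v2 and cvss_v3 objects into a single list, in the spirit of the single-column idea discussed earlier. The Go type and function names are hypothetical, only the score-related fields from the schema above are modelled, and the sample values are copied from the nvdv2 example in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nvdCvss models the fields shared by cvss_v2 and cvss_v3 in the schema above.
type nvdCvss struct {
	Version      string `json:"version"`
	VectorString string `json:"vector_string"`
	BaseMetrics  struct {
		BaseScore           float64 `json:"base_score"`
		ExploitabilityScore float64 `json:"exploitability_score"`
		ImpactScore         float64 `json:"impact_score"`
	} `json:"base_metrics"`
}

// nvdRecord uses pointers so that a missing or null version stays nil.
type nvdRecord struct {
	CvssV2 *nvdCvss `json:"cvss_v2"`
	CvssV3 *nvdCvss `json:"cvss_v3"`
}

// score is a hypothetical version-agnostic entry.
type score struct {
	Version             string
	Vector              string
	BaseScore           float64
	ExploitabilityScore float64
	ImpactScore         float64
}

// sampleNvd is trimmed from the nvdv2 example shown in this thread.
const sampleNvd = `{
  "cvss_v2": {
    "base_metrics": {"base_score": 7.5, "exploitability_score": 10, "impact_score": 6.4},
    "vector_string": "AV:N/AC:L/Au:N/C:P/I:P/A:P",
    "version": "2.0"
  },
  "cvss_v3": {
    "base_metrics": {"base_score": 9.8, "exploitability_score": 3.9, "impact_score": 5.9},
    "vector_string": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
    "version": "3.1"
  }
}`

// flatten returns one entry per CVSS version present, v2 first.
func flatten(raw []byte) ([]score, error) {
	var rec nvdRecord
	if err := json.Unmarshal(raw, &rec); err != nil {
		return nil, err
	}
	var out []score
	for _, c := range []*nvdCvss{rec.CvssV2, rec.CvssV3} {
		if c == nil {
			continue
		}
		out = append(out, score{
			Version:             c.Version,
			Vector:              c.VectorString,
			BaseScore:           c.BaseMetrics.BaseScore,
			ExploitabilityScore: c.BaseMetrics.ExploitabilityScore,
			ImpactScore:         c.BaseMetrics.ImpactScore,
		})
	}
	return out, nil
}

func main() {
	scores, err := flatten([]byte(sampleNvd))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, s := range scores {
		fmt.Println(s.Version, s.BaseScore, s.Vector)
	}
}
```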
@nightfurys you mention that "third party feeds may sometimes contain vendor specific CVSS scores", but it isn't clear to me which third party feeds contain vendor specific CVSS scores. When you say "sometimes", is that a known field in the feed JSON response? Or how can I tell which feeds have a custom CVSS score and which don't?
Completed in https://github.com/anchore/grype/pull/317
RHEL and third party feeds may sometimes contain vendor specific CVSS scores that are missing in grype (and maybe grype-db). Examples of CVSS scores in RHEL feeds:
Check with @nightfurys for examples of third party feeds