Add ability to prioritize vulnerable software

noahtalerman commented 2 years ago

Problem

I'm a user managing thousands of macOS, Windows, and/or Linux hosts and I'm overwhelmed with completing the following goal:

Make sure the most important vulnerable software is updated/removed from my hosts.

It's hard to make progress on the goal because I don't know where I should start. What vulnerable software should be updated/removed first?

Goals

Know which vulnerable software should be updated/removed first.
- We'll accomplish this by adding the ability to know the probability of exploit (reported by FIRST.org/epss), CVSS base score (reported by NVD), and whether or not there's a known exploit (reported by CISA) for each vulnerability.
- Also, we'll order the Software table by probability of exploit.

Figma

Add probability of exploit (EPSS score) for vulnerable software: https://www.figma.com/file/hdALBDsrti77QuDNSzLdkx/?node-id=6454%3A262007
Add known exploits and severity (CVSS scores) for vulnerable software: https://www.figma.com/file/hdALBDsrti77QuDNSzLdkx/?node-id=6454%3A264563

Child issues

#5380
#5458
#4351
#5522
#5585

docs

https://github.com/fleetdm/fleet/pull/7554

noahtalerman commented 2 years ago

Potential way to determine priority of updating installed software...

Year: (low priority)
- At least one vulnerability (CVE) with low severity (CVSS score)
Month: (medium priority)
- At least one vulnerability (CVE) with a CVSS score of "medium"
Week: (high priority)
- At least one CVE with a CVSS score of "high"
- OR:
- At least one CVE with a public exploit
Day: (immediate priority)
- At least one CVE with a CVSS score of "critical"
- OR:
- At least one CVE with a CVSS score ≥"high" AND a public exploit
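
A minimal sketch of the bucketing rules above, in Python. The `CVE` record, the function names, and the placeholder CVE IDs are hypothetical and not part of Fleet's data model. The numeric-to-label mapping assumes the CVSS v3 qualitative bands (low 0.1-3.9, medium 4.0-6.9, high 7.0-8.9, critical 9.0-10.0).

```python
from dataclasses import dataclass
from typing import List, Optional

# Ordered from least to most severe so labels can be compared by rank.
RANK = {"none": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}
# Buckets ordered from least to most urgent.
ORDER = ["year", "month", "week", "day"]


@dataclass
class CVE:
    """Hypothetical record for one vulnerability."""
    id: str
    cvss_score: float
    public_exploit: bool = False


def cvss_severity(score: float) -> str:
    """Map a CVSS v3 base score to its qualitative label (assumed bands)."""
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    if score > 0.0:
        return "low"
    return "none"


def bucket_for_cve(cve: CVE) -> Optional[str]:
    """Apply the Day/Week/Month/Year rules to a single CVE."""
    sev = RANK[cvss_severity(cve.cvss_score)]
    if sev >= RANK["critical"] or (sev >= RANK["high"] and cve.public_exploit):
        return "day"
    if sev >= RANK["high"] or cve.public_exploit:
        return "week"
    if sev >= RANK["medium"]:
        return "month"
    if sev >= RANK["low"]:
        return "year"
    return None


def bucket_for_software(cves: List[CVE]) -> Optional[str]:
    """A software item takes the most urgent bucket among its CVEs."""
    buckets = [b for b in (bucket_for_cve(c) for c in cves) if b is not None]
    return max(buckets, key=ORDER.index) if buckets else None
```

Because each rule reads "at least one CVE", the software-level bucket is simply the most urgent bucket of any single CVE.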

noahtalerman commented 2 years ago

Feedback from Josh Brower:

“Risk” is context specific
Example where this measure of "Risk" would be less valuable: Microsoft Exchange cluster
- Why? "Risk" depends on other factors such as configuration, port, level of access, etc.
- First step for these software items is to link directly to Microsoft page which explains the "Risk" factors in more detail
Example where this measure of "Risk" would be more valuable: Top 100 or so consumer apps
- Why? "Risk" depends on fewer factors

mikermcneil commented 2 years ago

Feedback from customers:

Ryan: Hey just a follow-up from the meeting... I had a quick thought on how we might want to approach the vulnerability ranking system. Since we're essentially working with ya'll to help define heuristics, I was thinking the below might be a good place to start. This could be scaled to your other customers as they could define their own metrics, weights, etc...

1) Define risk categories (Low, Medium, High, Critical)
2) Define metrics
- Vulnerability CVSS score
- Percentage of machines that have run this software in the last X days
- LPE or RCE
- Present in NVD
- etc.
3) Give weight to each metric (i.e. which matters the most)
4) Define thresholds for each metric (i.e. CVSS 0-2 LOW, 2-5 MED, 5-9 HIGH, 9+ CRIT)
5) Write algorithm to dynamically determine Fleet Risk Score

So I think from our perspective ideally we'd just give you metrics, thresholds and weights and the algorithm should spit out a risk score. If that sounds good to you, we can prepare a doc with those things and send it over (even if we just start with 1-2 metrics, we can expand from there).

Tony Gauda 4 hours ago Ryan- thanks for the suggestion. @Noah Talerman @mikermcneil FYI

mikermcneil < 1 minute ago I like it! Thanks Ryan. Up to @Noah Talerman on how to integrate this into our strategy and wireframes. Looking forward to reviewing together. One thing that's coming to mind for me: from a UX perspective, I'd like to see us come up with variations (or some way of clearly distinguishing) "Low, Medium, High, Critical" risk scores versus what those words mean in CVSS-land.

I'll paste the above discussion in the issue so others in the community can participate.
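
One way Ryan's five steps could fit together, as a rough Python sketch. The metric names, the weights, and the normalization to a 0-10 scale are all assumptions made for illustration. Only the category names and the example cutoffs (0-2, 2-5, 5-9, 9+) come from the message above, and nothing here is an actual Fleet algorithm.

```python
from bisect import bisect_right

# Step 1: risk categories, least to most severe.
CATEGORIES = ["Low", "Medium", "High", "Critical"]
# Step 4: cutoffs on a 0-10 scale, taken from the example thresholds
# (0-2 LOW, 2-5 MED, 5-9 HIGH, 9+ CRIT). A score exactly on a cutoff
# falls into the higher category (an assumption; the example is ambiguous).
CUTOFFS = [2.0, 5.0, 9.0]
# Step 3: hypothetical weights, one per metric.
WEIGHTS = {"cvss": 0.6, "usage": 0.3, "rce": 0.1}


def risk_score(cvss: float, pct_hosts_used_recently: float, is_rce: bool) -> float:
    """Step 5: weighted average of normalized metrics, scaled to 0-10."""
    # Step 2: each metric is normalized to the range 0..1.
    metrics = {
        "cvss": cvss / 10.0,
        "usage": pct_hosts_used_recently / 100.0,
        "rce": 1.0 if is_rce else 0.0,
    }
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * value for name, value in metrics.items())
    return 10.0 * (weighted / total_weight)


def risk_category(score: float) -> str:
    """Map a 0-10 risk score onto a category using the cutoffs."""
    return CATEGORIES[bisect_right(CUTOFFS, score)]
```

With this shape, a customer would only need to supply `WEIGHTS` and `CUTOFFS`; the scoring function itself stays the same.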

mikermcneil commented 2 years ago

@noahtalerman One thing I really like about how you're thinking about risk scoring is basing it on patching timeframes / SLAs. Maybe this is "SLA"?

(will not go out of our way to patch)
(will patch within 12 months)
(will patch within 1 month)
(will patch within 1 week)
(will patch within 24h)

rymurph20 commented 2 years ago

I think we were having trouble creating some sort of distinction between Critical and High in the meeting. I would separate the two (briefly) like this...

Critical: Danger is imminent from remote attackers; drop everything and fix. (e.g. Log4Shell)
High: Vulnerability is trivial to exploit and should be prioritized but danger isn't necessarily imminent from remote attackers (e.g. LPE like Dirty Pipe, PolKit)

mikermcneil commented 2 years ago

@rymurph20 Fair to say by that definition: "Critical" == ≤24h, "High" == 1 week?

Also: @cjwalton

I get it about not having conflicting namespace with CVSS on vuln severity. I don't know if you are familiar with Traffic Light Protocol, but I wonder if there is something similar for security severity that can be referenced. I could research that.

That could work. We could even just call it "SLA" and have there be 4 convention over configuration levels, starting out based on feedback from you and anyone else who chimes in (makes it easier to ship a working version more quickly.) Then over time, we let people configure what those SLAs mean for them to support more use cases.

noahtalerman commented 2 years ago

I think from our perspective ideally we'd just give you metrics, thresholds and weights and the algorithm should spit out a risk score

@rymurph20, this makes a lot of sense.

It seems like the aggregate risk score is really good at bubbling up the "riskiest" software/hosts. This seems helpful for answering the "What action can I take to make the biggest impact at reducing risk now?"

However, as discussed by Mike and Jason above, it seems like the risk score on its own doesn't help answer questions like "Which software/hosts can we wait to update until later this week or later this month?"

I could be very wrong about the above^

For now, in an attempt to address both of the above questions, we're taking the approach of adding vulnerable software versions into "Year," "Month," "Week," and "Day" categories:

Year: (not urgent)
- At least one vulnerability (CVE) with CVSS score of "low"
Month: (somewhat urgent)
- At least one CVE with a CVSS score of "medium"
Week: (urgent)
- At least one CVE with a CVSS score of "high"
Day: (very urgent)
- At least one CVE with a CVSS score of "critical"
- OR:
- At least one CVE with a known exploit

Then, Fleet is poised to present the above "Urgency" with additional context in the UI and API:

How many hosts is this vulnerable software detected on?
When did hosts last use the vulnerable software?
How many high severity vulns does the vulnerable software have?

Critical: Danger is imminent from remote attackers; drop everything and fix. (e.g. Log4Shell)
High: Vulnerability is trivial to exploit and should be prioritized but danger isn't necessarily imminent from remote attackers (e.g. LPE like Dirty Pipe, PolKit)

@rymurph20, this distinction is super helpful. The "Day" urgency in the Figma wireframes I link to below is intended to account for the "drop everything and fix" scenario.

The "Week" urgency is intended to account for the "should be prioritized" scenario.

noahtalerman commented 2 years ago

@cjwalton and @rymurph20 when you get the chance, please take a look at the following Figma wireframes to see how the "Urgency" concept and additional context could be presented in Fleet: https://www.figma.com/file/hdALBDsrti77QuDNSzLdkx/?node-id=4764%3A179433

What are your thoughts on using the criteria defined in the above comment for "Urgency" and being presented additional context instead of an aggregate risk score?

Please feel free to add any feedback as comments in this issue :)

Please note that these wireframes are subject to change and further iteration.

noahtalerman commented 2 years ago

Goals that might be addressed in a later iteration:

Fleet only lets you know about software with "Month" urgency when it comes time to address. Why? I want to know what is still not fixed. What is outside of SLAs?
Fleet can lower the "Urgency" because most devices have a specific security feature enabled (ex. a specific registry value in Windows is enabled for most of my devices).

zwass commented 2 years ago

I've been learning about EPSS and I think we should strongly consider using it for prioritization of vulnerabilities.

There's some helpful discussion of how to present EPSS scores in https://www.first.org/epss/articles/prob_percentile_bins.

cjwalton commented 2 years ago

@zwass - I was not familiar with EPSS until I read your comment, but that is exactly how I think about how vulnerability management should be addressed. Thanks for surfacing this and yes - this is a super forward-thinking way that Fleet should look at this.

noahtalerman commented 2 years ago

@chiiph I'm passing this issue's assignment to you. Can you, or another engineering team member, please check out the feasibility of adding data like CVSS scores, known exploits, and EPSS scores to vulnerabilities in the GET /software API route?

This way, product can then take this research and determine how we'd like to display this data.

Links to sources for this data are included in the "Data" section in this issue's description.

cc @zwass

noahtalerman commented 2 years ago

Heads up, I'm adding this issue to the LEGACY #g-product board so that the product team is aware that this research is in progress.

juan-fdz-hawa commented 2 years ago

@noahtalerman I put together a small doc talking about CVSS and EPSS - TLDR:

We have access to the CVSS scores via the NVD artifacts we use for mapping from CPEs to CVEs.
The folks at www.first.org produce a daily dataset with EPSS scores, we will need to download it, process it, and maybe create an SQLite DB or similar - but this can be done in an 'offline' fashion (similar to what we do to produce the cpe DB).
Even though EPSS scores make more sense and seem to be the future, CVSS scores still are the 'industry' standard, so we might want to use both.
CVSS scores and EPSS scores have different intrinsic meanings: the first is kinda like a technical assessment score, while the latter is a risk score (the probability of a vulnerability being exploited). For example, if we look at viruses in the USA, Ebola will probably have a high CVSS score, but a low EPSS score (because there are no cases in the USA) - just something to keep in mind when presenting the info.
The value produced by different versions of the EPSS model will have different meanings.
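
The offline processing step could look roughly like the Python sketch below. It assumes the daily EPSS file is a CSV whose first line is a `#`-prefixed metadata comment (model version and score date) followed by `cve,epss,percentile` columns; that layout, the table schema, and the sample rows are assumptions for illustration, not a description of Fleet's pipeline. Downloading the file is left out so the example stays self-contained.

```python
import csv
import sqlite3

# Made-up sample in the assumed file layout; the CVE IDs are placeholders.
SAMPLE = """#model_version:v2022.01.01,score_date:2022-05-01T00:00:00+0000
cve,epss,percentile
CVE-0000-0001,0.97500,0.99990
CVE-0000-0002,0.00885,0.22910
"""


def load_epss(csv_text: str, db: sqlite3.Connection) -> str:
    """Load EPSS rows into SQLite and return the model version found."""
    lines = csv_text.splitlines()
    model_version = ""
    if lines and lines[0].startswith("#"):
        # Parse "key:value" pairs from the metadata comment line.
        parts = lines[0].lstrip("#").split(",")
        meta = dict(p.split(":", 1) for p in parts if ":" in p)
        model_version = meta.get("model_version", "")
        lines = lines[1:]
    db.execute(
        "CREATE TABLE IF NOT EXISTS epss ("
        "cve TEXT PRIMARY KEY, score REAL NOT NULL, "
        "percentile REAL NOT NULL, model_version TEXT NOT NULL)"
    )
    rows = [
        (r["cve"], float(r["epss"]), float(r["percentile"]), model_version)
        for r in csv.DictReader(lines)
    ]
    db.executemany("INSERT OR REPLACE INTO epss VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return model_version


db = sqlite3.connect(":memory:")
version = load_epss(SAMPLE, db)
```

The model version is stored next to each score because, as noted above, values produced by different versions of the EPSS model have different meanings.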

Feel free to DM if you have any questions.

noahtalerman commented 2 years ago

@juan-fdz-hawa thank you for putting together that doc!

Even though EPSS scores make more sense and seem to be the future, CVSS scores still are the 'industry' standard, so we might want to use both.

We'd like Fleet to help the user determine what software is the most vulnerable.

This way, a Fleet user can patch the most vulnerable software first to achieve the goal of maintaining secure and compliant devices.

EPSS was created as a way to quantify the risk of a vulnerability so that it can be better prioritized

I think this means that determining a helpful way to surface the EPSS score will be more valuable than surfacing the CVSS score. @cjwalton and @rymurph20 what do you think about this?

For example, if we look at viruses in the USA, Ebola will probably have a high CVSS score, but a low EPSS score (because there are no cases in the USA)

This is an awesome analogy.

mikermcneil commented 2 years ago

Data point:

noahtalerman commented 2 years ago

Several takeaways following a conversation with a customer:

Being able to sort software by exploitability (EPSS score) and drill down to see the severity (CVSS score) of the vulnerabilities (CVEs) associated with the software will allow the customer to be 80% successful when prioritizing vulnerable software.
- First, Fleet will add the ability to sort software by exploitability (EPSS score).
- Then, Fleet will add the ability to see exploitability (EPSS score) and severity (CVSS) for each of a software's vulnerabilities (CVEs).
For the customer, bucketing based on priority/SLA (Day, Week, Month, Year) will differ across teams.
- For example, let's say a vulnerability is somewhat exploitable (0.1 EPSS score) and has a medium severity (CVSS score 6.0). A team managing servers that support financial services would want to update this vulnerability a lot faster (within the day potentially) than a team managing workstations (within the week).
- Fleet will later come back to this bucketing/SLA after the exploitability (EPSS) and severity scores (CVSS) are exposed.
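
A small Python sketch of the first step, sorting software by exploitability. It assumes a software item is ranked by the highest EPSS score among its CVEs and that items with no scored CVE sort last; the `Software` record and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Software:
    """Hypothetical record: a software item and its CVE IDs."""
    name: str
    cves: List[str] = field(default_factory=list)


def max_epss(sw: Software, epss: Dict[str, float]) -> Optional[float]:
    """Highest EPSS probability among the item's CVEs, if any are scored."""
    scores = [epss[cve] for cve in sw.cves if cve in epss]
    return max(scores) if scores else None


def sort_by_exploitability(items: List[Software], epss: Dict[str, float]) -> List[Software]:
    """Most exploitable first; unscored items last; ties broken by name."""
    def key(sw: Software):
        score = max_epss(sw, epss)
        return (score is None, -(score or 0.0), sw.name)
    return sorted(items, key=key)
```

Drilling down into an item would then list each CVE with its own EPSS and CVSS values, which is the second step described above.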

cc @cjwalton @rymurph20

noahtalerman commented 2 years ago

@zwass @chiiph @juan-fdz-hawa @michalnicp distilling the above feedback into a list of priorities here:

Add EPSS scores to Fleet's vulnerability database
- Unanswered question: Will all vulnerabilities (CVEs) have an EPSS score?
Add CVSS scores to Fleet's vulnerability database

Can a member from the Platform team please file issues to track the above items? I'm happy to answer any questions or discuss the above before this happens.

A member of the interface team will be responsible for filing the issues that track UI+API changes to expose this data.

juan-fdz-hawa commented 2 years ago

Unanswered question: Will all vulnerabilities (CVEs) have an EPSS score?

@noahtalerman No, if I remember correctly the EPSS dataset contains around 173k scores, and there are currently around 185k CVEs

noahtalerman commented 2 years ago

Thanks! Do I read the buckets this way?

Year: (low priority)

At least one vulnerability (CVE) with low severity (CVSS score)

"The user can wait for at least a year until updating?"...

Correct.

This was a first stab at seeing if the Fleet product can enhance a common “service-level agreement (SLA)” practice we saw users/customers applying to vuln management.

Example of this practice: An organization wants to know generally how successful it was at updating/patching vulnerable software over the course of the year. Sometimes folks call this “time to remediation.” Often, this time to remediation differs according to characteristics of the vuln (how severe or impactful is this vuln).

The thinking is, eventually, Fleet buckets vulnerable software into something like “day,” “week,” “month,” and “year” priorities so that Fleet is able to tell you what the average time to remediation is for all vulnerable software in the day, week, month, and year buckets.

So, Fleet helps answer: were all vulnerable software items bucketed under “week” actually remediated in a week? If not, how close was the organization to accomplishing this?
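
A rough Python sketch of that report. The record shape (bucket, detected time, remediated time) and the SLA durations (30 days for a month, 365 days for a year) are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Assumed SLA window for each bucket.
SLA = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def remediation_report(records):
    """records: iterable of (bucket, detected_at, remediated_at) tuples."""
    by_bucket = defaultdict(list)
    for bucket, detected_at, remediated_at in records:
        by_bucket[bucket].append(remediated_at - detected_at)

    report = {}
    for bucket, durations in by_bucket.items():
        avg_seconds = mean(d.total_seconds() for d in durations)
        report[bucket] = {
            # Average time to remediation for this bucket.
            "average": timedelta(seconds=avg_seconds),
            # How many items were remediated inside the bucket's window.
            "within_sla": sum(d <= SLA[bucket] for d in durations),
            "total": len(durations),
        }
    return report
```

Comparing `within_sla` to `total` answers "were all items in this bucket remediated in time?", and `average` shows how close the organization came.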

noahtalerman commented 2 years ago

Moving the following research out of the issue's description:

Notes

There seem to be two approaches to achieving the above: software/vulnerability-first and device-first.

In Q2 2022, Fleet will focus on improvements that address the software/vulnerability-first approach.

Software/vulnerability-first

Organizations with the resources for a robust process of managing vulnerabilities seem to have the following goals. These organizations are typically large organizations with tens to hundreds of thousands of devices.

As a Fleet user, I want to...

be able to gather a list of vulnerabilities and sort by "Risk score" so that I know which software to update to make the biggest impact on reducing risk at my organization.
know if the vulnerability has a known exploit so that I can prioritize updating the software with this vulnerability first.

Device-first

The organizations that don't yet have the resources for a robust process of managing vulnerabilities seem to have the following goals. These organizations are typically small to medium sized organizations.

As a Fleet user, I want to...

be able to sort hosts by "Risk score" so that I can prioritize updating the software on hosts that have the most vulnerable software
be able to “drill down” into the detected vulnerability so that I can prioritize updating the software on hosts that have recently used it.
be able to pivot to a list of hosts with a specific vulnerability so that I can prioritize updating the software that is frequently used in my organization.

Data

CVSS scores + known exploits

One way to determine priority of updating installed software is by combining CVSS scores (available in NVD) and known exploit data.

Buckets

Year: (low priority)
- At least one vulnerability (CVE) with low severity (CVSS score)
Month: (medium priority)
- At least one vulnerability (CVE) with a CVSS score of "medium"
Week: (high priority)
- At least one CVE with a CVSS score of "high"
- OR:
- At least one CVE with a known exploit
Day: (immediate priority)
- At least one CVE with a CVSS score of "critical"
- OR:
- At least one CVE with a known exploit
  - Issue tracking the addition of known exploit data is here: #4351

EPSS score

Another way to determine priority of updating installed software is by using EPSS scores.

EPSS probabilities convey a global, or overall, sense of the threat of exploitation, while percentiles provide a relative, or localized, measure of threat.
Percentile values may change for a given subset of vulnerabilities. For example, when a user considers only those vulnerabilities relevant to her network environment, the percentile values will change.
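
To illustrate why percentiles shift with the subset while probabilities do not, here is a small Python sketch. It assumes a percentile is the fraction of CVEs in the set whose probability is the same or lower; the scores and CVE IDs are made up.

```python
from bisect import bisect_right
from typing import Dict


def percentiles(scores: Dict[str, float]) -> Dict[str, float]:
    """For each CVE, the fraction of CVEs in `scores` with an equal or lower probability."""
    ordered = sorted(scores.values())
    n = len(ordered)
    return {cve: bisect_right(ordered, p) / n for cve, p in scores.items()}


# Made-up "global" EPSS probabilities.
GLOBAL = {
    "CVE-0000-0001": 0.01,
    "CVE-0000-0002": 0.02,
    "CVE-0000-0003": 0.05,
    "CVE-0000-0004": 0.10,
    "CVE-0000-0005": 0.90,
}
# Only the CVEs relevant to one network environment.
LOCAL = {cve: GLOBAL[cve] for cve in ("CVE-0000-0003", "CVE-0000-0004", "CVE-0000-0005")}

global_pct = percentiles(GLOBAL)
local_pct = percentiles(LOCAL)
```

Here CVE-0000-0003 keeps its 0.05 probability in both sets, but it ranks at three of five globally and only one of three locally, so its percentile drops once the set is narrowed.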

Buckets

Urgent (fix within 1 week)
- Software has a vulnerability (CVE) with 10% or higher chance of exploitation.

zhumo commented 2 years ago

@noahtalerman Can you add a mention to the vulnerability processing page in the docs that we now have CVSS and EPSS scores, and link to them?
