Expose probability of exploit (EPSS score) via the `GET /software` response

noahtalerman commented 2 years ago

Goal

As a Fleet user, I want to know the exploitability (EPSS score) for vulnerable software installed on my devices so that I can prioritize updating/patching the software that is most vulnerable to attack across my fleet.

Figma

Add probability of exploit (EPSS score) for vulnerable software: https://www.figma.com/file/hdALBDsrti77QuDNSzLdkx/?node-id=6454%3A262007

Tasks

1

[ ] Add a new epss_probability property to the GET /software response.
epss_probability is only available to paid users. This means that the value will always be set to null for Fleet Free users.
Add a new epss_probability field to order_key. This allows the user to sort software by epss_probability.
- Since each software item may have multiple vulnerabilities each with their own epss_probability, for each software item, we'll use the vulnerability with the highest epss_probability, to sort software items by probability of exploit.

{
  ...
  "software": [
    {
      ...
      "vulnerabilities": [
        {
          "cve": "CVE-2015-20107",
          "details_link": "https://nvd.nist.gov/vuln/detail/CVE-2015-20107",
          "epss_probability": 0.9
        }
      ],
      "hosts_count": 4
    },
    {
      ...
      "vulnerabilities": [
        {
          "cve": "CVE-2021-3572",
          "details_link": "https://nvd.nist.gov/vuln/detail/CVE-2021-3572",
          "epss_probability": 0.3
        }
      ],
      "hosts_count": 8
    }
  ]
}
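
The sorting rule in task 1 (rank each software item by its highest epss_probability) could be sketched roughly as follows. This is an illustration in Python, not Fleet's implementation: the helper names `max_epss` and `sort_by_epss`, the `name` field, and the sample data are all hypothetical.

```python
from typing import Optional


def max_epss(software_item: dict) -> Optional[float]:
    """Return the highest epss_probability among an item's vulnerabilities.

    Returns None when no vulnerability carries a score (hypothetical helper).
    """
    scores = [
        v["epss_probability"]
        for v in software_item.get("vulnerabilities", [])
        if v.get("epss_probability") is not None
    ]
    return max(scores) if scores else None


def sort_by_epss(software: list, descending: bool = True) -> list:
    """Sort software items by their highest epss_probability.

    Items without any score are treated as lower than every scored item,
    so they come last when sorting in descending order.
    """
    def key(item: dict):
        score = max_epss(item)
        # (False, 0.0) compares below (True, x) for any x.
        return (score is not None, score if score is not None else 0.0)

    return sorted(software, key=key, reverse=descending)


# Hypothetical sample data shaped like the "software" array above.
software = [
    {"name": "a", "vulnerabilities": [
        {"cve": "CVE-2021-3572", "epss_probability": 0.3},
    ]},
    {"name": "b", "vulnerabilities": [
        {"cve": "CVE-2015-20107", "epss_probability": 0.9},
        {"cve": "CVE-2021-3572", "epss_probability": 0.3},
    ]},
    {"name": "c", "vulnerabilities": []},
]
ordered = [item["name"] for item in sort_by_epss(software)]
```

With this sample, item "b" sorts first because its highest score (0.9) beats item "a" (0.3), and the unscored item "c" sorts last.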

2

[ ] Add a new epss_probability property to the software array returned by the GET /hosts/{id} and GET /hosts/identifier/{identifier} endpoints.

noahtalerman commented 2 years ago

@zwass passing this issue to you to complete specs.

@lukeheath can you please review the API wireframes?

michalnicp commented 2 years ago

I think we should also include the epss percentile, which is also readily available. From https://www.first.org/epss/articles/prob_percentile_bins

Currently, EPSS provides both a probability of observing exploitation activity in the next 30 days, and a percentile (a rank ordering of probabilities from highest to lowest)

On the other hand, percentiles communicate rank ordering, and therefore convey a localized context. That is, they better communicate the relative importance. Knowing a vulnerability is ranked in the 88th percentile (or top 12%), may be perceived very differently than seeing just the absolute probability of 10%. And so while a probability alone (10%) may be difficult to interpret for some, adding in percentiles (88th) may provide the relative context needed for well-informed prioritization decisions. This highlights a very important difference between probability and percentile for EPSS scores.

Also, to indicate that these values come from epss and to differentiate them from cvss scores, I would suggest we prefix them with epss_*. The probability_of_exploit is a value that changes daily and reflects the probability of exploit in the next 30 days. You could calculate a value based on 365 days if you wanted to. It may be good to include the number of days in the field name so that it is clear what the time range is.

{
  ...
  "software": [
    {
      ...
      "vulnerabilities": [
        {
          "cve": "CVE-2021-3572",
          "details_link": "https://nvd.nist.gov/vuln/detail/CVE-2021-3572",
          "epss_30_day": 0.3, // or epss_score
          "epss_percentile": 0.24
        }
      ],
      "hosts_count": 8
    }
  ]
}
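
To make the probability versus percentile distinction concrete, here is a small sketch in Python. It assumes that a percentile is the fraction of all scored CVEs whose probability is at or below the given one; the function name `percentile_rank` and the sample probabilities are made up for illustration and are not real EPSS data.

```python
from bisect import bisect_right


def percentile_rank(probability: float, all_probabilities: list) -> float:
    """Fraction of scores that are less than or equal to `probability`.

    Assumption: this mirrors how an EPSS-style percentile relates to the
    raw probability. It is an illustration, not the official computation.
    """
    if not all_probabilities:
        raise ValueError("need at least one probability to rank against")
    ordered = sorted(all_probabilities)
    # bisect_right counts how many values are <= probability.
    return bisect_right(ordered, probability) / len(ordered)


# Made-up probabilities: most CVEs score very low, a few score high.
sample = [0.001, 0.003, 0.01, 0.02, 0.02, 0.04, 0.05, 0.10, 0.30, 0.90]
rank = percentile_rank(0.10, sample)
```

In this made-up sample, a probability of 0.10 lands at the 80th percentile, which shows how a low absolute probability can still rank high relative to other CVEs.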

lukeheath commented 2 years ago

@michalnicp Those changes make sense to me. In addition, let's call it details_url instead of details_link.

michalnicp commented 2 years ago

details_link is already returned as part of the API. Changing it would be breaking.

noahtalerman commented 2 years ago

And so while a probability alone (10%) may be difficult to interpret for some, adding in percentiles (88th) may provide the relative context needed for well-informed prioritization decisions.

Thanks for pointing us to the excerpt Michal. While this makes sense, this doesn't align with what we've heard from users/customers.

We'd like to help this customer and others achieve the use case linked above by making the minimal amount of change to the Fleet product (iterate).

I think this means that we can start with the probability of exploit and later maybe come back to adding the percentile.

to indicate that these values come from epss and to differentiate them from cvss scores, I would suggest we prefix them with epss_*

By calling these scores probability_of_exploit Fleet can assist the user by helping them understand what these scores mean.

We'd like both the UI and API to prioritize the meaning of these scores. This is because most users and customers are still gaining familiarity with what EPSS scores even mean.

cc @lukeheath @michalnicp

noahtalerman commented 2 years ago

Luke: Be as explicit as possible when showing a score:

5458: epss_probability
5522: cvss_score
4351: cisa_known_exploit

Noah: For now we're just going to include the epss_probability and punt on the epss_percentile.

lukeheath commented 2 years ago

@michalnicp I have updated the issue description with the latest API spec. Please let me know if you have any questions or concerns.

lukeheath commented 2 years ago

@noahtalerman This ticket appears ready for estimation, so I've removed your assignment and moved it to the "Specified" column.

noahtalerman commented 2 years ago

I've removed your assignment and moved it to the "Specified" column.

Makes sense! Thanks :)

noahtalerman commented 2 years ago

EDIT: I'm moving this issue back to the "Designed" column so that Luke is able to review the change before estimation.

@lukeheath heads up, we'd like to make EPSS probability only available to paid users. More context on this decision is here: https://fleetdm.slack.com/archives/C02A8BRABB5/p1651681804596209

I think this means that the new epss_probability property will be set to null for Fleet Free users. I updated the issue to reflect this.

Note that this is the behavior we use for the team_id property in the GET /hosts response.
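
A rough sketch of the "null for Fleet Free" behavior, written in Python for illustration only. The function name `mask_epss_for_free` and the `is_premium` flag are hypothetical and do not reflect how Fleet checks licenses.

```python
import copy
import json


def mask_epss_for_free(response: dict, is_premium: bool) -> dict:
    """Return a copy of a software response, nulling epss_probability
    on every vulnerability when the license is not premium."""
    result = copy.deepcopy(response)
    if is_premium:
        return result
    for item in result.get("software", []):
        for vuln in item.get("vulnerabilities", []):
            # The key stays in the payload; only the value becomes null.
            vuln["epss_probability"] = None
    return result


response = {
    "software": [
        {
            "vulnerabilities": [
                {"cve": "CVE-2015-20107", "epss_probability": 0.9},
            ],
            "hosts_count": 4,
        }
    ]
}
free_view = mask_epss_for_free(response, is_premium=False)
free_json = json.dumps(free_view)
```

Keeping the key present with a null value, rather than omitting it, gives API consumers the same response shape on both tiers.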

If you agree with the above API design, I think we can still bring this ticket to today's estimation.

If the API design needs more thought before estimating, please feel free to move this issue back to the "Designed" column.

lukeheath commented 2 years ago

@noahtalerman Looks good to me! Moving back to specified.

noahtalerman commented 2 years ago

@michalnicp do we have a separate issue that specifies the data ingestion and migration task? (adding EPSS scores to Fleet's vulnerability database).

If so, can you please add a link to this issue here in the comment section?

michalnicp commented 2 years ago

We don't have a separate issue at the moment. I was planning on doing it as part of this issue, though I may split it up into multiple PRs.

noahtalerman commented 2 years ago

I was planning on doing it as part of this issue

Ah got it.

@michalnicp when you get the chance can you please file a separate issue? This way the Interface team can track the progress of this issue when working on the API and UI updates.

michalnicp commented 2 years ago

For premium users, update default order_key to epss_probability descending.

@noahtalerman @lukeheath Can this be handled on the frontend? Changing the default order based on the license may result in a confusing API.

michalnicp commented 2 years ago

MySQL treats NULL values as lower than any non-NULL value. If there is no epss data for a cve, we would currently return NULL, which would appear first when sorting by epss_probability ascending, or last when sorting by descending. Is this behaviour ok?

lukeheath commented 2 years ago

@michalnicp

Can this be handled on the frontend?

Good point! I've moved this functionality to the frontend ticket.

If there is no epss data for a cve, we would currently return NULL, which would appear first when sorting by epss_probability ascending, or last when sorting by descending.

Thanks for calling this out. This should be fine because default sort is descending, which means the null values will always be after the rows that contain values.
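
The NULL ordering discussed here can be checked with a quick experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in, since SQLite also sorts NULL below every non-NULL value; it does not run against MySQL or Fleet's schema. The table name `cve_scores` and the placeholder ID `CVE-0000-0000` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cve_scores (cve TEXT PRIMARY KEY, epss_probability REAL)"
)
conn.executemany(
    "INSERT INTO cve_scores VALUES (?, ?)",
    [
        ("CVE-2015-20107", 0.9),
        ("CVE-2021-3572", 0.3),
        ("CVE-0000-0000", None),  # placeholder row with no EPSS data
    ],
)

# Ascending: the NULL row sorts first.
asc = [
    row[0]
    for row in conn.execute(
        "SELECT cve FROM cve_scores ORDER BY epss_probability ASC"
    )
]
# Descending: the NULL row sorts last.
desc = [
    row[0]
    for row in conn.execute(
        "SELECT cve FROM cve_scores ORDER BY epss_probability DESC"
    )
]
conn.close()
```

With the default descending sort, the row without EPSS data ends up after every scored row, which matches the behavior agreed on above.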

michalnicp commented 2 years ago

The GET /api/v1/fleet/hosts/{id} and GET /api/v1/fleet/hosts/identifier/{identifier} endpoints also return software. I assume the same api changes should be made there? @lukeheath

lukeheath commented 2 years ago

Good question. Figma only shows the software page, so let's confirm with @noahtalerman

noahtalerman commented 2 years ago

@michalnicp adding epss probability data to the GET /hosts/{id} makes sense to me.

I think it's likely we prioritize surfacing this data on the Host details page (consumer of GET /hosts/{id}) soon.

I think, as a rule, it makes sense to always make updates to both the GET /hosts/{id} and GET /hosts/identifier/{identifier} API routes. This way, the information returned is consistent. @lukeheath what do you think?

lukeheath commented 2 years ago

I think, as a rule, it makes sense to always make updates to both the GET /hosts/{id} and GET /hosts/identifier/{identifier} API routes.

@noahtalerman I think this will depend on product strategy. Because the GET /hosts/{id} endpoint is used by administrators, while GET /hosts/identifier/{identifier} is used by host users, it may be the case that we want to surface data to the administrator that we don't want to surface to the host user. Because of that, I'd hesitate to say "as a rule", although most of the time it will be the case that we update both.

@michalnicp For this ticket, I've updated the specs to include adding this data to the GET /hosts/{id} and GET /hosts/identifier/{identifier} endpoints. Thanks for pointing this out! This will save us from having to double back later when we reveal this data on the host pages.

noahtalerman commented 2 years ago

GET /hosts/identifier/{identifier} is used by host users

@lukeheath is this API route used by the user of a device?

I could totally be wrong, but I thought /identifier/{identifier} is another endpoint used by the admin. The admin might use this endpoint to retrieve host vitals when they know the hostname or some identifier (other than id).

It looks like the GET /device/{identifier} API route is used to surface information to the device user on the My device page.

Somewhat related: I don't think this^ endpoint is documented today.

lukeheath commented 2 years ago

@noahtalerman Thanks for the clarification. You are correct, the endpoint is GET /device/{identifier}. I will update the specs.

lukeheath commented 2 years ago

@noahtalerman I think I may have misunderstood the conversation here between /identifier/{identifier} and /device/{identifier}. I updated the specs to say we wanted epss_probability on /device/{identifier} but not /identifier/{identifier} (see spec item 2 above). Would you please confirm this is incorrect? If so, I'll follow up with Michal to determine how it was implemented.

noahtalerman commented 2 years ago

I updated the specs to say we wanted epss_probability on /device/{identifier} but not /identifier/{identifier}

@lukeheath I'm confirming that this is incorrect.

We'd like to add epss_probability to GET /hosts/identifier/{identifier}. We don't want to add epss_probability to GET /device/{identifier}.

This is because I'd expect the GET /hosts/identifier/{identifier} API route to always return the same data as the GET /hosts/{id} API route.

lukeheath commented 2 years ago

@noahtalerman Confirmed with Michal that he implemented this the correct way and not the way it was spec'd. We're all good!

noahtalerman commented 2 years ago

Alright!
