The semantics of CPE are insufficiently defined

zmanion commented 5 months ago

The affected array can contain cpes, but these are not associated with a version, version range, or status.

Consider this snippet of https://cveawg.mitre.org/api/cve/CVE-2024-0229:

{
    "vendor": "Red Hat",
    "product": "Red Hat Enterprise Linux 7",
    "collectionURL": "https://access.redhat.com/downloads/content/package-browser/",
    "packageName": "tigervnc",
    "defaultStatus": "affected",
    "versions": [
        {
            "version": "0:1.8.0-31.el7_9",
            "lessThan": "*",
            "versionType": "rpm",
            "status": "unaffected"
        }
    ],
    "cpes": [
        "cpe:/o:redhat:enterprise_linux:7::workstation",
        "cpe:/o:redhat:enterprise_linux:7::client",
        "cpe:/o:redhat:enterprise_linux:7::computenode",
        "cpe:/o:redhat:enterprise_linux:7::server"
    ]
},

If I'm interpreting this correctly, all tigervnc versions are affected except those from 0:1.8.0-31.el7_9 up to infinity ('*'), which are unaffected. IOW, fixed in 0:1.8.0-31.el7_9.

So what do the cpes mean? Are they covered by "defaultStatus": "affected"? The cpes seem to be detached from versions and version ranges.

This example is from Red Hat, who are very deliberate and careful in how they produce CVE JSON. My concern is not necessarily with Red Hat's choices on how to convey vulnerability status, but with the semantics of cpes in the CVE Record format. Associating a CPE identifier with versions seems to make better sense.

zmanion commented 5 months ago

I'm not suggesting that the CVE Record Format change to require or depend on CPE, but here is an NVD JSON example (https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2021-40690) where the CPE is clearly bound to status.

[
  {
    "operator": "OR",
    "negate": false,
    "cpeMatch": [
      {
        "vulnerable": true,
        "criteria": "cpe:2.3:a:apache:santuario_xml_security_for_java:*:*:*:*:*:*:*:*",
        "versionEndExcluding": "2.1.7",
        "matchCriteriaId": "AB706AA4-B7E9-4319-A2C0-65B7186507DB"
      },
      {
        "vulnerable": true,
        "criteria": "cpe:2.3:a:apache:santuario_xml_security_for_java:*:*:*:*:*:*:*:*",
        "versionStartIncluding": "2.2.0",
        "versionEndExcluding": "2.2.3",
        "matchCriteriaId": "20BABD5C-5813-48B8-BAC9-0F36F381F12A"
      }
    ]
  }
]

zmanion commented 5 months ago

A Microsoft CVE JSON example, good affected array element, good list of CPE IDs, just hangin' out in the wind. Is the software identified by the CPE IDs affected, unaffected, or unknown? Part of a range?

{
  "vendor": "Microsoft",
  "product": "Windows 10 Version 1809",
  "cpes": [
    "cpe:2.3:o:microsoft:windows_10_1809:10.0.17763.5820:*:*:*:*:*:x86:*",
    "cpe:2.3:o:microsoft:windows_10_1809:10.0.17763.5820:*:*:*:*:*:x64:*",
    "cpe:2.3:o:microsoft:windows_10_1809:10.0.17763.5820:*:*:*:*:*:arm64:*"
  ],
  "platforms": [
    "32-bit Systems",
    "x64-based Systems",
    "ARM64-based Systems"
  ],
  "versions": [
    {
      "version": "10.0.0",
      "lessThan": "10.0.17763.5820",
      "versionType": "custom",
      "status": "affected"
    }
  ]
}

Edit: From https://cveawg.mitre.org/api/cve/CVE-2024-30040

zmanion commented 5 months ago

https://cveproject.github.io/cve-schema/schema/docs/#oneOf_i0_containers_cna_affected_items_cpes

Affected products defined by CPE. This is an array of CPE values (vulnerable and not), we use an array so that we can make multiple statements about the same version and they are separate (if we used a JSON object we'd essentially be keying on the CPE name and they would have to overlap). Also, this allows things like cveDataVersion or cveDescription to be applied directly to the product entry. This also allows more complex statements such as "Product X between versions 10.2 and 10.8" to be put in a machine-readable format. As well since multiple statements can be used multiple branches of the same product can be defined here.

ccoffin commented 5 months ago

A Microsoft CVE JSON example, good affected array element, good list of CPE IDs, just hangin' out in the wind. Is the software identified by the CPE IDs affected, unaffected, or unknown? Part of a range?

https://github.com/CVEProject/cve-schema/blob/30f59c7de92fbc77bddade302601cb500c66f718/schema/CVE_Record_Format.json#L269-L272

https://github.com/CVEProject/cve-schema/blob/30f59c7de92fbc77bddade302601cb500c66f718/schema/CVE_Record_Format.json#L87-L91

Note that if defaultStatus is not specified, it's unknown. IOW, we don't know one way or the other.

My thinking is that in this example, the versions block is specified and lists a status of affected. It seems appropriate that this status would carry over to the defined CPEs, which also display the same affected versions. I could see how this would get much harder to interpret if multiple version blocks were included. In this case, though, it's likely that they intended defaultStatus: unaffected.
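A minimal Python sketch of the reading discussed here: a version's status comes from the first versions[] entry whose range contains it, falling back to defaultStatus, and to "unknown" when defaultStatus is absent. It assumes dotted-numeric versions (real "rpm" or "custom" schemes need their own comparators), treats "*" as an open upper bound, and ignores the schema's changes lists and any precedence rules for overlapping entries; the helper names are illustrative only.

```python
def version_key(v: str) -> tuple[int, ...]:
    """Assumption: dotted-numeric versions such as '10.0.17763.5820'."""
    return tuple(int(part) for part in v.split("."))


def in_range(version: str, entry: dict) -> bool:
    """'version' is the inclusive start; 'lessThan' is an exclusive end and
    'lessThanOrEqual' an inclusive end. '*' means no upper bound."""
    v = version_key(version)
    start = version_key(entry["version"])
    if v < start:
        return False
    if "lessThan" in entry:
        return entry["lessThan"] == "*" or v < version_key(entry["lessThan"])
    if "lessThanOrEqual" in entry:
        return entry["lessThanOrEqual"] == "*" or v <= version_key(entry["lessThanOrEqual"])
    return v == start  # a single version, not a range


def status_of(version: str, affected: dict) -> str:
    for entry in affected.get("versions", []):
        if in_range(version, entry):
            return entry["status"]
    # Outside every listed range: use defaultStatus, which is 'unknown' if absent.
    return affected.get("defaultStatus", "unknown")


# The Microsoft entry from CVE-2024-30040 (cpes and platforms omitted).
windows_1809 = {
    "vendor": "Microsoft",
    "product": "Windows 10 Version 1809",
    "versions": [
        {"version": "10.0.0", "lessThan": "10.0.17763.5820",
         "versionType": "custom", "status": "affected"}
    ],
}

print(status_of("10.0.17763.5000", windows_1809))  # affected
print(status_of("10.0.17763.5820", windows_1809))  # unknown (no defaultStatus)
```

The second call shows why the fixed build 10.0.17763.5820, which is also the version embedded in the record's CPEs, comes out "unknown" unless defaultStatus is set to "unaffected".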

ccoffin commented 5 months ago

I created a related issue against the cve-schema repo today.

https://github.com/CVEProject/cve-schema/issues/321

Basically, ADPs should be allowed to create a cpes array on its own without having to define potentially conflicting or inaccurate product and version info. I hadn't considered the Red Hat example, where it appears that the CPE info is complementary to the product and version info.

zmanion commented 5 months ago

One option (which could be specific to an ADP) could be to require that any CVE version, lessThan, or lessThanOrEqual map to the version field in a cpes entry.
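One way an ADP (or a validator) might check that rule, sketched in Python under stated assumptions: collect every version, lessThan, and lessThanOrEqual value from versions[], collect the version component of every cpes entry, and report bounds with no matching CPE. CPE 2.3 formatted strings are split on colons not escaped by a backslash; CPE 2.2 URIs are handled naively; "*" bounds are skipped. The function names are hypothetical.

```python
def split_cpe23(cpe: str) -> list[str]:
    """Split a CPE 2.3 formatted string on colons not escaped by a backslash."""
    parts, buf, escaped = [], [], False
    for ch in cpe:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == ":":
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def cpe_version(cpe: str) -> str:
    """Return the version component of a CPE 2.3 string or CPE 2.2 URI."""
    if cpe.startswith("cpe:2.3:"):
        return split_cpe23(cpe)[5]
    if cpe.startswith("cpe:/"):
        # CPE 2.2 URI: part:vendor:product:version:update:edition:language
        fields = cpe[len("cpe:/"):].split(":")
        return fields[3] if len(fields) > 3 else ""
    raise ValueError(f"not a CPE: {cpe!r}")


def unmapped_bounds(affected: dict) -> set[str]:
    """Version bounds in versions[] that no entry in cpes[] carries."""
    cpe_versions = {cpe_version(c) for c in affected.get("cpes", [])}
    bounds = set()
    for entry in affected.get("versions", []):
        for key in ("version", "lessThan", "lessThanOrEqual"):
            value = entry.get(key)
            if value is not None and value != "*":
                bounds.add(value)
    return bounds - cpe_versions


windows_1809 = {
    "cpes": ["cpe:2.3:o:microsoft:windows_10_1809:10.0.17763.5820:*:*:*:*:*:x64:*"],
    "versions": [{"version": "10.0.0", "lessThan": "10.0.17763.5820",
                  "versionType": "custom", "status": "affected"}],
}
print(unmapped_bounds(windows_1809))  # {'10.0.0'}
```

Applied to the examples earlier in the thread, the Microsoft entry fails only on its start bound 10.0.0, while the Red Hat tigervnc entry fails on 0:1.8.0-31.el7_9, since its cpes carry the product version (7) rather than the package version.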

zmanion commented 5 months ago

Another option: CVE JSON could implement the CPE Applicability Language, or a subset of it, in order to use CPE to logically define status, versions, and version ranges.

andrewpollock commented 5 months ago

My knowledge about CPEs is largely informed by https://en.wikipedia.org/wiki/Common_Platform_Enumeration because I find it easier to consume than anything else...

Separating vulnerable version identification from vulnerable product identification for a moment, in the light of CPEs (I believe) not yet being widely present on CVE List CVE records, I'd been wondering if the vendor and product components of a CPE string could be derived from the vendor and product fields in the affected object?

That would require tightening up the definition of what is expected to appear in vendor and product. My thought here was to say that it SHOULD correspond with what is defined in the NVD's CPE Dictionary when it exists, and where possible, vendor CNAs SHOULD correspond with the NVD to obtain a CPE Dictionary entry as part of their product's lifecycle (i.e. before a product's first CVE is required).
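A rough Python sketch of the derivation idea. The normalization (lowercase, whitespace to underscores, backslash-escaping punctuation other than '_', '-' and '.') is an assumption for illustration, not the CPE Dictionary's naming rules; the function names are hypothetical. The second example shows why correspondence with the Dictionary matters: a naive "Red Hat" becomes red_hat, whereas Red Hat's own CPEs use redhat.

```python
import re


def fs_escape(value: str) -> str:
    """Backslash-escape characters assumed not allowed bare in a CPE 2.3
    formatted string: anything other than alphanumerics, '_', '-' and '.'."""
    return "".join(ch if ch.isalnum() or ch in "_-." else "\\" + ch for ch in value)


def naive_component(name: str) -> str:
    # Assumption: lowercase and turn runs of whitespace into underscores.
    return fs_escape(re.sub(r"\s+", "_", name.strip().lower()))


def derive_cpe23(part: str, vendor: str, product: str, version: str = "*") -> str:
    """Build an 11-component CPE 2.3 string; version is passed through as-is
    so that '*' (ANY) stays unescaped."""
    fields = [part, naive_component(vendor), naive_component(product), version]
    fields += ["*"] * 7  # update, edition, language, sw_edition, target_sw, target_hw, other
    return "cpe:2.3:" + ":".join(fields)


print(derive_cpe23("a", "AdminerEvo", "AdminerEvo"))
print(derive_cpe23("a", "Red Hat", "Service Mesh", "2.3"))
```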

It is my understanding that while there is a perception of a bootstrapping/chicken-and-egg problem with CPEs (until a product has its first CVE, it doesn't have a defined CPE), this is something of a myth, and it's possible to proactively request CPEs (in bulk even) from the NVD. This would mean that, in line with the Secretariat's request for (vendor) CNAs to start adding CPEs to their CVEs, they should also be asked to obtain CPE Dictionary entries for products they own for which no CVEs have yet been reported.

/cc @mrmegazone

andrewpollock commented 5 months ago

One option (which could be specific to an ADP) could be to require that any CVE version, lessThan, or lessThanOrEqual map to the version field in a cpes entry.

Another option: CVE JSON could implement the CPE Applicability Language, or a subset of it, in order to use CPE to logically define status, versions, and version ranges.

@zmanion I'd discourage premature solutioning without having the actual requirements nailed down first.

The affected object of a CVE 5.x record already permits a veritable cornucopia of methods to programmatically express vulnerability information, and it's being poorly used today to that end. I'm not convinced that adding even more options to that menu of choices is going to achieve anything useful for downstream consumers of the data.

zmanion commented 5 months ago

Adding another example from https://cveawg.mitre.org/api/cve/CVE-2024-0229:

{
  "vendor": "Red Hat",
  "product": "OpenShift Service Mesh 2.3.x",
  "collectionURL": "https://access.redhat.com/downloads/content/package-browser/",
  "packageName": "(as-yet-unknown)",
  "defaultStatus": "affected",
  "cpes": [
    "cpe:/a:redhat:service_mesh:2.3"
  ]
}

mprpic commented 5 months ago

@zmanion Red Hat's use of CPEs is kind of unique (as you've discovered here). We still use CPE 2.2, which is at this point a very old standard. We've always used CPEs as a way to identify products (think RHEL 9.2, Windows 10.xyz). A product is by nature a loose concept that gives a grouping of components an arbitrary version number just so it's easier to refer to one thing rather than saying "I'm running kernel x.y.z, gnome a.b.c, emacs m.n.o, etc." We still need to be able to attach components to products somehow, so our affected entries always contain both the identifier of the product (one or more CPEs) and the individual components themselves, identified using the packageName and versions fields.

CPEs also then map to individual repositories from which (mostly RPM) content can be fetched, using a mapping file available here: https://access.redhat.com/security/data/meta/v1/repository-to-cpe.json. This helps scanning vendors identify where a certain package on a scanned system came from, and link it to a specific product using all this data.
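A minimal Python sketch of that scanner-side lookup. It assumes the mapping file has a top-level data object keyed by repository ID, with a cpes list per repository, and uses a tiny inline excerpt (the repository IDs are illustrative) instead of fetching the real file. The helper names are hypothetical, and the installed package's version would still need to be checked against versions[] afterwards.

```python
import json

# Hypothetical excerpt of repository-to-cpe.json. The shape assumed here
# (a top-level "data" object keyed by repository ID, each holding a "cpes"
# list) should be checked against the real file before relying on it.
SAMPLE_MAPPING = json.loads("""
{
  "data": {
    "rhel-7-server-rpms": {"cpes": ["cpe:/o:redhat:enterprise_linux:7::server"]},
    "rhel-7-workstation-rpms": {"cpes": ["cpe:/o:redhat:enterprise_linux:7::workstation"]}
  }
}
""")


def cpes_for_repo(mapping: dict, repo_id: str) -> set[str]:
    return set(mapping.get("data", {}).get(repo_id, {}).get("cpes", []))


def matching_entries(affected: list[dict], package: str, repo_id: str, mapping: dict) -> list[dict]:
    """affected[] entries for this package whose product CPEs overlap the
    CPEs of the repository the installed package came from."""
    repo_cpes = cpes_for_repo(mapping, repo_id)
    return [
        entry for entry in affected
        if entry.get("packageName") == package and repo_cpes & set(entry.get("cpes", []))
    ]


# The RHEL 7 tigervnc entry from CVE-2024-0229 (versions omitted for brevity).
affected = [{
    "vendor": "Red Hat",
    "product": "Red Hat Enterprise Linux 7",
    "packageName": "tigervnc",
    "defaultStatus": "affected",
    "cpes": [
        "cpe:/o:redhat:enterprise_linux:7::workstation",
        "cpe:/o:redhat:enterprise_linux:7::client",
        "cpe:/o:redhat:enterprise_linux:7::computenode",
        "cpe:/o:redhat:enterprise_linux:7::server",
    ],
}]

hits = matching_entries(affected, "tigervnc", "rhel-7-server-rpms", SAMPLE_MAPPING)
print([h["product"] for h in hits])
```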

Since we use CPEs as unique identifiers of products, they could just as well be UUIDs; that's the same reason why we haven't really pushed hard to update to CPE 2.3. CPEs are just much nicer to read than serial / SKU numbers or UUIDs. The textual descriptions in the product field are also not a good candidate for product identifiers since they can change over time across different versions while their CPEs should remain the same (apart from the changed version number obviously).

The schema isn't very prescriptive about the use of CPEs so our interpretation is essentially:

product -> cpes
packageName -> versions.*

But I understand that for other CNAs, cpes is just an alias for the combination of packageName+version.

I'll also note that our CSAF VEX files have this defined in a way that is a bit more intuitive. The product tree contains nodes for the products themselves, e.g.:

{
  "category": "product_name",
  "name": "Red Hat Enterprise Linux AppStream EUS (v.9.2)",
  "product": {
    "name": "Red Hat Enterprise Linux AppStream EUS (v.9.2)",
    "product_id": "AppStream-9.2.0.Z.EUS",
    "product_identification_helper": {
      "cpe": "cpe:/a:redhat:rhel_eus:9.2::appstream"
    }
  }
}

and then nodes that identify an individual component, e.g.:

{
  "category": "product_version",
  "name": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64",
  "product": {
    "name": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64",
    "product_id": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64",
    "product_identification_helper": {
      "purl": "pkg:rpm/redhat/tigervnc-server@1.12.0-14.el9_2.5?arch=aarch64"
    }
  }
}

These two are then linked with a relationship node:

{
  "category": "default_component_of",
  "full_product_name": {
    "name": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64 as a component of Red Hat Enterprise Linux AppStream EUS (v.9.2)",
    "product_id": "AppStream-9.2.0.Z.EUS:tigervnc-server-0:1.12.0-14.el9_2.5.aarch64"
  },
  "product_reference": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64",
  "relates_to_product_reference": "AppStream-9.2.0.Z.EUS"
}

Here too we use CPEs as just an identifier of a particular product, and purls to identify a given component (we're still exploring what the best way to represent component version ranges is in this format, but we want to add it at some point).
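To make the two-level identification concrete, here is a Python sketch that follows the relationship node back to its two leaves and yields a (product CPE, component purl) pair. It assumes a flat list of leaf nodes rather than CSAF's nested product_tree.branches, and the helper names are illustrative.

```python
# The two leaf nodes and the relationship node shown above, trimmed to the
# fields this sketch uses. Real CSAF documents nest leaves inside
# product_tree.branches; a flat list is assumed here for brevity.
leaves = [
    {"category": "product_name",
     "product": {"product_id": "AppStream-9.2.0.Z.EUS",
                 "product_identification_helper": {
                     "cpe": "cpe:/a:redhat:rhel_eus:9.2::appstream"}}},
    {"category": "product_version",
     "product": {"product_id": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64",
                 "product_identification_helper": {
                     "purl": "pkg:rpm/redhat/tigervnc-server@1.12.0-14.el9_2.5?arch=aarch64"}}},
]
relationship = {
    "category": "default_component_of",
    "product_reference": "tigervnc-server-0:1.12.0-14.el9_2.5.aarch64",
    "relates_to_product_reference": "AppStream-9.2.0.Z.EUS",
}


def helpers_by_id(nodes: list[dict]) -> dict[str, dict]:
    """Map each product_id to its product_identification_helper."""
    return {n["product"]["product_id"]: n["product"].get("product_identification_helper", {})
            for n in nodes}


def resolve(rel: dict, helpers: dict[str, dict]) -> tuple[str, str]:
    """Return (product CPE, component purl) for one relationship node."""
    product = helpers[rel["relates_to_product_reference"]]
    component = helpers[rel["product_reference"]]
    return product["cpe"], component["purl"]


print(resolve(relationship, helpers_by_id(leaves)))
```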

The CVE JSON schema could potentially borrow from this approach and move the packageName field into versions so that the top-level fields (product, cpes) are reserved for defining products, but also allow cpes to exist in the version objects when identifying components themselves.

We're open to feedback on any of the above of course, and happy to discuss the usage of CPEs across the entire data set.

zmanion commented 5 months ago

@mprpic thanks for the explanation. To be clear, and after some discussion in a couple of WG meetings, my concern is with the CVE schema and not with how Red Hat implements cpes and affected elements. I think Red Hat and other CNAs should have some flexibility, but the CVE format itself also needs tighter, more explicit relationships between CPEs and status (and probably versions and ranges).

Today, without any schema changes, I think there may be a way (or ways) to connect cpes to status unambiguously, or at least less ambiguously. I'm going to look into that and try to have some options in time for the June 20 QWG meeting.

andrewpollock commented 5 months ago

CPEs lack the ability to express a version range.

This is why CVE 5.x has the richer syntax in the affected array, and why the NVD's schema has the morally equivalent configurations array.

The SBOM Forum's paper touches on this, in part.

I don't think doubling down on CPEs is a good strategy to pursue.

If we focus on requirements, and drill down into vulnerability scanning in particular: a scanner needs a way to take the thing to be scanned (let's say it's a "product", made by a "vendor", at a particular "version") and determine how many CVEs "match" this.

I think historically, scanners have turned that "product/vendor/version" information into a CPE string and then tried to match it exactly, which creates the need for affected version enumeration to be performed somewhere.
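The difference can be illustrated with a small Python sketch using the Apache Santuario ranges from the NVD example earlier in the thread: exact string matching only finds versions someone has enumerated, while a range check does not need the enumeration. The enumerated set is hypothetical and deliberately incomplete, vendor/product are assumed to already be in Dictionary form, and versions are assumed to be dotted-numeric.

```python
def version_key(v: str) -> tuple[int, ...]:
    """Assumption: dotted-numeric versions only."""
    return tuple(int(p) for p in v.split("."))


def scanner_cpe(vendor: str, product: str, version: str) -> str:
    # Assumes vendor/product are already in CPE Dictionary form.
    return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


# Hypothetical, deliberately incomplete enumeration of affected CPE Names.
ENUMERATED = {
    scanner_cpe("apache", "santuario_xml_security_for_java", v)
    for v in ("2.2.0", "2.2.1", "2.2.2")
}

# The two ranges from the CVE-2021-40690 configuration:
# (versionStartIncluding or None, versionEndExcluding).
RANGES = [(None, "2.1.7"), ("2.2.0", "2.2.3")]


def exact_match(cpe: str) -> bool:
    return cpe in ENUMERATED


def range_match(version: str) -> bool:
    v = version_key(version)
    return any((start is None or v >= version_key(start)) and v < version_key(end)
               for start, end in RANGES)


installed = "2.1.6"
print(exact_match(scanner_cpe("apache", "santuario_xml_security_for_java", installed)))  # False
print(range_match(installed))  # True
```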

zmanion commented 5 months ago

CPE can express ranges, although it seems to have been added after the original specification(s) were published. Maybe it's more correct to say that NVD JSON can express ranges using CPE.

https://nvd.nist.gov/General/News/CPE-Range-Notification
https://nvd.nist.gov/general/News/CPE-Match-Feed-1-0-Release
https://csrc.nist.gov/schema/cpematch/feed/1.0/nvd_cpematch_feed_json_1.0.schema

I'd prefer a model that is agnostic to the software identifier, so that CPE could be a choice, but so could other identifiers. Not sure if this is even possible given the variety of version/range semantics.

andrewpollock commented 5 months ago

CPE can express ranges, although it seems to have been added after the original specification(s) were published. Maybe it's more correct to say that NVD JSON can express ranges using CPE.

We're talking about subtly different things. I'm referring to the cpe23Uri string in the context of the links you've cited, and more generally, what's described at https://en.wikipedia.org/wiki/Common_Platform_Enumeration as a CPE.

https://nvd.nist.gov/General/News/CPE-Range-Notification

I fail to understand how examples 1 and 2 here are defining ranges. The third example is the cpeMatch syntax I'm referring to in the NVD's configurations array. They seem to call them "applicability statements" in informal conversations I've had with NVD staff.

https://nvd.nist.gov/general/News/CPE-Match-Feed-1-0-Release

I believe you can achieve the moral equivalent of this with CVE 5.x affected today?

https://csrc.nist.gov/schema/cpematch/feed/1.0/nvd_cpematch_feed_json_1.0.schema

I'd prefer a model that is agnostic to the software identifier, so that CPE could be a choice, but so could other identifiers. Not sure if this is even possible given the variety of version/range semantics.

I agree that if we settled on using the CPE Dictionary to arbitrate valid values for vendor and product, that would help to a degree. It doesn't entirely solve the bootstrapping problem of new/existing products being defined before their maiden CVE record, although I feel that could be addressed by strongly encouraging product vendors to pre-seed the CPE Dictionary as part of their product lifecycle. I'm dubious as to how effective this approach would be, though.

ElectricNroff commented 5 months ago

On the https://nvd.nist.gov/General/News/CPE-Range-Notification page, Example 1 and Example 2 are examples of "On the vulnerability detail pages, ranges will be displayed to the user" mentioned in the first paragraph. Specifically, English words and phrases such as "Up to (including)" are displayed as part of the HTML on pages such as https://nvd.nist.gov/vuln/detail/CVE-2017-5528 (this is not identical to Example 1). English words and phrases such as "Up to (including)" are not an example of the change to the JSON content.

Example 3 is an example of the change to the JSON content, in which the property versionEndIncluding is used.

There are (at least) two different NVD JSON schemas, with different purposes, that support ranges: https://csrc.nist.gov/schema/nvd/feed/1.1/nvd_cve_feed_json_1.1.schema and https://csrc.nist.gov/schema/cpematch/feed/1.0/nvd_cpematch_feed_json_1.0.schema - they both have these properties related to ranges: versionStartExcluding, versionStartIncluding, versionEndExcluding, versionEndIncluding.

NVD JSON data can be used in use cases where explicitly mentioning a version in a cpe23Uri field implies higher confidence that that is an affected version (relative to the version number falling inside of a range).

andrewpollock commented 5 months ago

On the https://nvd.nist.gov/General/News/CPE-Range-Notification page, Example 1 and Example 2 are examples of "On the vulnerability detail pages, ranges will be displayed to the user" mentioned in the first paragraph. Specifically, English words and phrases such as "Up to (including)" are displayed as part of the HTML on pages such as https://nvd.nist.gov/vuln/detail/CVE-2017-5528 (this is not identical to Example 1). English words and phrases such as "Up to (including)" are not an example of the change to the JSON content.

Thank you! That little bit of human-targeted prose in the first two examples was completely lost on me 🤦

There are (at least) two different NVD JSON schemas, with different purposes, that support ranges: https://csrc.nist.gov/schema/nvd/feed/1.1/nvd_cve_feed_json_1.1.schema and https://csrc.nist.gov/schema/cpematch/feed/1.0/nvd_cpematch_feed_json_1.0.schema - they both have these properties related to ranges: versionStartExcluding, versionStartIncluding, versionEndExcluding, versionEndIncluding.

Yes, the point I was trying to make earlier was that there's equivalent functionality today with CVE 5.x:

| NVD                   | CVE 5           | Description |
| --------------------- | --------------- | ----------- |
| versionStartExcluding | N/A             | N/A |
| versionStartIncluding | version         | The single version being described, or the version at the start of the range. By convention, typically 0 denotes the earliest possible version. |
| versionEndExcluding   | lessThan        | The non-inclusive upper limit of the range. This is the least version NOT in the range. |
| versionEndIncluding   | lessThanOrEqual | The inclusive upper limit of the range. This is the greatest version contained in the range. Only one of lessThan and lessThanOrEqual should be specified. |

zmanion commented 5 months ago

An option in the current schema is to use CPE inside of affected[].versions[]. See CVE-2023-45197 which we just published today:

{
  "cpes": [
    "cpe:2.3:a:adminerevo:adminerevo:4.8.2:*:*:*:*:*:*:*",
    "cpe:2.3:a:adminerevo:adminerevo:4.8.3:*:*:*:*:*:*:*"
  ],
  "defaultStatus": "unknown",
  "product": "AdminerEvo",
  "programFiles": [
    "plugins/file-upload.php"
  ],
  "repo": "https://github.com/adminerevo/adminerevo",
  "vendor": "AdminerEvo",
  "versions": [
    {
      "lessThan": "4.8.3",
      "status": "affected",
      "version": "4.8.2",
      "versionType": "custom"
    },
    {
      "lessThan": "cpe:2.3:a:adminerevo:adminerevo:4.8.3:*:*:*:*:*:*:*",
      "status": "affected",
      "version": "cpe:2.3:a:adminerevo:adminerevo:0:*:*:*:*:*:*:*",
      "versionType": "cpe"
    }
  ]
}

This still leaves affected[].cpes insufficiently defined, but an option could be to provide guidance (preferably validation and enforcement if possible) to use CPE inside affected[].versions[].

MrMegaZone commented 5 months ago

I'm not in favor of including them in the affected versions like this as sites and tools currently render that content for humans - and CPE is very anti-human, IMHO. A lot of consumers, including the CVE.org site, would have to deal with this if we encourage CNAs to use CPEs this way. It would really make for ugly rendering on sites and tools, or the site/tools would need to change to recognize CPEs and not render them as they do other version types.

As on the QWG call yesterday, if we don't want to spend too many resources on it, I'm strongly in favor of adopting the existing NVD CPE Match syntax into our schema: https://nvd.nist.gov/general/News/CPE-Match-Feed-1-0-Release

CPE Match Range Example: CVE Record

"cpe_match" : [ {
     "vulnerable" : true,
     "cpe23Uri" : "cpe:2.3:a:oracle:mysql:*:*:*:*:*:*:*:*",
     "versionStartIncluding" : "5.5.0",
     "versionEndIncluding" : "5.5.43"
     }
]

With versionStart[Including|Excluding] and versionEnd[Including|Excluding], as shown by @andrewpollock above, this maps to the existing version syntax pretty directly. As offered on the call by Matt, the advantage to this is that basically every tool today that is written to consume CPE uses this syntax, as they're pulling it from NVD now. If we adopt this, it will be simpler for them to use our feed as a source.

This would also make it easier for NVD to be an ADP as they'd be able to provide their data as-is - and if NVD is going to stick around I'd really prefer to see them become an ADP and not a splinter feed as they are now.

I believe this would cover the use cases discussed, including Microsoft and Red Hat as above. You can specify all affected CPEs (vulnerable true and just list CPEs), all unaffected CPEs (vulnerable false and list CPEs), or a range (vulnerable true or false, CPE, and versionStart and/or versionEnd).

I think that covers all of the bases and doesn't impact existing tools which work with or render the versions from the JSON.

zmanion commented 4 months ago

Taking @andrewpollock's point about premature solutioning, IMO this issue is meant to be a partial improvement but will not be able to address fundamental issues with CPE or global-scale software identification or changes to how CVE conveys status and "products."

There's some desire for CPE. CVE supports CPE, but badly. I suggest this issue is limited in scope to fixing CVE support for CPE, and I'm in favor of implementing CPE applicability/match, along the lines of importing existing formats/schemas when possible (similar to CVSS).

CPE applicability/match could be implemented in CVE as one of the ways to populate an affected element, although this might raise the same display issues @MrMegaZone mentions above.

Another choice would be a new optional element within affected, possibly replacing the current cpes element. This would to some extent duplicate information under affected, with associated risks of inconsistency.

MrMegaZone commented 4 months ago

I think we need to deprecate the existing cpes element and leave it in place for compatibility with existing records and tooling.

We would add a new element - could be called 'cpe-match' or whatever we want - which would use the existing NVD format, as discussed above, translated into JSON.

I don't think it would cause any issues with display as existing tools would not render the new element, and any tools that are updated to do something with the element would be able to implement correct support. But I wouldn't expect this field to be exposed to humans as it is really meant for machine readability.

andrewpollock commented 4 months ago

There's some desire for CPE

Citation needed. I posit that desire is based very much off of that being how things have been done to date. We have an opportunity to evolve past this if it's no longer fit for purpose.

Exploring generating a CPE 2.3 string from other CVE affected field data is the sensible approach IMO.

MrMegaZone commented 4 months ago

As a CNA I want to be able to provide CPE in a meaningful way and not rely on others to calculate it for us.

We're being pushed to include CPE to support it for SBOM, VEX, etc

Either we fix the schema or CNAs will start using x_ elements to do it. I've considered it.

andrewpollock commented 4 months ago

Any redundant data is an opportunity to get it incorrect/inconsistent.

CPEs either need to be made fit for purpose or they need to be replaced with something that is.

There are plenty of arguments that they're not fit for purpose.

andrewpollock commented 4 months ago

As a CNA I want to be able to provide CPE in a meaningful way and not rely on others to calculate it for us.

Can you elaborate on what "meaningful" means here, and to whom?

andrewpollock commented 3 months ago

Circling back to this today, with fresh eyes and thinking, and after a brief chat with @oliverchang...

https://cveproject.github.io/cve-schema/schema/docs/#oneOf_i0_containers_cna_affected_items_cpes defines the semantics as (emphasis added):

Affected products defined by CPE. This is an array of CPE values (vulnerable and not), we use an array so that we can make multiple statements about the same version and they are separate (if we used a JSON object we'd essentially be keying on the CPE name and they would have to overlap). Also, this allows things like cveDataVersion or cveDescription to be applied directly to the product entry. This also allows more complex statements such as "Product X between versions 10.2 and 10.8" to be put in a machine-readable format. As well since multiple statements can be used multiple branches of the same product can be defined here.

So, from https://github.com/CVEProject/quality-workgroup/issues/12#issuecomment-2141148430 regarding CVE-2024-30040:

Is the software identified by the CPE IDs affected, unaffected, or unknown? Part of a range?

It's part of a range, the range defined in the versions array as being 10.0.0..10.0.17763.5820

So my interpretation of the intent here is that any number of CPE strings used in the cpes array are used in conjunction with the version ranges specified in the versions array and as an alternative/adjunct to specifying vendor and product.

oliverchang commented 3 months ago

Chiming in here as the OSV schema maintainer.

I also think we should avoid just pulling in the NVD CPE Match syntax into the CVE schema. It would be very confusing given that CVE 5 already enables rich versioning encoding in the existing schema (https://github.com/CVEProject/cve-schema/blob/main/schema/docs/versions.md).

If I'm following/summarising this thread correctly, it seems like there are a few main issues:

1. How `cpes[]` interact with `versions[]`

As pointed out in the issue text, it's unclear how the existing cpes[] array interacts with versions[].

This is complicated further by the fact that a CPE can encode either a product (with a wildcard "*" version) or a product + version, as @mprpic points out.

If cpes[] can contain CPEs with concrete versions, they're clearly separate from versions[]. This also leaves no ability to encode version ranges for a CPE with a wildcard version.

To address this, we could tighten the definition of the existing cpes[] to contain only CPEs with wildcard versions. This way, cpes[] can work together with the existing versions semantics as just an additional way to identify a piece of affected software.

With this, I'm not sure there's still a use case for supporting CPEs with concrete versions, because we already have everything we need with cpes[] + versions[].
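A sketch of how the tightened rule might be validated, in Python: every cpes[] entry must be a CPE 2.3 formatted string whose version component is the "*" wildcard. Restricting the check to CPE 2.3 is an assumption of this sketch (CPE 2.2 URIs, as Red Hat uses, would need their own rule), and the function names are hypothetical.

```python
def split_cpe23(cpe: str) -> list[str]:
    """Split a CPE 2.3 formatted string on colons not escaped by a backslash."""
    parts, buf, escaped = [], [], False
    for ch in cpe:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == ":":
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def non_wildcard_cpes(cpes: list[str]) -> list[str]:
    """Return the entries that are not CPE 2.3 strings with a '*' version."""
    rejected = []
    for cpe in cpes:
        parts = split_cpe23(cpe)
        # 'cpe', '2.3', then 11 components; index 5 is the version component.
        if parts[:2] != ["cpe", "2.3"] or len(parts) != 13 or parts[5] != "*":
            rejected.append(cpe)
    return rejected


print(non_wildcard_cpes([
    "cpe:2.3:a:adminerevo:adminerevo:*:*:*:*:*:*:*:*",
    "cpe:2.3:a:adminerevo:adminerevo:4.8.2:*:*:*:*:*:*:*",
]))
```

Under this rule, the two concrete-version CPEs in the published CVE-2023-45197 record would be rejected, while the wildcard form in the example below passes.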

Putting this together, the CVE-2023-45197 example would contain:

{
  "product": "AdminerEvo",
  "repo": "https://github.com/adminerevo/adminerevo",
  "vendor": "AdminerEvo",
  "cpes": [
    "cpe:2.3:a:adminerevo:adminerevo:*:*:*:*:*:*:*:*"
  ],
  "versions": [
    {
      "lessThan": "4.8.3",
      "status": "affected",
      "version": "4.8.2",
      "versionType": "custom"
    }
  ]
}

Which just says that:

The affected product is "AdminerEvo" from vendor "AdminerEvo", which can be identified by "cpe:2.3:a:adminerevo:adminerevo:*:*:*:*:*:*:*:*" OR by the repo "https://github.com/adminerevo/adminerevo".
Version 4.8.2 of this product is affected.

2. CPEs and representing product / component trees?

The original Red Hat example uses cpes[] in yet another way. It encodes the specific product ("Red Hat Enterprise Linux 7"), while versions[] refers to a specific package/component inside it ("tigervnc").

Again this seems to be because of under-specification. We could tighten a few definitions here, and scope cpes[] to the ones that make sense for the versions[] ranges. In the case of https://cveawg.mitre.org/api/cve/CVE-2024-0229, this means that cpes[] would be for the specific tigervnc component rather than the Red Hat product.

If there is a need to encode product / component relationships in CVE, then that's probably a separate discussion beyond this one about CPEs.

@mprpic WDYT?

3. Support existing tooling that works with the CPE match syntax

The concern here is that a lot of existing tooling relies on the NVD CPE match syntax, which would not work easily with CVE 5 if CVE 5 had its own version range syntax for CPEs.

To address this, I believe we can bridge existing tooling and CVE 5 syntax very easily by providing a tool/script that automatically converts CVE 5 versions to NVD match syntax. This should be easy, given the compatibility that @andrewpollock pointed out in https://github.com/CVEProject/quality-workgroup/issues/12#issuecomment-2159648898. It should be entirely possible to convert every CVE 5 version to an NVD match.

Would this work? All of this creates a bit more work than just pulling in the NVD CPE Match syntax into the CVE schema, but it would be worthwhile imo to improve the status quo and avoid complicating tooling in the long run with multiple ways to encode affected versions.
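A minimal Python sketch of such a converter, following the field mapping in @andrewpollock's table: it pairs each wildcard-version CPE in cpes[] with each versions[] entry, maps affected/unaffected to vulnerable true/false, skips unknown (cpeMatch has no equivalent), treats a version of "0" as "no lower bound" by the convention in that table, and turns a bare version into a concrete CPE. It ignores versionType and changes, omits NVD-assigned fields such as matchCriteriaId, and does not escape version strings; all names are hypothetical.

```python
from typing import Optional


def split_cpe23(cpe: str) -> list[str]:
    """Split a CPE 2.3 formatted string on colons not escaped by a backslash."""
    parts, buf, escaped = [], [], False
    for ch in cpe:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == ":":
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def to_cpe_match(cpe_parts: list[str], entry: dict) -> Optional[dict]:
    status = entry.get("status")
    if status not in ("affected", "unaffected"):
        return None  # no cpeMatch equivalent for 'unknown'
    match = {"vulnerable": status == "affected"}
    upper_excl = entry.get("lessThan")
    upper_incl = entry.get("lessThanOrEqual")
    if upper_excl is None and upper_incl is None:
        # A single version: substitute it into the version component.
        match["criteria"] = ":".join(cpe_parts[:5] + [entry["version"]] + cpe_parts[6:])
        return match
    match["criteria"] = ":".join(cpe_parts)
    if entry["version"] != "0":  # '0' conventionally means "from the earliest version"
        match["versionStartIncluding"] = entry["version"]
    if upper_excl not in (None, "*"):
        match["versionEndExcluding"] = upper_excl
    if upper_incl not in (None, "*"):
        match["versionEndIncluding"] = upper_incl
    return match


def to_cpe_matches(affected: dict) -> list[dict]:
    matches = []
    for cpe in affected.get("cpes", []):
        parts = split_cpe23(cpe)
        if len(parts) != 13 or parts[5] != "*":
            continue  # only wildcard-version CPE 2.3 strings can carry a range
        for entry in affected.get("versions", []):
            match = to_cpe_match(parts, entry)
            if match is not None:
                matches.append(match)
    return matches


adminerevo = {
    "cpes": ["cpe:2.3:a:adminerevo:adminerevo:*:*:*:*:*:*:*:*"],
    "versions": [{"lessThan": "4.8.3", "status": "affected",
                  "version": "4.8.2", "versionType": "custom"}],
}
print(to_cpe_matches(adminerevo))
```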

Chris-Turner-NIST commented 3 months ago

Taking a moment to read through everything...

The first thing I think should be stated here is that there is a difference between CPE Names (typically referred to as CPEs by most) and CPE Match Criteria (the things included in Applicability statements). Applicability Statements are the bridge that connect CPE Names in the CPE Dictionary and CVE Records.

To @oliverchang 's point, the current affected section covers MOST of the bases if the goal is to provide CPE Names. However, if the goal is to provide a more complex representation of applicability using CPE Match Criteria in an Applicability Statement, that can be done by leveraging the existing NVD schema.

Fun Definitions:

CPE Name - The enumerations of known hardware, applications or operating systems that exist within the Official CPE Dictionary for products of interest. CPE Names must have the Part, Vendor, Product and Version components populated at a minimum to be valid.

Within the NVD dataset we provide a configurations section for each CVE record. CVE Record configurations contain CPE Match Criteria.

CPE Match Criteria - These are supersets of CPE Names. Unlike CPE Names, Match Criteria do not require certain components to be populated to be valid, and they come in two representations:

• CPE Match Strings
• CPE Match String Ranges

CPE Match String - These look very similar to a CPE Name, but have fewer restrictions on which components must be populated and are intended to reference one or more CPE Names within the CPE Dictionary.

CPE Match String Range - A quicker and more natural way to express a grouping of CPE Name versions. While similar to the CPE Match String, these contain a base string combined with boundaries for the version component. The base string must not contain any values in the version or update components, but may contain values in any other component.
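To make the Match String Range definition concrete, here is a rough Python sketch of testing whether a CPE Name falls inside a range, then using it to filter a small dictionary subset. It assumes purely dotted-numeric versions (NVD's real version comparison handles many more formats) and CPE strings without escaped colons; `matches_range`, `parse_version`, and the `dictionary` list are hypothetical illustrations, not NVD code or real dictionary contents.

```python
from typing import Optional


def parse_version(v: str) -> tuple:
    # Assumption: dotted-numeric versions such as "5.2.1" only.
    return tuple(int(x) for x in v.split("."))


def matches_range(cpe_name: str, base: str,
                  start_including: Optional[str] = None,
                  end_excluding: Optional[str] = None) -> bool:
    """Return True if a CPE Name falls within a CPE Match String Range (sketch)."""
    name_parts = cpe_name.split(":")
    base_parts = base.split(":")
    if len(name_parts) != 13 or len(base_parts) != 13:
        return False
    # Every component except the version must match; "*" (ANY) in the base matches anything.
    # Per the definition, the base string's version and update components are "*".
    for i, (n, b) in enumerate(zip(name_parts, base_parts)):
        if i == 5:
            continue
        if b != "*" and b != n:
            return False
    # Only the version component is compared against the range boundaries.
    version = parse_version(name_parts[5])
    if start_including is not None and version < parse_version(start_including):
        return False
    if end_excluding is not None and version >= parse_version(end_excluding):
        return False
    return True


if __name__ == "__main__":
    # Hypothetical dictionary subset, for illustration only.
    dictionary = [
        "cpe:2.3:a:phpmyadmin:phpmyadmin:4.0.0:*:*:*:*:*:*:*",
        "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.0:rc1:*:*:*:*:*:*",
        "cpe:2.3:a:phpmyadmin:phpmyadmin:5.2.0:-:*:*:*:*:*:*",
        "cpe:2.3:a:phpmyadmin:phpmyadmin:5.2.1:*:*:*:*:*:*:*",
    ]
    base = "cpe:2.3:a:phpmyadmin:phpmyadmin:*:*:*:*:*:*:*:*"
    for cpe in dictionary:
        print(cpe, matches_range(cpe, base, "5.0.0", "5.2.1"))
```

Because the update component is a wildcard in the base string, pre-release names such as `5.0.0:rc1` match on their version alone.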

Simple example using parts of CVE-2023-25727:

Applicability Statement containing a CPE Match String and a CPE Match String Range

       "configurations": [
          {
            "nodes": [
              {
                "operator": "OR",
                "negate": false,
                "cpeMatch": [
                  {
                    "vulnerable": true,
                    "criteria": "cpe:2.3:a:phpmyadmin:phpmyadmin:4.0.0:*:*:*:*:*:*:*",
                    "matchCriteriaId": "BCD4C26A-0823-4EAA-8052-6D6A269308E5"
                  },
                  {
                    "vulnerable": true,
                    "criteria": "cpe:2.3:a:phpmyadmin:phpmyadmin:*:*:*:*:*:*:*:*",
                    "versionStartIncluding": "5.0.0",
                    "versionEndExcluding": "5.2.1",
                    "matchCriteriaId": "BCD4C26A-0823-4EAA-8052-6D6A269308E4"
                  }
                ]
              }
            ]
          }
        ]

The same, represented in the CVE 5.X affected section using the cpes array:

{
  "product": "phpmyadmin", 
  "vendor": "phpmyadmin",
  "cpes": [
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.0:-:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.0:alpha1:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.0:rc1:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.1:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.2:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.3:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.0.4:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.0:-:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.0:rc1:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.0:rc2:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.1:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.2:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.3:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.1.4:*:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.2.0:-:*:*:*:*:*:*",
    "cpe:2.3:a:phpmyadmin:phpmyadmin:5.2.0:rc1:*:*:*:*:*:*"
  ],
  "versions": [
    {
      "version": "5.0.0",
      "lessThan": "5.2.1",
      "status": "affected",
      "versionType": "semver"
    }
  ]
},
{
  "product": "phpmyadmin", 
  "vendor": "phpmyadmin",
  "cpes": [
    "cpe:2.3:a:phpmyadmin:phpmyadmin:4.0.0:*:*:*:*:*:*:*"
  ],
  "versions": [
    {
      "version": "4.0.0",
      "status": "affected",
      "versionType": "semver"
    }
  ]
},

zmanion commented 2 months ago

In reference to https://github.com/CVEProject/quality-workgroup/issues/12#issuecomment-2257565296, discussed again during today's QWG meeting:

An individual opinion: I like (even prefer) the idea of not bolting on/duplicating data (and schema), so I'm on board with modifications that make it possible to derive CPE from CVE data.

Fact (I think): I don't see how this can work (at least as proposed in https://github.com/CVEProject/quality-workgroup/issues/12#issuecomment-2257565296). I believe the breaking point is that CPE vendor:product would need to reliably map to whatever is in affected[].vendor and .product. Often for multiple items. How does a CNA do this?

Only allow CPE strings with wildcards in cpes ✅
CVE and NVD have equivalent range expression capabilities/operators (https://github.com/CVEProject/quality-workgroup/issues/12#issuecomment-2159648898) ✅
Version details go in affected[].versions ✅
What I think is still missing is the ability to reliably map items in cpes to elements in affected[]. Sure, tolower(AdminerEvo)==adminerevo, but for even slightly more complicated data, I don't think the mapping will work. ❎

oliverchang commented 2 months ago

Thanks for the feedback @zmanion!

What I think is still missing is the ability to reliably map items in cpes to elements in affected[]. Sure, tolower(AdminerEvo)==adminerevo, but for even slightly more complicated data, I don't think the mapping will work.

How are people using affected[].vendor and affected[].product today? They seem to be essentially freeform text for human consumption (which in itself seems fine to me).

Is there a specific need to map affected[].vendor and affected[].product to the relevant CPEs in terms of their value? affected[].cpes[] already live on the same level as affected[].vendor and affected[].product in the same affected[] entry, which already implies a mapping that's established by the CNA/ADP.

zmanion commented 2 months ago

How are people using affected[].vendor and affected[].product today? They seem to be essentially freeform text for human consumption (which in itself seems fine to me).

Yes, today vendor, product, and versions are arbitrary strings, and yes, it more or less works.

I haven't tried a more complicated Red Hat or Cisco or Microsoft CVE Record yet, but maybe this works if there are some additional constraints or guarantees on the data (to enforce what I called "mapping" above), for instance, within an affected[] entry:

the vendor field (field 4) of all cpes is equivalent to affected[].vendor
the product field (field 5) of all cpes is equivalent to affected[].product
cpes must not have any data past field 5
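As a rough illustration, those three constraints could be checked mechanically per affected[] entry. This Python sketch is hypothetical (`split_cpe23`, `normalize`, and `check_affected_entry` are not part of any CVE tooling); it assumes CPE 2.3 formatted strings, counts fields from 1 starting at "cpe", reads "no data past field 5" as "fields 6 to 13 are all `*`", and uses an assumed normalization (lowercase, spaces to underscores) in the spirit of the tolower() comparison above.

```python
def split_cpe23(cpe: str) -> list:
    """Split a CPE 2.3 formatted string on unescaped colons (backslash escapes kept as-is)."""
    parts, buf, escaped = [], [], False
    for ch in cpe:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == ":":
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def normalize(name: str) -> str:
    # Assumed normalization: lowercase, spaces replaced by underscores.
    return name.strip().lower().replace(" ", "_")


def check_affected_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means all cpes satisfy the three constraints."""
    problems = []
    vendor = normalize(entry.get("vendor", ""))
    product = normalize(entry.get("product", ""))
    for cpe in entry.get("cpes", []):
        fields = split_cpe23(cpe)  # field N (1-indexed) is fields[N - 1]
        if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
            problems.append(f"{cpe}: not a CPE 2.3 formatted string")
            continue
        if fields[3] != vendor:  # field 4
            problems.append(f"{cpe}: vendor {fields[3]!r} != affected[].vendor {vendor!r}")
        if fields[4] != product:  # field 5
            problems.append(f"{cpe}: product {fields[4]!r} != affected[].product {product!r}")
        if any(f != "*" for f in fields[5:]):  # fields 6..13
            problems.append(f"{cpe}: has data past field 5")
    return problems


if __name__ == "__main__":
    entry = {"vendor": "AdminerEvo", "product": "AdminerEvo",
             "cpes": ["cpe:2.3:a:adminerevo:adminerevo:*:*:*:*:*:*:*:*"]}
    print(check_affected_entry(entry))
```

Under this naive normalization, the Red Hat entry from the top of this issue fails twice over: its cpes are CPE 2.2 URIs, and "Red Hat" normalizes to "red_hat" rather than "redhat", which is exactly the mapping problem raised above.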

From one of @andrewpollock's slides:

Don’t mix and match disparate products in the .affected.cpes[] array of a particular .affected[] entry

zmanion commented 2 months ago

affected[].cpes[] already live on the same level as affected[].vendor and affected[].product in the same affected[] entry, which already implies a mapping that's established by the CNA/ADP.

The word "implies" is probably a dangerous assumption. Under the current schema, rules, guidance, and history, a CNA can put anything in vendor, product, and (possibly multiple) cpes.

So I think I've just restated proposed solution 1.3; what I was missing is the explicit constrained mapping of CPE vendor and product fields to affected[].vendor and affected[].product.

I still may personally favor importing the NVD CPE match schema, but am going to look at converting CVE affected to CPE match as previously suggested.

andrewpollock commented 2 months ago

The word "implies" is probably a dangerous assumption. Under the current schema, rules, guidance, and history, a CNA can put anything in vendor, product, and (possibly multiple) cpes.

And just because they can, doesn't mean they should, and certainly doesn't mean a coherent and usable machine-readable (or human-interpretable) CVE record results. The CVE schema should continue to evolve to make it hard for CNAs to do the wrong thing and easy for them to do the right thing (see also slide 6)

zmanion commented 3 weeks ago

Closing this. I appreciate all of the discussion. I don't think we have yet solved the bigger issues around better status and global software identification, but the CVE Board voted to implement CPE Match Criteria and Applicability Statements (I believe it is both terms) in the CVE Record Format, which addresses this specific issue.
