cisagov / vulnrichment

A repo to conduct vulnerability enrichment.
Creative Commons Zero v1.0 Universal

Spaces in cpes #8

Status: Closed (joshbressers closed 2 months ago)

joshbressers commented 2 months ago

While inspecting the data, I ran across some CPE names with spaces in them, which the spec says is not allowed. Section 5.2.3 says:

The underscore (x5f) MAY be used, and it SHOULD be used in place of whitespace characters (which SHALL NOT be used).

The first example in my list also contains an & which should be escaped according to the spec.

I'm pasting a short sample of what I see:

CVE-2024-2118 cpe2.3:a:wordpress:Social Media Share Buttons & Social Sharing Icons:-:*:*:*:*:*:*:*
CVE-2024-3955 cpe2.3:a:PiBrewing:CraftBeerPi 4:4.0.0.58 (commit 563fae9):*:*:*:*:*:*:*
CVE-2024-3742 cpe2.3:a:electrolink:high power dab transmitter:-:*:*:*:*:*:*:*
CVE-2024-3742 cpe2.3:a:electrolink:compact dab transmitter:-:*:*:*:*:*:*:*
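As a rough illustration of the rule quoted from Section 5.2.3, the sketch below rewrites a single free-text component so that whitespace becomes an underscore and punctuation is backslash-escaped. The function name `quote_component` and the `SPECIAL` set are hypothetical, not part of any CISA or NIST tooling. It assumes the input is literal ASCII text (so `*` and `?` are escaped rather than treated as wildcards) and it does not change letter case.

```python
import re

# Punctuation that is escaped with a backslash in this sketch.
# Assumption: input is literal text, so "*" and "?" are escaped too.
SPECIAL = set("\\*?!\"#$%&'()+,/:;<=>@[]^`{|}~")


def quote_component(value: str) -> str:
    """Return `value` with whitespace replaced by "_" and punctuation escaped."""
    # Collapse each run of whitespace into a single underscore.
    collapsed = re.sub(r"\s+", "_", value.strip())
    # Prefix every special character with a backslash; leave the rest alone.
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in collapsed)


if __name__ == "__main__":
    print(quote_component("Social Media Share Buttons & Social Sharing Icons"))
    print(quote_component("4.0.0.58 (commit 563fae9)"))
```

This only handles one component at a time; a full name would still need the `cpe:2.3:` prefix and all remaining fields.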

I'm happy to suggest corrections in whatever way you prefer: PRs are fine, and filing issues is also OK.

joshbressers commented 2 months ago

A follow-up to this: I found this regular expression

cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}

in this document https://csrc.nist.gov/schema/cpe/2.3/cpe-naming_2.3.xsd

If I run all the CPE names currently in the ADP section against it, I get 1327 valid CPEs and 1188 invalid.
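For anyone who wants to reproduce this kind of count, here is a minimal sketch that applies the pattern quoted above with Python's `re` module. The helper names `is_valid_cpe23` and `tally` and the sample list are made up for illustration; extracting the names from the ADP containers is not shown. `fullmatch` is used because XSD patterns match the whole value.

```python
import re

# Pattern copied verbatim from the regular expression quoted in this thread.
CPE23_PATTERN = re.compile(
    r"""cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}"""
)


def is_valid_cpe23(name: str) -> bool:
    """True if the whole string matches the CPE 2.3 formatted-string pattern."""
    return CPE23_PATTERN.fullmatch(name) is not None


def tally(names):
    """Return (valid_count, invalid_count) for an iterable of CPE names."""
    names = list(names)
    valid = sum(1 for n in names if is_valid_cpe23(n))
    return valid, len(names) - valid


if __name__ == "__main__":
    # Illustrative inputs only: one well-formed name and two malformed ones.
    samples = [
        "cpe:2.3:a:electrolink:compact_dab_transmitter:-:*:*:*:*:*:*:*",
        "cpe:2.3:a:electrolink:compact dab transmitter:-:*:*:*:*:*:*:*",
        "cpe2.3:a:electrolink:compact_dab_transmitter:-:*:*:*:*:*:*:*",
    ]
    print(tally(samples))
```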

jwoytek-cisa commented 2 months ago

@joshbressers thank you for identifying this! We're working on clarifying our enrichment process and also making sure entries are validated prior to publication.

It may be a little bit until the existing data is fixed, but we're discussing how to address that now.

jwoytek-cisa commented 2 months ago

@joshbressers we discovered two issues while investigating this report. First, some CPE entries were being written out missing a colon between cpe and the CPE record version number. That accounted for the majority of items that were failing validation. The second issue was incomplete validation of user-entered data, which included the spaces and unescaped characters that you found. We have implemented additional validation of user-entered data and have passed on the details to the analysts so that they can adjust their process accordingly.
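To make the first issue concrete, the sketch below detects and repairs a name written as `cpe2.3:` instead of `cpe:2.3:`. It is only an illustration under that single assumption, not the fix that was actually applied, and the function name `repair_prefix` is hypothetical. A repaired name can still fail validation for other reasons, such as spaces or unescaped characters.

```python
BAD_PREFIX = "cpe2.3:"   # colon missing between "cpe" and the version
GOOD_PREFIX = "cpe:2.3:"


def repair_prefix(name: str) -> str:
    """Insert the missing colon after "cpe" if the name starts with "cpe2.3:"."""
    if name.startswith(BAD_PREFIX):
        return GOOD_PREFIX + name[len(BAD_PREFIX):]
    # Already well-prefixed, or malformed in some other way: leave unchanged.
    return name


if __name__ == "__main__":
    print(repair_prefix("cpe2.3:a:electrolink:compact dab transmitter:-:*:*:*:*:*:*:*"))
```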

We have fixed the entries that were failing validation in the most recent data update. I am going to mark this report closed, but please feel free to re-open if you notice something that we missed. We appreciate your report and hope to continue improving things as we move forward.