Closed joshbressers closed 2 months ago
A followup to this. I found this regular expression
cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}
in this document https://csrc.nist.gov/schema/cpe/2.3/cpe-naming_2.3.xsd
If I run all the CPE names currently in the ADP section against it, I get 1327 valid CPEs and 1188 invalid.
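For anyone who wants to reproduce this kind of check, here is a minimal Python sketch. It is not the script behind the counts above: the names `CPE23_PATTERN`, `is_valid_cpe`, and `count_valid` are made up for illustration, and the sample strings are not taken from the ADP data. XSD patterns implicitly match the whole value, so the sketch uses `re.fullmatch`.

```python
import re

# Pattern as quoted above from cpe-naming_2.3.xsd, copied verbatim.
CPE23_PATTERN = re.compile(
    r"""cpe:2\.3:[aho\*\-](:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){5}(:(([a-zA-Z]{2,3}(-([a-zA-Z]{2}|[0-9]{3}))?)|[\*\-]))(:(((\?*|\*?)([a-zA-Z0-9\-\._]|(\\[\\\*\?!"#$$%&'\(\)\+,\/:;<=>@\[\]\^`\{\|}~]))+(\?*|\*?))|[\*\-])){4}"""
)


def is_valid_cpe(name: str) -> bool:
    """Return True if the whole string matches the CPE 2.3 pattern."""
    return CPE23_PATTERN.fullmatch(name) is not None


def count_valid(names):
    """Return a (valid, invalid) tuple of counts for an iterable of CPE names."""
    valid = sum(1 for n in names if is_valid_cpe(n))
    return valid, len(names) - valid


# Illustrative samples only (hypothetical, not from the real data set).
samples = [
    "cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*",
    "cpe:2.3:a:some vendor:product:1.0:*:*:*:*:*:*:*",  # contains a space
    "cpe:2.3:a:at&t:product:1.0:*:*:*:*:*:*:*",         # unescaped &
    "cpe:2.3:a:at\\&t:product:1.0:*:*:*:*:*:*:*",       # escaped &
]
print(count_valid(samples))
```

Feeding it the list of names extracted from the records should give a valid/invalid split comparable to the one reported here, assuming the extraction step yields one CPE string per entry.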
@joshbressers thank you for identifying this! We're working on clarifying our enrichment process and also making sure entries are validated prior to publication.
It may be a little while before the existing data is fixed, but we're discussing how to address that now.
@joshbressers we discovered two issues while investigating this report. First, some CPE entries were being written out missing a colon between cpe and the CPE record version number. That accounted for the majority of items that were failing validation. The second issue was incomplete validation of user-entered data, which included the spaces and unescaped characters that you found. We have implemented additional validation of user-entered data and have passed on the details to the analysts so that they can adjust their process accordingly.
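As a rough illustration of the first issue, the sketch below detects and repairs a missing colon after the `cpe` prefix. It assumes the malformed entries looked like `cpe2.3:...`, which is an inference from the description above rather than something confirmed from the data, and `repair_missing_colon` is a hypothetical helper, not the fix that was actually deployed.

```python
import re

# Assumed malformed shape: "cpe" immediately followed by "2.3:" with no colon.
MISSING_COLON_PREFIX = re.compile(r"^cpe(?=2\.3:)")


def repair_missing_colon(name: str) -> str:
    """Insert the colon after 'cpe' if it is missing; otherwise return name unchanged."""
    return MISSING_COLON_PREFIX.sub("cpe:", name, count=1)


print(repair_missing_colon("cpe2.3:a:vendor:product:1.0:*:*:*:*:*:*:*"))
```

A repaired string would still need to be run through full validation, since this only addresses the prefix.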
We have fixed the entries that were failing validation in the most recent data update. I am going to mark this report closed, but please feel free to re-open if you notice something that we missed. We appreciate your report and hope to continue improving things as we move forward.
While inspecting the data, I ran across some CPE names with spaces in them, which the spec says is not allowed. Section 5.2.3 says:
The underscore (x5f) MAY be used, and it SHOULD be used in place of whitespace characters (which SHALL NOT be used).
The first example in my list also contains an &, which should be escaped according to the spec.
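To make the two rules concrete, here is a minimal Python sketch of how a raw vendor or product string could be turned into a component value that avoids whitespace and escapes punctuation. The helper name `escape_component` and the sample input are hypothetical, and the set of escapable characters is taken from the character class in the schema's regular expression. It treats `*` and `?` as literals, so it is not suitable for values that are meant to carry wildcards.

```python
import string

# Characters allowed unescaped in a component value.
UNESCAPED = set(string.ascii_letters + string.digits + "-._")
# Punctuation that must be preceded by a backslash.
ESCAPABLE = set("\\*?!\"#$%&'()+,/:;<=>@[]^`{|}~")


def escape_component(raw: str) -> str:
    """Replace whitespace with underscores and backslash-escape punctuation."""
    out = []
    for ch in raw:
        if ch in UNESCAPED:
            out.append(ch)
        elif ch.isspace():
            out.append("_")
        elif ch in ESCAPABLE:
            out.append("\\" + ch)
        else:
            # Anything else (e.g. non-ASCII) is outside this sketch's scope.
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


print(escape_component("AT&T Corp"))
```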
I'm pasting a short sample of what I see
I'm happy to suggest corrections any way you like. PRs are fine, or filing issues is also OK.