CVEProject / cve-schema

This repository is used for the development of the CVE JSON record format. Releases of the CVE JSON record format will also be published here. This repository is managed by the CVE Quality Working Group.
Creative Commons Zero v1.0 Universal

Prevent descriptions from containing only whitespace #232

Open slubar opened 1 year ago

slubar commented 1 year ago

The description field is required and is currently defined with minLength=1 and no pattern constraint. This allows valid descriptions that consist only of whitespace, as well as descriptions containing just a single character. Whitespace-only descriptions should be made invalid. A larger minimum character length might also be considered.

ElectricNroff commented 1 year ago

The discussion at the QWG meeting today suggested adding a pattern field here to specify at least one non-whitespace character: https://github.com/CVEProject/cve-schema/blob/20a9e977d9020c12d7dce07eb7ef8de30bd61f64/schema/v5.0/CVE_JSON_5.0_schema.json#L669-L678

This was thought to be better than changing https://github.com/CVEProject/cve-schema/blob/20a9e977d9020c12d7dce07eb7ef8de30bd61f64/schema/v5.0/CVE_JSON_5.0_schema.json#L735-L741 to use something like:

"#/definitions/descriptionWithNonWhitespace"

because that alternative would affect only the descriptions part of a CNA or ADP container. In other words, it would be best to change the meaning of #/definitions/description itself, and thereby require a non-whitespace character in every context where #/definitions/description is used (e.g., impacts and rejectedReasons).
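A minimal sketch of the distinction under discussion, written in Python rather than JSON Schema. The current rule is modeled as a plain length check, and the proposed rule adds a search for one non-whitespace character. The function names are hypothetical, and `\S` is only one way to express "at least one non-whitespace character"; the pattern actually added to the schema may differ.

```python
import re

def passes_min_length(value: str) -> bool:
    """Models the current constraint: minLength=1 and no pattern."""
    return len(value) >= 1

def passes_non_whitespace(value: str) -> bool:
    """Models the proposed constraint: minLength=1 plus one non-whitespace character."""
    return len(value) >= 1 and re.search(r"\S", value) is not None

# Print each candidate with the result of the current and the proposed check.
for candidate in ["", " ", "\n\t", "x", " Buffer overflow in foo "]:
    print(repr(candidate), passes_min_length(candidate), passes_non_whitespace(candidate))
```

Under these assumptions, a single space passes the length check but fails the non-whitespace check, while a one-character description such as "x" passes both. That matches the point that this change does not address short, low-utility descriptions.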

Among the few people who attended, there was no support for a more general change to the minimum character length. Prohibiting an all-whitespace value helps with these objectives:

Increasing the minimum character length might block some descriptions that have low utility, but providers could simply compose an equally useless description that has more characters.

ElectricNroff commented 1 year ago

Trailing (or possibly leading) whitespace is common here: https://github.com/CVEProject/cve-schema/blob/20a9e977d9020c12d7dce07eb7ef8de30bd61f64/schema/v5.0/CVE_JSON_5.0_schema.json#L669-L679 with more than 4000 CVE Records affected. The common cases are one trailing space, two trailing spaces, one trailing \n character, or two trailing \n characters.

Also,

"^(?:\\S|\\S.*\\S)$"

was potentially unintended. Because . does not match newline characters, it rejects any string with a newline in the middle (affecting approximately a thousand CVE Records). This can be fixed by:

"^(?:\\S|\\S[\\s\\S]*\\S)$"

Leading/trailing whitespace is also seen in version fields, with more than 400 CVE Records affected. The common cases are one trailing space or one leading space.
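The difference between the two patterns quoted above can be checked with a short sketch. It uses Python's `re` module as a stand-in for whatever regex engine a schema validator uses, which is an assumption. The patterns are written as they read once the JSON string escaping is removed, and the helper name `matches` is hypothetical.

```python
import re

# The two patterns from the thread, with JSON escaping (\\S -> \S) removed.
SINGLE_LINE = r"^(?:\S|\S.*\S)$"
MULTI_LINE = r"^(?:\S|\S[\s\S]*\S)$"

def matches(pattern: str, value: str) -> bool:
    # re.fullmatch is used because Python's "$" can also match just before
    # a trailing newline, which would hide the trailing-"\n" case.
    return re.fullmatch(pattern, value) is not None

samples = [
    "x",
    "two words",
    "line one\nline two",   # newline in the middle
    "trailing space ",
    " leading space",
    "trailing newline\n",
    "   ",
]
for s in samples:
    print(repr(s), matches(SINGLE_LINE, s), matches(MULTI_LINE, s))
```

In this sketch only the string with an interior newline is treated differently: the first pattern rejects it and the second accepts it. Both patterns reject leading or trailing whitespace, so either one would also invalidate the existing records with trailing spaces or trailing \n characters mentioned above.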

ElectricNroff commented 1 year ago

Leading/trailing whitespace is also seen in names of vendors and names of products, with more than 2500 CVE Records affected. The common cases are one trailing space or one leading space.

One CVE Record was found with trailing versionType whitespace:

CVE-2023-1900

"versionType":"1.0.2303.633 "

david-waltermire commented 1 year ago

@ElectricNroff Would you run similar queries against records produced in the last year? This will give us a better sense of what the current record production behavior is. We want a general sense of the magnitude of occurrence.

ElectricNroff commented 1 year ago

For records published in 2023:

fields that use the "description" definition - 2049
version fields - 166
vendor and product fields - 279

The large difference for vendor and product fields is mostly explained by one large vendor no longer including a space at the end of the product name.
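The thread does not say how these counts were produced. As a rough illustration only, a per-field count of leading/trailing whitespace could be sketched as follows. The flat record layout, the sample values, and the function names are all hypothetical; real CVE Records are nested JSON documents and would need to be walked accordingly.

```python
from collections import Counter
from typing import Dict, Iterable

def has_edge_whitespace(value: str) -> bool:
    """True if the string starts or ends with whitespace."""
    return value != value.strip()

def count_edge_whitespace(records: Iterable[Dict[str, str]]) -> Counter:
    """Count, per field name, how many records have leading/trailing whitespace."""
    counts: Counter = Counter()
    for record in records:
        for field, value in record.items():
            if has_edge_whitespace(value):
                counts[field] += 1
    return counts

# Hypothetical, simplified records used only to exercise the functions.
records = [
    {"vendor": "Example Corp", "product": "Widget ", "version": "1.0"},
    {"vendor": "Example Corp", "product": "Gadget", "version": " 2.1"},
    {"vendor": "Example Corp ", "product": "Widget ", "version": "3.0\n"},
]
counts = count_edge_whitespace(records)
print(counts)
```

Counting records per field, rather than total occurrences, keeps the result comparable to the "CVE Records affected" figures quoted in the thread.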