ElectricNroff opened this issue 2 years ago.
@ElectricNroff Do you think preventing UTF-16 encoding would address this concern?
Maybe in the long term, but some organizations may currently be sending UTF-16 in their day-to-day operations.
Steps might include checking existing logs for evidence of UTF-16 use, adding logging for UTF-16 requests, asking organizations (such as CNAs) whether they know of any UTF-16 use on their end, asking about important UTF-16 use cases (e.g., a CNA might be using UTF-16 throughout its entire vulnerability-management process), announcing a UTF-16 deprecation schedule, etc.
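For the logging step, something like the following could tally the declared charsets in existing request logs. This is a minimal Python sketch under stated assumptions: `request_charset` and `tally_charsets` are hypothetical helpers, the Content-Type values are assumed to have already been extracted from the logs (the log format isn't known here), and a missing charset parameter is assumed to mean UTF-8.

```python
from collections import Counter
from email.message import Message


def request_charset(content_type: str) -> str:
    """Return the lower-cased charset parameter of a Content-Type value.

    Assumption: a request with no charset parameter is treated as UTF-8.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset("utf-8")


def tally_charsets(content_types: list[str]) -> Counter:
    """Count how often each declared request charset appears."""
    return Counter(request_charset(ct) for ct in content_types)


# Hypothetical Content-Type values pulled from request logs.
sample = [
    "application/json",
    "application/json; charset=utf-8",
    "application/json; charset=UTF-16",
]
counts = tally_charsets(sample)
print(counts)  # Counter({'utf-8': 2, 'utf-16': 1})
```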
Dev note: The current unit-testing framework does not support posting UTF-16-encoded request bodies to endpoints. Will investigate further.
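Independent of the test framework, a UTF-16 test case is just a different byte sequence plus a charset parameter in the Content-Type header. A minimal standard-library Python sketch; the URL, CVE ID, trimmed cnaContainer, and `build_cna_request` helper are placeholders/hypothetical, authentication headers are omitted, and the request is only built, not sent:

```python
import json
import urllib.request

# Hypothetical, heavily trimmed cnaContainer; a real one has many more fields.
payload = {"cnaContainer": {"descriptions": [{"lang": "en", "value": "Example café flaw"}]}}
text = json.dumps(payload, ensure_ascii=False)

utf8_body = text.encode("utf-8")
utf16_body = text.encode("utf-16")  # Python's "utf-16" codec prepends a byte-order mark


def build_cna_request(url: str, body: bytes, charset: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose Content-Type declares the given charset."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"application/json; charset={charset}"},
    )


# Placeholder URL and CVE ID; authentication headers are omitted.
req = build_cna_request(
    "https://example.invalid/api/cve/CVE-0000-0000/cna", utf16_body, "UTF-16"
)
print(len(utf8_body), len(utf16_body))
```

Python's "utf-16" codec writes a 2-byte byte-order mark and two bytes per BMP character, which is why the UTF-16 body is roughly twice the size of the UTF-8 one.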
Endpoints such as POST /cve/{id}/cna accept UTF-16 data, and https://cveawg-test.mitre.org/api-docs/#/CVE%20Record/cveCnaCreateSingle doesn't specifically say that UTF-16 is unsupported. The endpoint tests for https://github.com/CVEProject/cve-services/tree/b083cfe4633442d8ec377828956c9b173c930718 apparently always send
and never send
In other words, UTF-16 input is part of the product's attack surface, but there's no assurance that UTF-16 data is validated correctly. This is probably more relevant than other attack variants: it's plausible that an attack requires a non-default character encoding, but less plausible that one requires a non-default use of HTTP. There have been problems in the past (e.g., see the https://github.com/nodejs/node-v0.x-archive/issues/4853 report). It may be important to confirm that UTF-16 input from a client results in the server storing the same CVE Record as the equivalent UTF-8 input, and that it won't cause data corruption on the server. It may also be worthwhile to know whether clients receive usable error messages if their UTF-16 input is malformed.
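The "same CVE Record" property can be phrased as a round-trip check: decoding the UTF-8 and UTF-16 encodings of the same JSON text should yield identical parsed objects, and a corrupted UTF-16 body should be rejected rather than silently stored. A minimal Python sketch, where `decode_record` and `is_rejected` are hypothetical stand-ins for the server's decode-then-parse step (the non-BMP emoji exercises UTF-16 surrogate pairs):

```python
import json


def decode_record(body: bytes, charset: str) -> dict:
    """Hypothetical server step: decode with the declared charset, then parse JSON."""
    return json.loads(body.decode(charset))


def is_rejected(body: bytes, charset: str) -> bool:
    """True if the body fails to decode or parse instead of being accepted."""
    try:
        decode_record(body, charset)
    except ValueError:  # UnicodeDecodeError and JSONDecodeError both subclass ValueError
        return True
    return False


record = {"cnaContainer": {"title": "Naïve ünïcödé 漢字 😀"}}
text = json.dumps(record, ensure_ascii=False)

from_utf8 = decode_record(text.encode("utf-8"), "utf-8")
from_utf16 = decode_record(text.encode("utf-16"), "utf-16")
assert from_utf8 == from_utf16 == record  # same record either way

# A truncated UTF-16 body (odd byte count) must be rejected, not stored.
truncated = text.encode("utf-16")[:-1]
assert is_rejected(truncated, "utf-16")
```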
There might also be a way to reject any API request that isn't using UTF-8, but https://expressjs.com/en/resources/middleware/body-parser.html doesn't discuss that.
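For reference, RFC 8259 (section 8.1) requires JSON exchanged between systems that aren't part of a closed ecosystem to be encoded in UTF-8. body-parser does have a `verify(req, res, buf, encoding)` option whose callback can throw to abort parsing, which might be one place to hook such a check, but that hasn't been tried here. The decision logic itself is small; a language-neutral sketch in Python, where `charset_of` and `gate` are hypothetical helpers that return a status code rather than touching Express, and a missing charset is assumed to mean UTF-8:

```python
import codecs
from email.message import Message


def charset_of(content_type: str) -> str:
    """Lower-cased charset parameter; assumed to default to UTF-8 when absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset("utf-8")


def gate(content_type: str) -> tuple[int, str]:
    """Return (status, message): 415 for any declared charset other than UTF-8."""
    charset = charset_of(content_type)
    try:
        canonical = codecs.lookup(charset).name  # e.g. "UTF8" -> "utf-8"
    except LookupError:
        return 415, f"unknown charset {charset!r}"
    if canonical != "utf-8":
        return 415, f"unsupported charset {charset!r}; send UTF-8"
    return 200, "ok"
```

Normalizing through `codecs.lookup` means spellings such as `utf8` and `UTF-8` are treated alike, while `utf-16`, `utf-16le`, and unknown names are all refused.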
For example, downloading the https://github.com/CVEProject/cve-schema/blob/8b6a261163b98392a72edcb3d087833becc2b91a/schema/v5.0/docs/basic-example.json file, converting it to the cnaContainer format, and sending it to the /cve/{id}/cna endpoint works. It still works if the file is converted to UTF-16 and sent to the endpoint with charset=UTF-16 in the Content-Type header; e.g., these are the two files, which have substantially different sizes and incompatible encodings:
If the UTF-8 file is sent with charset=UTF-16 in the header, there is a plausible error:
Similarly, if the UTF-16 file is sent with charset=UTF-8, there is a (different) plausible error:
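Both failure modes can be reproduced locally with a model of a server that trusts the declared charset. A minimal Python sketch (`parse_body` is hypothetical and does not reproduce the service's actual error messages): UTF-8 bytes decoded as UTF-16 turn into unrelated characters and fail at the JSON-parsing stage, while UTF-16 bytes decoded as UTF-8 fail immediately on the byte-order mark, so the two errors differ.

```python
import json


def parse_body(body: bytes, declared_charset: str) -> tuple[str, object]:
    """Decode with the declared charset, then parse JSON.

    Returns ("ok", obj) or ("error", "<ExceptionName>: <message>").
    """
    try:
        return "ok", json.loads(body.decode(declared_charset))
    except ValueError as exc:  # UnicodeDecodeError and JSONDecodeError
        return "error", f"{type(exc).__name__}: {exc}"


text = json.dumps({"cnaContainer": {"title": "example"}})
utf8_body = text.encode("utf-8")    # 38 bytes: an even length, so it "decodes" as UTF-16
utf16_body = text.encode("utf-16")  # starts with a byte-order mark, invalid as UTF-8

utf8_sent_as_utf16 = parse_body(utf8_body, "utf-16")  # garbage text, then a JSON error
utf16_sent_as_utf8 = parse_body(utf16_body, "utf-8")  # decode error on the first byte
print(utf8_sent_as_utf16)
print(utf16_sent_as_utf8)
```

With an odd-length UTF-8 body, the first case would instead fail earlier with a truncated-data decode error.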
In limited testing, no anomalies in UTF-16 handling were found. For example, the response from the server always has:
even if the client sent UTF-16. Also, the server's response body is apparently always UTF-8, even when it is reporting an invalid UTF-16 byte sequence.
Ideally, there would be at least one positive test and at least one negative test for POST /cve/{id}/cna with application/json; charset=UTF-16.
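A sketch of what those two tests could look like, written with Python's unittest. `post_cna` is a hypothetical stand-in that models the behavior observed above (decode with the declared charset, always answer in UTF-8); a real test would replace it with an HTTP call to a test instance, including authentication, which is omitted here:

```python
import json
import unittest
from email.message import Message


def post_cna(body: bytes, content_type: str) -> tuple[int, str, bytes]:
    """Hypothetical stand-in for POST /cve/{id}/cna.

    Decodes the body with the declared charset (UTF-8 if none), parses JSON,
    and always answers with a UTF-8 JSON body.
    Returns (status, response Content-Type, response body bytes).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset("utf-8")
    try:
        record = json.loads(body.decode(charset))
        status, payload = 200, {"created": record}
    except (LookupError, ValueError) as exc:  # unknown charset, bad bytes, bad JSON
        status, payload = 400, {"error": str(exc)}
    return status, "application/json; charset=utf-8", json.dumps(payload).encode("utf-8")


RECORD = {"cnaContainer": {"title": "Naïve café 😀"}}
UTF16 = "application/json; charset=UTF-16"
UTF8 = "application/json; charset=UTF-8"


class Utf16CnaTests(unittest.TestCase):
    def test_utf16_body_stores_same_record_as_utf8(self):
        # Positive test: UTF-16 input is accepted and yields the same record.
        text = json.dumps(RECORD, ensure_ascii=False)
        s16, _, b16 = post_cna(text.encode("utf-16"), UTF16)
        s8, _, b8 = post_cna(text.encode("utf-8"), UTF8)
        self.assertEqual((s16, s8), (200, 200))
        self.assertEqual(json.loads(b16), json.loads(b8))

    def test_malformed_utf16_gets_utf8_error(self):
        # Negative test: dropping one byte leaves an odd length, never valid UTF-16.
        body = json.dumps(RECORD).encode("utf-16")[:-1]
        status, ctype, resp = post_cna(body, UTF16)
        self.assertEqual(status, 400)
        self.assertIn("charset=utf-8", ctype.lower())
        self.assertIn("error", json.loads(resp.decode("utf-8")))
```

Run it with `python -m unittest` on the file. The positive test compares the record produced from UTF-16 and UTF-8 input; the negative test checks that a malformed UTF-16 body gets an error that is itself readable UTF-8 JSON.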