endpoints accept UTF-16 but there are no test cases

ElectricNroff commented 2 years ago

Endpoints such as POST /cve/{id}/cna accept UTF-16 data, and https://cveawg-test.mitre.org/api-docs/#/CVE%20Record/cveCnaCreateSingle doesn't say whether UTF-16 is supported. The endpoint tests for https://github.com/CVEProject/cve-services/tree/b083cfe4633442d8ec377828956c9b173c930718 apparently always send

Content-Type: application/json

and never send

Content-Type: application/json; charset=UTF-16

In other words, UTF-16 input is part of the product's attack surface, but there's no assurance that UTF-16 data is validated correctly. This is probably more relevant than other attack variants (e.g., it's plausible that an attack requires a non-default character encoding; it's less plausible that an attack requires a non-default use of HTTP). There were problems in the past (e.g., see the https://github.com/nodejs/node-v0.x-archive/issues/4853 report). It may be important to confirm that UTF-16 input from a client results in storing the same CVE Record on the server as the equivalent UTF-8 input, and that it won't cause data corruption on the server. It may also be worthwhile to know whether clients receive usable error messages if their UTF-16 input is incorrect.
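
The "same record" property above can be stated as a small check. This is only a sketch in Python, not the cve-services code: `decode_json_body` is a hypothetical stand-in for whatever the server does with the body, and the sample record is made up.

```python
import json


def decode_json_body(body: bytes, charset: str) -> dict:
    """Hypothetical stand-in for server-side parsing: decode the bytes, then parse JSON."""
    return json.loads(body.decode(charset))


# Made-up record containing a non-ASCII character and a character outside the BMP,
# since those are the cases where UTF-8 and UTF-16 differ most.
record = {"product": "Example Enterprise", "note": "caf\u00e9 \U0001F512"}
text = json.dumps(record, ensure_ascii=False)

utf8_body = text.encode("utf-8")
utf16_body = text.encode("utf-16")  # Python's "utf-16" codec prepends a byte order mark

from_utf8 = decode_json_body(utf8_body, "utf-8")
from_utf16 = decode_json_body(utf16_body, "utf-16")

# The property of interest: different bytes on the wire, identical record after parsing.
assert utf8_body != utf16_body
assert from_utf8 == from_utf16 == record
```

A real test would compare the record the server stores after each request, not the output of a local decoder.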

There might also be a way to deny any API request that isn't using UTF-8 but https://expressjs.com/en/resources/middleware/body-parser.html doesn't discuss that.
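
As an illustration of what such a policy could look like, here is a minimal Python sketch of a charset gate. It is not Express middleware and says nothing about which options body-parser offers; `is_utf8_request` and `status_for` are hypothetical names, and treating a missing charset as UTF-8 is an assumption.

```python
from email.message import Message


def is_utf8_request(content_type: str) -> bool:
    """Return True if the Content-Type declares UTF-8 or declares no charset at all."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value; the default models "no charset means UTF-8".
    return msg.get_content_charset("utf-8") in ("utf-8", "utf8")


def status_for(content_type: str) -> int:
    """Hypothetical policy: answer 415 Unsupported Media Type for any non-UTF-8 charset."""
    return 200 if is_utf8_request(content_type) else 415


assert status_for("application/json") == 200
assert status_for("application/json; charset=utf-8") == 200
assert status_for("application/json; charset=UTF-16") == 415
```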

For example, downloading the https://github.com/CVEProject/cve-schema/blob/8b6a261163b98392a72edcb3d087833becc2b91a/schema/v5.0/docs/basic-example.json file, converting it to the cnaContainer format, and sending it to the /cve/{id}/cna endpoint succeeds. It still succeeds if the file is converted to UTF-16 and sent to the endpoint with charset=UTF-16 in the header. These are the two files, with substantially different sizes and incompatible encodings:

% stat --terse basic-example-utf-16.json | awk '{print $1 " " $2}'
basic-example-utf-16.json 2672
% stat --terse basic-example-utf-8.json | awk '{print $1 " " $2}'
basic-example-utf-8.json 1335
% strings -e l - basic-example-utf-16.json | fgrep product
            "product": "Example Enterprise",
% strings -e s - basic-example-utf-16.json | fgrep product
% strings -e l - basic-example-utf-8.json | fgrep product
% strings -e s - basic-example-utf-8.json | fgrep product
            "product": "Example Enterprise",

If the UTF-8 file is transmitted as UTF-16, there is a plausible error:

{"error":"INVALID_JSON_SYNTAX","message":"Unexpected token M-bM-^AM-; in JSON at position 0"}

Similarly, if the UTF-16 file is transmitted as UTF-8, there is a (different) plausible error:

{"error":"INVALID_JSON_SYNTAX","message":"Unexpected token M-oM-?M-= in JSON at position 0"}

In limited testing, no anomalies in UTF-16 handling were found. For example, the response from the server always has:

Content-Type: application/json; charset=utf-8

even if the client sent UTF-16. Also, the data from the server is apparently always in the UTF-8 format even if it is reporting an incorrect UTF-16 byte sequence.
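
Two details reported above fit ordinary encoding arithmetic. The file sizes match two bytes per character plus a two-byte byte order mark (2 × 1335 + 2 = 2672), assuming ASCII-only content and a BOM. The two error tokens look like `cat -v`-style renderings of UTF-8 bytes: `M-bM-^AM-;` is E2 81 BB (U+207B) and `M-oM-?M-=` is EF BF BD (U+FFFD, the replacement character). The Python sketch below reproduces both with made-up input. It is an interpretation of the output, not something the server confirms, and it assumes the UTF-8 file begins with `{` followed by a space.

```python
import codecs

# File sizes: for ASCII-only text, UTF-16 with a BOM is two bytes per character plus two.
sample = '{ "product": "Example Enterprise" }'  # made-up ASCII-only sample
utf8 = sample.encode("utf-8")
# Explicit little-endian encoding plus BOM, so the result does not depend on the platform.
utf16 = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
assert len(utf16) == 2 * len(utf8) + 2
assert 2 * 1335 + 2 == 2672  # the sizes reported by stat

# First error: the UTF-8 bytes for "{ " read as one little-endian UTF-16 code unit.
misread_utf8 = b"{ ".decode("utf-16-le")
assert misread_utf8 == "\u207b"
assert misread_utf8.encode("utf-8") == b"\xe2\x81\xbb"  # M-b M-^A M-;

# Second error: a UTF-16 BOM is not valid UTF-8, so a lenient decoder substitutes U+FFFD.
misread_utf16 = utf16.decode("utf-8", errors="replace")
assert misread_utf16[0] == "\ufffd"
assert "\ufffd".encode("utf-8") == b"\xef\xbf\xbd"  # M-o M-? M-=
```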

Ideally, there would be at least one positive test and at least one negative test for POST /cve/{id}/cna with application/json; charset=UTF-16.
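
A rough idea of what the two requests could look like, sketched in Python and not in the project's own test framework. `BASE_URL`, `CVE_ID`, and the record are placeholders, authentication headers are omitted, and the requests are only built, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.invalid/api"  # hypothetical; point this at a real test instance
CVE_ID = "CVE-1900-0001"                  # hypothetical ID


def build_request(record: dict, body_encoding: str, declared_charset: str) -> Request:
    """Build (but do not send) a POST /cve/{id}/cna request with the given encodings."""
    body = json.dumps(record, ensure_ascii=False).encode(body_encoding)
    return Request(
        url=f"{BASE_URL}/cve/{CVE_ID}/cna",
        data=body,
        method="POST",
        headers={"Content-Type": f"application/json; charset={declared_charset}"},
    )


record = {"cnaContainer": {"title": "placeholder"}}  # placeholder, not a valid record

# Positive case: UTF-16 body, UTF-16 declared. Expected to succeed.
positive = build_request(record, "utf-16", "UTF-16")

# Negative case: UTF-8 body, UTF-16 declared. Expected to be rejected,
# e.g. with the INVALID_JSON_SYNTAX error shown above.
negative = build_request(record, "utf-8", "UTF-16")
```

Sending these with `urllib.request.urlopen` and asserting on the status code and error body would complete the test. A further negative case could send malformed UTF-16, such as an odd number of bytes.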

jdaigneau5 commented 11 months ago

@ElectricNroff Do you think preventing UTF-16 encoding would address this concern?

ElectricNroff commented 11 months ago

Maybe in the long term, but some organizations may currently be sending UTF-16 in their day-to-day operations.

Steps might include checking whether there is any log data about whether UTF-16 has been used, adding logging for UTF-16, asking organizations (such as CNAs) whether they know of any UTF-16 use on their end, asking about important UTF-16 use cases (e.g., a CNA might be using UTF-16 throughout their entire vulnerability-management process), announcing a UTF-16 deprecation schedule, etc.

jdaigneau5 commented 6 months ago

Dev note: The current unit testing framework does not support posting UTF-16-encoded payloads to endpoints. Will investigate further.

CVEProject / cve-services

endpoints accept UTF-16 but there are no test cases #734