expressjs / body-parser

Node.js body parsing middleware
MIT License

Improved JSON parsing to handle tabs within strings. #520

Closed: oukhrib closed this pull request 8 months ago

oukhrib commented 8 months ago

This pull request addresses an issue where the parser throws an error when a JSON string contains a tab (an actual tab character, not the \t escape sequence). A test case is included to demonstrate the issue:

{"fullname":"John<TAB>Doe"}    (where <TAB> stands for a literal tab character, U+0009)

To ensure wider compatibility, the fix replaces all tabs within strings with two spaces. This approach aligns with common parsing practices and avoids potential errors caused by tabs.
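For illustration only, here is a minimal sketch of the kind of preprocessing the PR describes. The PR's actual diff is not shown in this thread, so the helper parseWithTabsReplaced and its naive global replacement are assumptions. It shows the trade-off: the request now parses, but the parsed value no longer matches what the client sent.

```javascript
// Hypothetical sketch: replace every raw tab (U+0009) in the body with two
// spaces before calling JSON.parse. This is an assumed, naive version of the
// PR's idea, not its actual code.
function parseWithTabsReplaced(text) {
  return JSON.parse(text.replace(/\t/g, "  "));
}

// "\t" is a JS escape, so this body contains a raw tab inside the string value.
const body = '{"fullname":"John\tDoe"}';

// Parsing now succeeds, but the tab has silently become two spaces.
const result = parseWithTabsReplaced(body);
console.log(JSON.stringify(result.fullname)); // "John  Doe"
```

Replacing tabs that sit between tokens would be harmless, since they are insignificant whitespace there; the data change comes only from tabs inside strings.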

wesleytodd commented 8 months ago

Hi, can you provide more information on the behavior and error you are seeing? It appears you are trying to change a behavior of JSON.parse, which is provided by the language runtime, not this library. If that is the case (please tell me if I am wrong), we are not interested in this type of additional behavior.

jonchurch commented 8 months ago

Wow TIL: ECMA 404 v2 section 4

Insignificant whitespace is allowed before or after any token. Whitespace is any sequence of one or more of the following code points: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
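The quoted rule can be checked directly against the runtime's JSON.parse. This sketch (the parses helper is a hypothetical name) places each of the four whitespace code points between tokens and then inside a string token; it assumes a standards-conforming engine such as V8 in Node.js.

```javascript
// The four whitespace code points listed in ECMA-404, section 4.
const whitespace = {
  "U+0009 (tab)": "\t",
  "U+000A (line feed)": "\n",
  "U+000D (carriage return)": "\r",
  "U+0020 (space)": " ",
};

// Returns true if JSON.parse accepts the text, false if it throws.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

for (const [name, ch] of Object.entries(whitespace)) {
  const betweenTokens = parses(`{"a":${ch}"b"}`); // whitespace before a token
  const insideString = parses(`{"a":"x${ch}y"}`); // raw character inside a string token
  console.log(`${name}: between tokens=${betweenTokens}, inside string=${insideString}`);
}
```

Only the space should be accepted in both positions; the tab, line feed, and carriage return are accepted only between tokens.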

We are not able to accept this change because parsing invalid JSON is out of the scope of this package.

Both of these will throw because they are invalid JSON.

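JSON.parse('{"fullname":"John\tDoe"}')     // same as typing a literal tab between "John" and "Doe"
JSON.parse('{"fullname":"John \t Doe"}')   // \t is a JS escape, so the JSON text again contains a raw tab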
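To see the failure concretely, this sketch catches the error instead of letting it propagate. The helper name parseError is hypothetical, and the JS \t escape is used so the raw tab is visible in the source.

```javascript
// Both texts contain a raw tab inside the "fullname" string token.
const inputs = [
  '{"fullname":"John\tDoe"}',
  '{"fullname":"John \t Doe"}',
];

// Returns null if the text parses, otherwise the error's name.
function parseError(text) {
  try {
    JSON.parse(text);
    return null;
  } catch (err) {
    return err.name;
  }
}

const results = inputs.map(parseError);
console.log(results); // both entries are "SyntaxError"
```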

For completeness, this will not throw, because escaping the tab makes it valid:

JSON.parse('{"fullname":"John \\t Doe"}')
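A quick check, assuming Node.js or any conforming runtime, that the escaped form really decodes back to a tab rather than to the literal characters backslash and t:

```javascript
// In JS source, "\\t" yields two characters: a backslash and "t",
// which together form the JSON escape sequence for a tab.
const text = '{"fullname":"John \\t Doe"}';
console.log(text.includes("\t")); // false: no raw tab in the JSON text

const { fullname } = JSON.parse(text);
console.log(fullname === "John \t Doe"); // true: the escape decoded to a real tab
console.log(fullname.charCodeAt(5)); // 9 (U+0009)
```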

This also will not throw, because a tab outside a string token is insignificant whitespace, which the grammar allows:

JSON.parse('{"fullname": \t"John Doe"}')
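Tabs between tokens are also why tab-indented JSON is valid. This sketch parses the example above and round-trips an object through JSON.stringify with a tab as the indentation string (the object itself is illustrative).

```javascript
// The raw tab sits between ":" and the opening quote, i.e. between tokens.
const parsed = JSON.parse('{"fullname": \t"John Doe"}');
console.log(parsed.fullname); // John Doe

// JSON.stringify's third argument ("space") may be a tab string; the indentation
// and newlines it adds appear only between tokens, so the output stays valid.
const pretty = JSON.stringify({ fullname: "John Doe" }, null, "\t");
console.log(JSON.parse(pretty).fullname === "John Doe"); // true
```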

This isn't just true for tabs: it applies to every non-space whitespace character, and more generally to any unescaped control character (U+0000 through U+001F), inside a JSON string.
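A sketch that checks the broader rule for all 32 control characters, U+0000 through U+001F, assuming a conforming JSON.parse: each one is rejected when it appears raw inside a string, and accepted when written as a \uXXXX escape.

```javascript
const rawRejected = [];
const escapedAccepted = [];

for (let code = 0x00; code <= 0x1f; code++) {
  const ch = String.fromCharCode(code);
  const hex = code.toString(16).padStart(4, "0"); // e.g. "0009"

  // Raw control character inside the string token.
  try {
    JSON.parse(`"a${ch}b"`);
  } catch {
    rawRejected.push(code);
  }

  // "\\u" in the template literal yields a backslash followed by "u",
  // so the JSON text contains the escape \u00XX instead of the raw character.
  if (JSON.parse(`"a\\u${hex}b"`) === `a${ch}b`) {
    escapedAccepted.push(code);
  }
}

console.log(rawRejected.length, escapedAccepted.length); // 32 32
```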

These also throw for the same reason:

JSON.parse('{"fullname": "John \nDoe"}')
JSON.parse('{"fullname": "John \rDoe"}')

If you want to produce a valid JSON string from a JS object, let JSON.stringify escape the tab for you:

JSON.stringify({"fullname":"John\tDoe"})
// '{"fullname":"John\\tDoe"}'
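A round-trip sketch, assuming Node.js: JSON.stringify writes the tab as the two-character escape \t, and JSON.parse turns it back into a real tab, so the client never has to send a raw control character.

```javascript
const original = { fullname: "John\tDoe" }; // the value contains a real tab
const json = JSON.stringify(original);

console.log(json); // {"fullname":"John\tDoe"}  (backslash + "t", not a raw tab)
console.log(json.includes("\t")); // false
console.log(JSON.parse(json).fullname === original.fullname); // true
```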