Closed ElteHupkes closed 2 years ago
Hey Elte Hupkes!
I have the fix for this but I do it with not equal to backslash or double quote. I believe your suggestion would introduce a bug where json such as { "hello" : "\" } would be allowed!
Looking at the spec: https://datatracker.ietf.org/doc/html/rfc8259#section-7
Plane, the character is represented as a 12-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".
string = quotation-mark *char quotation-mark
char = unescaped /
escape (
%x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG ) ; uXXXX U+XXXX
escape = %x5C ; \
quotation-mark = %x22 ; "
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
I do not check for > 0x10FFFF as this is out of the range of the character (However this may also be a bug - I believe I need to peek ahead to get two bytes due to UTF-16).
Let me make a pull request
Sorry @ElteHupkes you are correct as blackslash and double quotes are already handled above. You only need to check for < 0x20 or <= 0x1F as you say!
I would like to say I can still comment on the specifics, but after two and a half years they seem to have slipped my working memory 😉. Great to see this fixed though!
From the official JSON specification, section 9 (emphasis mine):
JsonReader
checks for invalid control characters usingchar.IsControl()
, and from its documentation:In other words, there is a range of characters which are designated Unicode control characters (and thus return a positive result from
char.IsControl()
) which are not deemed control characters according to the JSON standard. These characters will therefore not be escaped by many JSON serializers, which may cause an invalidJsonParseException
to be thrown when the JSON contains them.Of course this is all very unlikely, and of course therefore it just happened to me while consuming a JSON API and I spent the last few hours debugging the problem (turns out,
\u007f
doesn't quite show up in string output!).EDIT: From the top of my head, as a suggested fix, I think replacing
char.IsControl(c)
withc <= '\u001f'
should do the trick.