JamesNK / Newtonsoft.Json

Json.NET is a popular high-performance JSON framework for .NET
https://www.newtonsoft.com/json
MIT License
10.71k stars 3.24k forks source link

Support "strict mode" for RFC7159 parsing #646

Open jskeet opened 8 years ago

jskeet commented 8 years ago

Currently Json.NET parses in a more lenient manner than RFC7159. For example:

{
  "foo": Infinity
}

... will be parsed by Json.NET, but isn't valid under RFC7159.

I'm sure that's generally useful, but it makes it harder to build on Json.NET when strict parsing is required. Other features such as date detection can be disabled already, but I don't think there's currently any way of disabling this leniency.

This is only one example - there may well be others that James is aware of, given that he's rather more deeply "in" this world than I am :) (Comments, possibly the quote characters used, etc.)

JamesNK commented 8 years ago

What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read(). If QuoteChar is ' then throw an error, if TokenType is a Float and Value is double.Infinity or double.NaN then throw an error, if TokenType is Comment then throw an error, etc.

jskeet commented 8 years ago

Thanks, that's definitely a practical workaround. I think this would be generally useful functionality to be in Json.NET itself at some point, but I'll give that a try for now. (I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...)

CyrusNajmabadi commented 6 years ago

(I'll try to identify the ways in which Json.NET is more lenient than the RFC, and note what I find here for future readers...

@jskeet I'm adding JSON editor support for Roslyn over here: https://github.com/dotnet/roslyn/pull/24110

I've been using json.net to base how parsing will work (as that's likely the json parser that most of them wil be using in practice). However, i think it would be nice to allow a strict mode as well. There are very obvious ways that json.net deviates from standard JSON. But i'm concerned i may miss other cases. Do you happen to have the list that identifies all the ways Json.Net is more lenient? That would be very helpful for ensuring i don't miss anything.

JamesNK commented 6 years ago

Off the top of my head:

Probably the most commonly used "invalid" JSON would be properties with single quotes or no quotes, single and multi-line comments, new lines in strings, and trailing commas.

JamesNK commented 6 years ago

https://github.com/JamesNK/Newtonsoft.Json/tree/master/Src/Newtonsoft.Json.Tests/JsonTextReaderTests

CyrusNajmabadi commented 6 years ago

@JamesNK Thanks for the list. Fortunately, reading the actual json spec was simple enough that i was able to find all that as well. Other things i can think of are leniency around numbers and whitespace. Also, there are stricter rules around what is allowed in a string and what needs to be escaped.

CyrusNajmabadi commented 6 years ago

Interestingly enough, i haven't found an actual .net json parser that tries to match the spec precisely. I've found deviations in several (including MS's own DataContractJsonSerializer and JavaScriptSerializer).

@JamesNK Note: i'm a big fan of json.net and how it's addressed this space (including accepting a superset of json). Do you think there's a place in json.net to have an option for more strict processing of json? If so, i'd be happy to contribute. But i don't know if such an option fits your vision for the project.

Thanks!

CyrusNajmabadi commented 6 years ago

Hexadecimal numbers (I think...)

Yup. Not allowed in json. Octal isn't allowed either. Leading zeros are also verboten. Fun stuff.

JamesNK commented 6 years ago

Do you think there's a place in json.net to have an option for more strict processing of json?

I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.

rlaphoenix commented 6 years ago

+1 This is needed. The reason case on my end is that I use a "Repository" for json files that the public can upload and download. PHP uses the proper RFC implementation so if the user makes a JSON that works fine with the C# tester application, they think there wont be any issues, but in fact, there will be when its trying to be parsed and used on my PHP upload script.

So a strict mode here, would be insanely useful to tell the user on my C# end that its not proper syntax, and they can google or lint check to figure out there issue. Otherwise they just say my PHP script is broken plz fix m8 bla bla annoying shit when in fact, its there JSON thats broken.

JonHanna commented 6 years ago

I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options.

I think there's a different dimension of change between "valid by the standard" and "valid by the standard except for in the following cases…" that makes this request different to a request like that.

What I could see happening though is a request for validity according to some RFC 8259bis that replaces RFC 8259 (indeed, the relevant RFC was 7159 when this issue was opened, though I think the only change in validity was that UTF-8 is required for interchange which is at a different level than parsing anyway) and/or a later version of ECMA 404 / ISO/IEC 21778 (presumably the three standards will continue to track each other, but the timings of versions could differ). It might be wise future-proofing to define validity through an enum that could have later not-fully-compatible standard versions added to it, but not to offer such validation for anything other than a published standard.

But a validating reader could also be built as a separate package.

rlaphoenix commented 6 years ago

Just add a bool flag, honestly if people ask for more options, we can discuss if it will be required or not.

TylerBrinkley commented 6 years ago

You could add a flag enum to represent the allowable deviations. If you make it int based it would support up to 32 different options but 64 different options if you make it long based.

CyrusNajmabadi commented 6 years ago

Personally, i'd have no problem just adding a strict mode without any flexibility. If you want any form of looseness, you don't ask for strict. If you want strict, you get it. But i'm not the project owner :)

sixlettervariables commented 6 years ago

@cyrusnajmabadi Manatee.Json only supports strict mode (I believe).

elgonzo commented 6 years ago

I'm not sure. I get the feeling that if I add a strict mode as just a bool flag on JsonTextReader then people will start asking for more options. "I want to error on comments but still accept trailing commas", etc. I don't want to add lots of fields to JsonTextReader to configure how strict it should be.

You don't need to add lots of fields just because someone is asking. ;)

Introducing a option for enabling strict mode would help many users. It makes sense, as it allows users to read and validate RFC-conform Json data without the hassle of building and maintaining unwieldy JsonTextReader.Read() overrides.

For anyone else wanting an individual blend of a little strictness here and a little looseness there, just refer them to your comment from Sept. 2015:

What you could do is inherit from JsonTextReader, override Read and check the incoming data after each base.Read().

guylando commented 5 years ago

What about deserializing and then serializing and comparing to original string? This validates that properties are surrounded by ".." for example. Useful to validate if json is valid if it is sent to client side to be parsed by javascript.

JsonConvert.SerializeObject(JsonConvert.DeserializeObject(str)) == str

jskeet commented 5 years ago

@guylando: That would be a very inefficient way of validating, IMO.

asthomas commented 3 years ago

For what it's worth, I am running into exactly this issue. I need to determine whether a document is valid JSON. A strict mode would be perfect for that.

The whole purpose of a spec is to ensure interoperability. If different implementations deviate from the spec, they are no longer interoperable. In my case interoperability is paramount.

osexpert commented 3 years ago

I think DateParseHandling.None should also be a part of strict mode. Strict mode would make the road from Json.Net to System.Text.Json less bumpy. Edit: STJ follow spec and restrict date format to only support ISO 8601-1:2019 (i think), in case they are more strict than spec. These two will interop a lot as SJT become more popular, it make sense that Json.Net start getting closer to STJ (opt-in).