dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.56k stars 4.54k forks source link

System.Text.Json parser should optionally tolerate malformed json #32291

Open eiriktsarpalis opened 4 years ago

eiriktsarpalis commented 4 years ago

Forwarding a report on twitter about System.Text.Json failures when subjected to property-based testing.

It seems that the STJ parser is choking on scenaria that the Newtonsoft.Json implementation is perfectly capable of handling. I've transcribed the original report into a standalone project.

cc @Horusiath

GrabYourPitchforks commented 4 years ago

What is the actual test string it's failing on? The JSON deserializer mandates strict well-formedness by default (unlike Newtonsoft.Json), but there are toggles so that the caller can opt in to more relaxed behavior if necessary.

eiriktsarpalis commented 4 years ago

What is the actual test string it's failing on?

For instance "\"\u0018\"", or any string with improperly escaped characters.

there are toggles so that the caller can opt in to more relaxed behavior if necessary.

Could you point me to that config? I'm looking at JsonDocumentOptions and can't find something relevant.

GrabYourPitchforks commented 4 years ago

You found the right doc, but it looks like there's no switch for opting in to malformed string handling. There's a switch malformed array handling and a few other cases. If malformed string handling is an important scenario the team should consider an opt in switch for it. It would not be enabled by default, however.

eiriktsarpalis commented 4 years ago

Being able to handle malformed json is important, since it can often come from sources you do not control. I can attest from prior experience that Newtonsoft is excellent at dealing with this.

GrabYourPitchforks commented 4 years ago

We made a stance that we allow only well-formed payloads by default, both for security and correctness reasons. Newtonsoft is very liberal in what it allows by default (malformed payload processing, cyclic references, invalid UTF-16 generation, comment fudging). When I was doing red team work one of my favorite vectors was using these non-compliant-on-by-default behaviors to exploit applications. :)

When I'm back on a proper desktop I'll modify this item to track adding a switch to relax the string processing behavior.

eiriktsarpalis commented 4 years ago

A few other examples of invalid json that occur too frequently in the wild:

GrabYourPitchforks commented 4 years ago

Looks like you beat me to changing the title! Thanks :)

ghost commented 2 years ago

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of the experimental issue cleanup initiative we are currently trialing in a limited number of areas. Please share any feedback you might have in the linked issue.

eiriktsarpalis commented 2 years ago

Adding to https://github.com/dotnet/runtime/issues/32291#issuecomment-586402938 we might also consider exposing a setting for tolerating unescaped control characters in string values.

ClementeGao commented 2 years ago

@eiriktsarpalis @GrabYourPitchforks When there are special characters in the JSON string, how can I ignore them and how can I configure them see issue #69502

p10tyr commented 1 year ago

Hello . Is there any option for Missing quotes in property names - Because apparently that is a normal thing from Python or PHP. I am receiving JSON from a 3rd party like this and I'm stuck now.

I cant use string on API because the content type is JSON so it says failing to convert JSON to string.. yay not happy. I can use object as a last resort because it just tries to convert it to an object and getting "'t' is an invalid start of a property name. Expected a '\"'

Seems like I have to go back to Newtonsoft as I cant find any works arounds. Wasted 2 hours on this issue

am11 commented 8 months ago

Just ran into this limitation while upgrading (~15yo) System.Web.Helpers.Json -based reader code to System.Text.Json.Nodes. I will switch to Newtonsoft.Json instead.

Having laxed parsing option in STJ, which encompasses Newtonsoft && S.W.H.Json behavior, is a definite improvement.

huoyaoyuan commented 2 months ago

Hitting exactly the behaviors mentioned here. I'm extracting a JSON object from some JavaScript and I can confidently determine its start. It contains unquoted property names, trailing commas, and more JavaScript content after the object.

CyrusNajmabadi commented 2 months ago

JavaScript should produce legal Json. How is your js producing the value?

huoyaoyuan commented 2 months ago

The browser engine is just more tolerant about Javascript code.

CyrusNajmabadi commented 2 months ago

@huoyaoyuan that didn't really answer my question. Can you provide a repro where you're producing a json object in JS using a browser engine and it's not generating legal json?

huoyaoyuan commented 2 months ago

@CyrusNajmabadi the code I'm parsing is like this:

Object.defineProperty(object1, "p", {
    value: {
        foo : "bar",
        isEnabled : true,
        content : { "title": "title", "id": 123, "Text": "Text" }
    }
});

Yes it's inconsistent about property name, but browser and console just accepts it.

CyrusNajmabadi commented 2 months ago

That's a JavaScript object, not Json. Json is a subset.

If you serialize that object to Json (using the actual js APIs for that purpose) you'll get Json that can be parsed.

--

To be as clear as possible, Json is not for parsing arbitrary JavaScript. It's for Json objects.

SteveL-MSFT commented 2 months ago

Having options for the JSON document during parsing to be more lenient would be great for interactive users using PowerShell https://github.com/PowerShell/PowerShell/issues/21338