JamesNK / Newtonsoft.Json

Json.NET is a popular high-performance JSON framework for .NET
https://www.newtonsoft.com/json
MIT License
10.73k stars 3.25k forks source link

JsonConvert.DeserializeObject does not always respect Required.Always on a JSON property if run multi-threaded #2867

Open antispam2002 opened 1 year ago

antispam2002 commented 1 year ago

Source/destination types

public class Payload
{
  [JsonProperty("eventType", Required = Required.Always)]
  public string EventType { get; set; } = null!;
}

Source/destination JSON

{"eventType":""} (Scenario 1)
{"eventType":null} (Scenario 2)

Expected behavior

In the scenario 1 the payload should be de-serialized correctly so that payload.EventType is `` (an empty string). In scenario 2 an exception must be thrown like

Required property 'eventType' expects a value but got null.

Actual behavior

When running the de-serialization in a multi-threaded environment, the exception is not thrown initially.

Steps to reproduce

The following program can be used to reproduce the behavior:

using System.Text;
using Newtonsoft.Json;

var timer1 = new Timer(state =>
  {
    var msg = new StringBuilder();
    msg.Append("EventType `empty`: -> should not throw");
    try
    {
      JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": """" }"); // This does not throw.
    }
    catch (Exception e)
    {
      msg.Append(e.Message);
    }
    Console.WriteLine(msg.ToString());
  },
  null, TimeSpan.FromMilliseconds(1), TimeSpan.FromMilliseconds(1));
var timer2 = new Timer(state =>
  {
    var msg = new StringBuilder();
    msg.Append("EventType `null` -> must throw: ");
    try
    {
      JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": null }"); // This must throw.
    }
    catch (Exception e)
    {
      msg.Append(e.Message);
    }
    Console.WriteLine(msg.ToString());
  },
  null, TimeSpan.FromMilliseconds(1), TimeSpan.FromMilliseconds(1));
// sleep in main thread for a while
Thread.Sleep(1000);

public class Payload
{
  [JsonProperty("eventType", Required = Required.Always)]
  public string EventType { get; set; } = null!;
}

Compiled with Newtonsoft.Json 13.0.3 & .Net SDK 7.0.5 the program produces the output similar to the one below:

net7.0/Json.NET.Test.exe
EventType `null` -> must throw:
EventType `null` -> must throw:
EventType `empty`: -> should not throw
EventType `null` -> must throw:
EventType `empty`: -> should not throw
EventType `empty`: -> should not throw
EventType `empty`: -> should not throw
EventType `empty`: -> should not throw
EventType `empty`: -> should not throw
EventType `null` -> must throw:
EventType `empty`: -> should not throw
EventType `empty`: -> should not throw
EventType `null` -> must throw: Required property 'eventType' expects a value but got null. Path '', line 1, position 21.
EventType `null` -> must throw: Required property 'eventType' expects a value but got null. Path '', line 1, position 21.
EventType `null` -> must throw: Required property 'eventType' expects a value but got null. Path '', line 1, position 21.
EventType `null` -> must throw: Required property 'eventType' expects a value but got null. Path '', line 1, position 21.
EventType `empty`: -> should not throw
EventType `null` -> must throw: Required property 'eventType' expects a value but got null. Path '', line 1, position 21.
EventType `empty`: -> should not throw
EventType `null` -> must throw: Required property 'eventType' expects a value but got null. Path '', line 1, position 21.
...

Several first calls to JsonConvert.DeserializeObject do not throw any exceptions. After a while they start to throw it, however.

antispam2002 commented 1 year ago

Added a repo with the test code at https://github.com/antispam2002/test-newtonsoft-issue.git

elgonzo commented 1 year ago

Nice find!

Seems the issue (or at least one of the issues) is with the JsonObjectContract.HasRequiredOrDefaultValueProperties property getter.

Note how the getter sets the _hasRequiredOrDefaultValueProperties field first to false when it still has the initial value of null.

In your example case, you start a bunch of deserializations pretty much concurrently all using the same object contract for Payload. So, the first couple of deserializations start querying the JsonObjectContract.HasRequiredOrDefaultValueProperties concurrently while the _hasRequiredOrDefaultValueProperties field is still null, and they each start executing the if (_hasRequiredOrDefaultValueProperties == null) branch.

These first couple of deserializations basically immediately set _hasRequiredOrDefaultValueProperties to false. For _hasRequiredOrDefaultValueProperties to be eventually set to true can take some execution time. Meanwhile, with the _hasRequiredOrDefaultValueProperties now being false, the next couple of deserializations see _hasRequiredOrDefaultValueProperties being false and thus won't act as they would be required to. The situation will only stabilize once all of the couple of serializations racing the if branch have passed setting _hasRequiredOrDefaultValueProperties to false and then one of them eventually gets to setting _hasRequiredOrDefaultValueProperties to true. And during that time window, you will see the observed incorrect behavior.

The naive "solution" -- at least with regard to JsonObjectContract.HasRequiredOrDefaultValueProperties thread safety -- is to set _hasRequiredOrDefaultValueProperties only once in the getter in a thread-safe manner, with any needed intermediate values stored in a local.

Also of note here is the fact that _hasRequiredOrDefaultValueProperties is a Nullable<bool> that does not guarantee atomic access. Therefore a simple assignment like \_hasRequiredOrDefaultValueProperties = true is not atomic and not thread-safe either. With an initial value of _hasRequiredOrDefaultValueProperties being null, another thread could potentially see an intermediate state of _hasRequiredOrDefaultValueProperties where its value is false (i.e., _hasRequiredOrDefaultValueProperties.HasValue being true while _hasRequiredOrDefaultValueProperties.Value being false).



For the time being, one possible workaround is to "pre-initialize" the _hasRequiredOrDefaultValueProperties field of the object contract by running a (dummy) warm-up deserialization in a non-concurrent manner beforehand. Something like:

using System.Text;
using Newtonsoft.Json;

//
// Dummy deserialization for initializing the object contract in a non-concurrent manner
//
JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": """" }");

var timer1 = new Timer(state =>
{
    ... the rest of your code

Although, admittedly this workaround can quickly become more of a maintenance nightmare the more object types the application needs to potentially deserialize.


There is also another workaround that's avoiding the need for warming up object contracts and that is simpler to implement but coming at the cost of some performance penalty per deserialization thread.

The object contracts used for each (de)serialization job are created -- and more importantly here -- cached by the DefaultContractResolver instance utilized by default by the (de)serializer. Different DefaultContractResolver instances however do not share contract instances between each other - each DefaultContractResolver instance creates and caches their own contract instances.

We can exploit this fact by having each deserialization thread set up its own JsonSerializerSettings with a new DefaultContractResolver instance that is then being passed to every (de)serializion job called in that thread, e.g.:

var timer1 = new Timer(state =>
  {
    ...
    try
    {
      var settings = new JsonSerializerSettings
      {
          ContractResolver = new DefaultContractResolver()
      };
      JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": """" }", settings); // This does not throw.
    }
    catch (Exception e)
    ...

and

var timer2 = new Timer(state =>
  {
    ...
    try
    {
      var settings = new JsonSerializerSettings
      {
          ContractResolver = new DefaultContractResolver()
      };
      JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": null }", settings); // This must throw.
    }
    catch (Exception e)
    ...

The effect is that the different deserialization threads don't operate with the same contract instances, thus avoiding contract-related multi-threading issues.

However, every deserialization thread now using its own DefaultContractResolver instance leads to every deserialization thread going through the motions of reflecting over the data type(s) to be deserialized and creating the respective contracts. Which was once a one-time performance cost is now a performance cost every deserialization thread has to pay again.

(There is also the JsonConvert.DefaultSettings property that accepts a function delegate that is returning a JsonSerializerSettings instance -- so it's actually more of a "DefaultSettingsFactory" property --, but this not only applies to deserialization methods offered by JsonConvert but also to its serialization methods as well as to any JsonSerializer instance that will be created with default settings. Keep that in mind when you contemplate whether you want to make use of the JsonConvert.DefaultSettings property for this workaround or not.)

antispam2002 commented 1 year ago

Hi @elgonzo,

thanks for the quick analysis & feedback. The sample is based on the real life scenario, when a contract is tested for correct handling of invalid messages which all arrive in parallel. BTW, the same would be true for any REST API running multi-threaded when the multiple calls using the same contract arrive after an application restart in a concurrent bunch.

I'll check how/if we can integrate the suggested workaround, but, as you say, as there are much more contracts to be de-serialized, the maintenance would become a nightmare.

elgonzo commented 1 year ago

In fact, thinking of a workaround, I guess the warm-up payload is actually irrelevant. It can be an empty string [...]

Using an empty string as json data to deserialize might perhaps cause a deserialization error before the HasRequiredOrDefaultValueProperties property of the object contract would be accessed, thus not accomplishing the warm-up. Using a string with an empty json object "{}" however could perhaps work. But i haven't really tested with an empty string or empty json object myself, so don't rely on my assumption here...

elgonzo commented 1 year ago

There is also another workaround that's simpler to implement, but comes at the cost of some performance penalty per deserialization thread. (I should have thought about that sooner, sorry about that. I also copied the information from this comment to my first comment above to keep the relevant information in one place.)

The object contracts used for each (de)serialization job are created -- and more importantly here -- cached by the DefaultContractResolver instance utilized by default by the (de)serializer. Different DefaultContractResolver instances however do not share contract instances between each other - each DefaultContractResolver instance creates and caches their own contract instances.

We can exploit this fact by having each deserialization thread set up its own JsonSerializerSettings with a new DefaultContractResolver instance, i.e.:

var timer1 = new Timer(state =>
  {
    ...
    try
    {
      var settings = new JsonSerializerSettings
      {
          ContractResolver = new DefaultContractResolver()
      };
      JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": """" }", settings); // This does not throw.
    }
    catch (Exception e)
    ...

and

var timer2 = new Timer(state =>
  {
    ...
    try
    {
      var settings = new JsonSerializerSettings
      {
          ContractResolver = new DefaultContractResolver()
      };
      JsonConvert.DeserializeObject<Payload>(@"{ ""eventType"": null }", settings); // This must throw.
    }
    catch (Exception e)
    ...

The effect is that the different deserialization threads don't operate with the same contract instances, thus avoiding contract-related multi-threading issues.

However, every deserialization thread now using its own DefaultContractResolver instance leads to every deserialization thread going through the motions of reflecting over the data type(s) to be deserialized and creating the respective contracts. Which was once a one-time performance cost is now a performance cost every deserialization thread has to pay again.

There is also the JsonConvert.DefaultSettings property that accepts a function delegate that is returning a JsonSerializerSettings instance (so it's actually more of a "DefaultSettingsFactory" property), but this not only applies to deserialization methods offered by JsonConvert, but also to its serialization methods as well as to any JsonSerializer instance that will be created with default settings. Keep that in mind when you contemplate whether you want to make use of the JsonConvert.DefaultSettings property for this workaround or not.

antispam2002 commented 1 year ago

Hi @elgonzo

thanks again for all the efforts :) I tried the workaround with an empty payload {} & an attribute to mark the contracts, seems to work like a charm (e.g. https://github.com/antispam2002/test-newtonsoft-issue/blob/9e5df95495d501f5e8e10ac6c47b06ade23aeb47/Program.cs). As you mentioned, the use of an own instance of DefaultContractResolver() every time before the de-serialization seems to be an overkill provided that such a new instance must be created every time when the de-serializer is called. So I tried to stick to a bit more work-intensive but less performance-affecting workaround. Will comment again if it worked in the real life :)

What would be your expectation - when the fix can become available in the release version of the library?

elgonzo commented 1 year ago

What would be your expectation - when the fix can become available in the release version of the library?

Unfortunately, i can't tell if and when a fix would become available, as i am not associated with the Newtonsoft.Json project nor its author and maintainer. :-(