JamesNK / Newtonsoft.Json

Json.NET is a popular high-performance JSON framework for .NET
https://www.newtonsoft.com/json
MIT License
10.76k stars 3.25k forks source link

NullReferenceException in JsonSerializer.GetMatchingConverter(IList<JsonConverter> converters, Type objectType) #2963

Closed RinaAndersHansen closed 3 months ago

RinaAndersHansen commented 3 months ago

Source/destination types

public class FormObject
{
    public Guid FormTemplateGuid { get; set; }
    public Guid UnitGuid { get; set; }
    public string FormTemplateNo { get; set; }
    public string FormTemplateName { get; set; }
    public string FormTemplateHelpDescription { get; set; }
    public List<FormPage> Pages { get; set; }
    public List<FormField> Fields { get; set; }
    public List<CounterFormInfo> CounterFormInfos { get; set; }
}

 public class FormPage
 {
     public int Page { get; set; }
     public string Name { get; set; }
}

 public class FormField
 {
     public string FieldName { get; set; }
     public SystemData.FieldFormFieldsTypeColumnTypes? FieldType { get; set; }
     public int? SortOrder { get; set; }
     public bool? Mandatory { get; set; }
     public bool? Timestamp { get; set; }
     public DateTime? UpdateDate { get; set; }
     public Guid? UpdateUserGuid { get; set; }
     public string Caption { get; set; }
     public string Url { get; set; }
     public string MeasureUnit { get; set; }
     public FormTemplateTableJSON FormTemplateTableJSON { get; set; }
     public FormFieldDocument Signature { get; set; }
     public FormFieldDocument HelpImage { get; set; }
     public int? HelpImageSize { get; set; }
     public string PropertyName { get; set; }
}

Source/destination JSON

{
  "FormTemplateGuid": "be992c27-943e-404f-8864-951b50675a59",
  "UnitGuid": "00000000-0000-0000-0000-000000000000",
  "FormTemplateNo": "1",
  "FormTemplateName": "Maintenance and repair",
  "Fields": [
    {
      "FieldName": "18b15b2c-1ece-fe7c-8115-f9120afa4a29",
      "FieldType": "DateTimePicker",
      "SortOrder": 1,
      "Mandatory": true,
      "Caption": "Date Chooser",
      "Tag": "FormReportDate",
      "ValueType": "System.DateTime",
      "Column": 1,
      "Row": 1,
      "Page": 1,
      "FormTemplateTableGuid": "",
      "IsDateTimeTag": true,
      "IsNumberTag": false,
      "PropertyName": "datachooser"
    },
  "CounterFormInfos": []
}

Expected behavior

We run the code in a Net 6.0 API. In the code we call JsonConvert.DeserializeObject(json, JSONSettings like this

var obj = JsonConvert.DeserializeObject<SerticaFormObject>(json, JSONSettings);

JSONSettings is defined like this.

 _jsonSettings = new JsonSerializerSettings();
 _jsonSettings.Formatting = Newtonsoft.Json.Formatting.None;
 _jsonSettings.NullValueHandling = NullValueHandling.Ignore;
 _jsonSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
 _jsonSettings.FloatParseHandling = FloatParseHandling.Decimal;

This works as a charm, and has for six years. And still do.

But... off to actual behavior...

Actual behavior

Yesterday a customer reported an error that our API failed on GET'ing any of the Form objects out of the API. We tracked it down to this stacktrace in Sentry.

image

Basically, inside JsonConvert.DeserializeJson it just throws a Null Reference Exception.

We looked at it for an hour or so, reproducing the same setup locally on a dev machine, with exact same Json, and everything was fine.

Restarted the API for the customer as a last resort, and everything was fine again.

We had two similar requests hitting the API at exactly the same time, when the issue started.

Is it a possibility that JsonConver.DeserializeObject, being a static function, caches a list of known/previously used converters, and that generating a converter for the same Type in two different requests/threads simultanously can cause some threading issues?

We can wrap this line of code in a lock statement - but that would just be a symptom treatment - and all other Types we serialize back and forth handled just by the .Net API framework. And effectively make our API slower, as it is only the one time where JsonSerializer needs a new converter we need the lock.

As mentioned, we have had this for many years running in production, and never experienced this before.

Steps to reproduce

Sadly I have not much more to go with. Tried to reproduce in a unittest starting 100 Tasks, trying to get them to synchronize the call to JsonConvert - but had no failures there. The error is gone in production for now, and has no more than the stacktrace above from various log sources. If it is a timing incident, we might go another 5-10 years before seeing this again.

I am mainly curious to know if you think it could be a threading issue inside JsonConvert/JsonSerializer and friends, or if this is out of the question.

If you know any scenarios that can cause this, I can look around in our (huge) code base, and see if we have done something bad elsewhere.

elgonzo commented 3 months ago

When exactly in the control flow of your program is the code

_jsonSettings = new JsonSerializerSettings();
 _jsonSettings.Formatting = Newtonsoft.Json.Formatting.None;
 _jsonSettings.NullValueHandling = NullValueHandling.Ignore;
 _jsonSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
 _jsonSettings.FloatParseHandling = FloatParseHandling.Decimal;

being executed, and how exactly is the link between the JSONSettings variable/property used in var obj = JsonConvert.DeserializeObject<SerticaFormObject>(json, JSONSettings); and the _jsonSettings variable/field realized?


Because depending on how you implemented this, there is the potential for a race condition regarding the _jsonSettings.Converters collection. Let me give an example of how the right/wrong implementation of a (hypothetical) JSONSettings property used in concurrent executions of JsonConvert.DeserializeObject<SerticaFormObject>(json, JSONSettings) could lead to a race condition that would result in a NRE in the same way you observed.

For the race condition to occur, it is required that the JsonSerializerSettings instance is initialized on demand and made available to serialization tasks before setting up the _jsonSettings.Converters collection completes.

An example of a JSONSettings property that would expose this kind of behavior would be:

public JsonSerializerSettings JSONSettings
{
     get
    {
        if (_jsonSettings == null)
        {
            _jsonSettings = new JsonSerializerSettings();
            ...
            _jsonSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
            ...
        }
        return _jsonSettings;
    }
}

In this example, note how the JsonSerializerSettings instance becomes accessible through the JSONSettings property as soon as it is assigned to _jsonSettings even though it isn't completely configured yet.

To understand how the race condition materializes, know that:

This opens the door to the following possible chain of events leading to a NRE as you observed:

  1. Thread 1 wants to execute JsonConvert.DeserializeObject<...>(..., JSONSettings), but first has to execute the JSONSettings getter to get the JsonSerializerSettings instance from it.
  2. Thread 1 enters the JSONSettings getter, tests for _jsonSettings being null, enters the if block and completes execution of _jsonSettings = new JsonSerializerSettings()
  3. Now thread 2 wants to execute JsonConvert.DeserializeObject<...>(..., JSONSettings), but it also first has to execute the JSONSettings getter to get the JsonSerializerSettings instance from it.
  4. Thread 2 enters the JSONSettings getter, the if expression "sees" that _jsonSettings contains the JsonSerializerSettings instance created by thread 1 and returns this instance.
  5. Thread 2 is able to invoke JsonConvert.DeserializeObject<...>(..., JSONSettings) now, leading it to begin execution of the ApplySerializerSettings method shown above
  6. Thread 1, having a bad day and being slow and constantly being suspended or interrupted for this or that reason has so far only managed to execute the JSONSettings getter to the point where it is now starting to add converters to _jsonSettings.Converters
  7. Due to the implementation of List<T>.Add, thread 1 increases the _jsonSettings.Converters list size/count to 1 but the converter is not yet added to the list's backing array (thus the respective slot in that array still being the default value null at this point).
  8. Thread 2, being a wily quick fox, executes CollectionUtils.IsNullOrEmpty(settings.Converters)) in the ApplySerializerSettings method, which sees that the _jsonSettings..Converters count is 1 and therefore returns true.
  9. Thread 2 now happily starts copying items from the _jsonSettings.Converters list into the serializer's own converters list.
  10. Thread 1, having really the worst day of its life, still hasn't come around placing the added converter into the _jsonSettings.Converters backing array yet (probably been busy with having to grow/resize/reallocate the backing array)
  11. Thread 2 attempts to get the 1st item from the _jsonSettings.Converters list, but the respective slot in its backing array is still null. Thread 2 gets that null value from _jsonSettings.Converters and places it into the serializer's own converter list. KABOOM!

Note that a NRE is not the only failure mode of this race condition. The same race could also lead to an IndexOutOfRangeException in case thread 2 tries to access the list's backing array before it gets resized/reallocated by thread 1 and the used array index happens to be out of bounds for the old and outdated but still active backing array.

The chances for this happening are slim, as the time window for this to occur is very narrow, and the runtime behavior of the CPU cores executing these threads needs to differ in just the "right" way for this narrow time window to open. This could explain why you see this happening only exceedingly rarely and why it would be exceedingly difficult to reliably reproduce the problem.


Please remember that the example i gave is just an illustrative example of how some innocuously looking code could lay the foundations of a race condition that is not obvious to see. I do not claim to know what your code with respect to this JSONSettings variable/property is. There might very well something else going on in your code.

Also, i am not associated or related to the Newtonsoft.Json project and its author/maintainer. I am just a fellow user of the library (or perhaps former user, as these days it's pretty much always STJ over Newtonsoft.Json)

RinaAndersHansen commented 3 months ago

That sounds reasonable.

Thumbs up for a thorough answer, and for seeing the potential even though I left out the important null check.

The property for the SerializerSettings is a static method, that two threads potentially can race. And exactly as you write.

In code they are a bit further apart, and part of something bigger, which was why I pasted as I did.

[Browsable(false)]
[JsonIgnore]
public static JsonSerializerSettings JSONSettings
{
    get
    {
        if (_jsonSettings == null)
        {
            _jsonSettings = new JsonSerializerSettings();
            _jsonSettings.Formatting = Newtonsoft.Json.Formatting.None;
            _jsonSettings.NullValueHandling = NullValueHandling.Ignore;
            _jsonSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
            _jsonSettings.FloatParseHandling = FloatParseHandling.Decimal;
        }
        return _jsonSettings;
    }
}

But your point is definitely valid, though hard to reproduce as you state.

For reference, we will fix it on our side by putting a lock statement around the getter, so we are sure that only one thread is initializing this at a time. Costing only a fraction of time the first time someone needs this in the entire API lifetime.

private static object jsonSettingslock = new object();

[ [Browsable(false)]
 [JsonIgnore]
 public static JsonSerializerSettings JSONSettings
 {
     get
     {
         if (_jsonSettings == null)
         {
             lock(jsonSettingslock)
             {
                 if (_jsonSettings == null)
                 {
                     _jsonSettings = new JsonSerializerSettings();
                     _jsonSettings.Formatting = Newtonsoft.Json.Formatting.None;
                     _jsonSettings.NullValueHandling = NullValueHandling.Ignore;
                     _jsonSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
                     _jsonSettings.FloatParseHandling = FloatParseHandling.Decimal;
                 }
             }
         }
         return _jsonSettings;
     }
 }

 public static FormObject DeSerialize(string json)
 {
     var obj = JsonConvert.DeserializeObject<FormObject>(json, JSONSettings);
     obj.Fields.ForEach(x => x.ParentObject = obj);
     return obj;
 }

Once again, thank you very much for the rapid and thorough answer.

RinaAndersHansen commented 3 months ago

Closing issue, as it seems resolved for my part.

elgonzo commented 3 months ago

No, the lock implementation as shown in your last comment does NOT prevent the race condition.

Note that any second thread will not enter the lock when it sees _jsonSettings not being null anymore. Which means, the race is still on, and the second thread can access the JsonSerializerSettings instance in _jsonSettings even though _jsonSettings is not yet fully configured by the first thread executing the locked section.

I would suggest to not use lock in this case, and instead use something like Lazy<T>:

public static JsonSerializerSettings JSONSettings => _jsonSettings.Value;

private static readonly Lazy<JsonSerializerSettings> _jsonSettings = new Lazy<JsonSerializerSettings>(
    () => {
        var settings = new JsonSerializerSettings();
        settings.Formatting = Newtonsoft.Json.Formatting.None;
        settings.NullValueHandling = NullValueHandling.Ignore;
        settings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
        settings.FloatParseHandling = FloatParseHandling.Decimal;
        return settings;
    }
);
RinaAndersHansen commented 3 months ago

You are absolutely right. I chose lock because that is the pattern that our solution uses other places. And is well known to all the other developers.

I should obviously wait assigning the private backing variable until done with initializing.

private static object jsonSettingslock = new object();

[Browsable(false)]
[JsonIgnore]
public static JsonSerializerSettings JSONSettings
{
    get
    {
        if (_jsonSettings == null)
        {
            lock(jsonSettingslock)
            {
                if (_jsonSettings == null)
                {
                    var newSettings = new JsonSerializerSettings();
                    newSettings.Formatting = Newtonsoft.Json.Formatting.None;
                    newSettings.NullValueHandling = NullValueHandling.Ignore;
                    newSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
                    newSettings.FloatParseHandling = FloatParseHandling.Decimal;

                    _jsonSettings = newSettings;
                }
            }
        }
        return _jsonSettings;
    }
}

I will create an internal issue here to get it fixed and reviewed, Will take a look on Lazy and see if that is a better fit.

elgonzo commented 3 months ago

I should obviously wait assigning the private backing variable until done with initializing.

That alone would not be sufficient to avoid the race condition for certain. You would also need to make the field _jsonSettings volatile, otherwise the JIT compiler and/or the CPU are allowed to reorder the execution, for example to something like this:

var newSettings = new JsonSerializerSettings();
_jsonSettings = newSettings;
newSettings.Formatting = Newtonsoft.Json.Formatting.None;
newSettings.NullValueHandling = NullValueHandling.Ignore;
newSettings.Converters.Add(new Newtonsoft.Json.Converters.StringEnumConverter());
newSettings.FloatParseHandling = FloatParseHandling.Decimal;

Not sure if that is going to happen or not for some given CLR/JIT version and/or CPU arch/family, but this kind of reordering is permissible by the C# language standard (7.10 Execution order)

RinaAndersHansen commented 3 months ago

I have replaced the logic with Lazy. Found four other places in other modules that has similar flaws, so those are changed to Lazy as well.

So now for the manual testing, and off to a colleague to verify that everything else is still working.

Thanks for the explanations and assistance once again.