dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/

Internal json converters cannot function independently of their original options #50205

Open NinoFloris opened 3 years ago

NinoFloris commented 3 years ago

Internal JSON collection converters, for instance, assume JsonClassInfo.ElementInfo is not null; however, it can actually be null if the converter has been pre-acquired from another options instance. https://github.com/dotnet/runtime/blob/79ae74f5ca5c8a6fe3a48935e85bd7374959c570/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/Converters/Collection/IEnumerableDefaultConverter.cs#L42

ElementInfo returns null if ElementType is null, and ElementType is only filled in under specific circumstances.

Starting from our custom converter's Read() method (see below), the steps to an NRE are:

- The pre-acquired converter's Read initializes its own state (ReadStack), which among other things calls options.GetOrAddClassForRootType(type).
- That resolves the type and its converter in the constructor of JsonClassInfo(type).
- The converter, which here resolves back to the custom converter, has ClassType.None, because all custom converters do.
- State is now initialized with ElementType = null because it didn't fall into the ClassType.Collection arm.
- The call to OnTryRead commences with state that will never return anything but null for JsonClassInfo.ElementInfo.
- NRE thrown.

using System;
using System.Collections.Immutable;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace StjRepro
{
    public enum Cases
    {
        One,
        Two,
        Three,
        Four
    }

    class JsonImmutableArrayConverter<T> : JsonConverter<ImmutableArray<T>>
    {
        JsonConverter<ImmutableArray<T>> _originalConverter;

        public JsonImmutableArrayConverter()
            => _originalConverter = (JsonConverter<ImmutableArray<T>>)new JsonSerializerOptions().GetConverter(typeof(ImmutableArray<T>));

        public override ImmutableArray<T> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
            // We're passing the current options and not the default options because we do want our value converters to work
            // in this example, from strings to the enum 'Cases'.
            => _originalConverter.Read(ref reader, typeToConvert, options);

        public override void Write(Utf8JsonWriter writer, ImmutableArray<T> value, JsonSerializerOptions options)
        {
            throw new NotSupportedException();
        }
    }

    class JsonImmutableArrayConverter : JsonConverterFactory
    {
        public override bool CanConvert(Type typeToConvert)
            => typeToConvert.IsGenericType && typeToConvert.GetGenericTypeDefinition() == typeof(ImmutableArray<>);

        public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options)
            => (JsonConverter)Activator.CreateInstance(typeof(JsonImmutableArrayConverter<>).MakeGenericType(typeToConvert.GenericTypeArguments[0]));
    }

    class Program
    {
        static void Main(string[] args)
        {
            var options = new JsonSerializerOptions();
            options.Converters.Add(new JsonImmutableArrayConverter());
            options.Converters.Add(new JsonStringEnumConverter(JsonNamingPolicy.CamelCase));
            JsonSerializer.Deserialize<ImmutableArray<Cases>>(@"[""one"",""two"",""three"",""four""]", options);
        }
    }
}

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Text.Json.Serialization.Converters.IEnumerableDefaultConverter`2.GetElementConverter(JsonClassInfo elementClassInfo)
   at System.Text.Json.Serialization.Converters.IEnumerableDefaultConverter`2.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, TCollection& value)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonResumableConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at StjRepro.JsonImmutableArrayConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](JsonConverter jsonConverter, Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](Utf8JsonReader& reader, Type returnType, JsonSerializerOptions options)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](ReadOnlySpan`1 json, Type returnType, JsonSerializerOptions options)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](String json, JsonSerializerOptions options)
   at StjRepro.Program.Main(String[] args)

The reason for pulling out an 'original' converter like this is that some code paths in a custom converter should be able to simply defer to the existing converters, merely wrapping over them.

The only way I see to hack around this without a proper fix is to pass a specially crafted options instance, with this custom converter removed, into the framework converter. That resolves the correct JsonClassInfo and still includes the required custom value converters, but in terms of perf (and usability) it seems far from ideal.

        public override ImmutableArray<T> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        {
            var optionsDiff = new JsonSerializerOptions(options);
            JsonConverter factory = null;
            foreach (var converter in optionsDiff.Converters)
            {
                if (converter.GetType() == typeof(JsonImmutableArrayConverter))
                    factory = converter;
            }
            if (factory != null)
                optionsDiff.Converters.Remove(factory);

            // We're passing the current options and not the default options because we do want our value converters to work
            // in this case from strings to the enum 'Cases'.
            return _originalConverter.Read(ref reader, typeToConvert, optionsDiff);
        }

If there is some other method by which to achieve what I want, I'd be glad to use it. In any case, I think it's good to add some documentation on how to call into the framework converters from a custom converter; this is something I've wanted to do (and done with varying degrees of success) multiple times now.

/cc @layomia

ghost commented 3 years ago

Tagging subscribers to this area: @eiriktsarpalis, @layomia See info in area-owners.md if you want to be subscribed.

layomia commented 3 years ago

@NinoFloris thanks! I haven't fully grokked this yet but @eiriktsarpalis and I discussed potential issues related to this just yesterday. I'll take a look.

eiriktsarpalis commented 3 years ago

@NinoFloris I bumped into the very same issue yesterday and agree with your analysis. I couldn't really work around it without making changes to System.Text.Json internals so we need to fix the issue in the general case.

I don't think relying on a "default" options instance is a good solution: while it may (in some cases) ensure that JsonClassInfo.ElementClassInfo is populated, it might not be compatible with the element converter being used by the active options instance.

Fundamentally this stems from a weakness in the JsonClassInfo design, which is assumed to be converter-agnostic but whose shape depends on the converter currently being used. I don't know what the best solution would be here, but it will probably require a bit of refactoring :-)

NinoFloris commented 3 years ago

@NinoFloris I bumped into the very same issue yesterday and agree with your analysis

That's such an unlikely coincidence, amusing! ^_^

I don't think relying on a "default" options instance is a good solution

Evidently it's a hack to get what I want, but I'm not sure what your suggested alternative is. Without API additions, the converter override and caching process won't suddenly give me the behavior I want: getting the converter that would have been resolved (and cached) if the custom one weren't overriding it.

To 'quickly' fix the JsonClassInfo issue, I've considered a few options:

Obviously, going down the composite-key route, the slower lookup will have some unknown effect on overall perf (I'm expecting it to be tiny), which still needs careful analysis.

If we expect that to be unfruitful, it might make sense to keep most things as-is and push resolution of these differences into JsonClassInfo itself, having it function uniformly regardless of ClassType: say, starting at a certain ClassType and synthesizing data for the other ClassTypes on demand. I'm not quite sure what this would mean for JsonClassInfo.ClassType and its uses, though.
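
For concreteness, a purely hypothetical sketch of the composite-key idea; the names here (ClassInfoCacheKey, the cache field) are invented for illustration and do not correspond to actual System.Text.Json internals:

using System;
using System.Text.Json.Serialization;

// Hypothetical: key the metadata cache on (type, converter) instead of type alone,
// so a type resolved through a custom converter doesn't reuse the entry that the
// built-in converter's state initialization expects.
readonly struct ClassInfoCacheKey : IEquatable<ClassInfoCacheKey>
{
    public ClassInfoCacheKey(Type type, JsonConverter converter)
        => (Type, Converter) = (type, converter);

    public Type Type { get; }
    public JsonConverter Converter { get; }

    public bool Equals(ClassInfoCacheKey other)
        => Type == other.Type && ReferenceEquals(Converter, other.Converter);

    public override bool Equals(object obj) => obj is ClassInfoCacheKey other && Equals(other);
    public override int GetHashCode() => HashCode.Combine(Type, Converter);
}

// The options instance would then hold something like a
// ConcurrentDictionary<ClassInfoCacheKey, JsonClassInfo> (illustrative only; JsonClassInfo is internal).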

Anything else seems like it would require a larger overhaul of the internals, do you have any suggestions?

Either way thanks for the quick reply!

layomia commented 3 years ago

A workaround here, based on your original approach, is to create and cache a separate options instance, populated with the JsonStringEnumConverter, for the original/default ImmutableArray<T> converter to use (see https://dotnetfiddle.net/tbnB9f for the full program):

class JsonImmutableArrayConverter<T> : JsonConverter<ImmutableArray<T>>
{
    JsonSerializerOptions _originalOptions;
    JsonConverter<ImmutableArray<T>> _originalConverter;

    public JsonImmutableArrayConverter()
    {
        _originalOptions = new JsonSerializerOptions()
        {
            Converters = { new JsonStringEnumConverter(JsonNamingPolicy.CamelCase) }
        };
        _originalConverter = (JsonConverter<ImmutableArray<T>>)_originalOptions.GetConverter(typeof(ImmutableArray<T>));
    }

    public override ImmutableArray<T> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => _originalConverter.Read(ref reader, typeToConvert, _originalOptions);

    public override void Write(Utf8JsonWriter writer, ImmutableArray<T> value, JsonSerializerOptions options)
    {
        _originalConverter.Write(writer, value, _originalOptions);
    }
}

Then, the enum converter doesn't need to be specified at the root call to the serializer:

static void Main(string[] args)
{
    var options = new JsonSerializerOptions();
    options.Converters.Add(new JsonImmutableArrayConverter());
    //options.Converters.Add(new JsonStringEnumConverter(JsonNamingPolicy.CamelCase));

    ImmutableArray<Cases> deserialized = JsonSerializer.Deserialize<ImmutableArray<Cases>>(@"[""one"",""two"",""three"",""four""]", options);

    string serialized = JsonSerializer.Serialize(deserialized, options);
    Console.WriteLine(serialized);
}

Still investigating the issue here, but generally, multiple options instances shouldn't be passed arbitrarily as inputs to converters or the serializer. The mitigation here might be for the serializer to detect and guard against unsupported options patterns, and throw meaningful exceptions.

In any case, I think it's good to add some documentation on how to call into the framework converters from a custom converter; this is something I've wanted to do (and done with varying degrees of success) multiple times now.

Yes, we'll provide some documentation for this. We can add it to this page - https://docs.microsoft.com/dotnet/standard/serialization/system-text-json-converters-how-to?pivots=dotnet-5-0#error-handling. cc @tdykstra

layomia commented 3 years ago

The reason for pulling out an 'original' converter like this is that some code paths in a custom converter should be able to simply defer to the existing converters, merely wrapping over them.

In principle, I do agree that custom converters should be able to compose nicely with framework/STJ internal converters. I'm just curious about this specific scenario. What motivated the use of a custom converter here? Why not just:

static void Main(string[] args)
{
    var options = new JsonSerializerOptions();
    options.Converters.Add(new JsonStringEnumConverter(JsonNamingPolicy.CamelCase));

    ImmutableArray<Cases> deserialized = JsonSerializer.Deserialize<ImmutableArray<Cases>>(@"[""one"",""two"",""three"",""four""]", options);
    Console.WriteLine(JsonSerializer.Serialize(deserialized, options));
}

NinoFloris commented 3 years ago

@layomia obviously this is not a tenable solution: the ImmutableArrayConverter should not have to know about all the Ts (and the right kind of value converter for each) in the entire application. It's also common enough to create converters that are shipped as a package, like https://github.com/Tarmil/FSharp.SystemTextJson.

The mitigation here might be for the serializer to detect and guard against unsupported options patterns, and throw meaningful exceptions.

Proper compositionality is important; doing the easy thing here seems like it would undermine the custom converter model of STJ quite a lot (and that model is already, no doubt with good intentions, much more limited than what the internal converters can do).

What motivated the use of a custom converter here? Why not just:

ImmutableArray is slow to serialize because it falls into the IEnumerable path, yet since we like immutability we use it in many places. So we have a converter that optimizes writes (via GetMemory and MemoryMarshal.TryGetArray, which is entirely safe) but leaves reads as-is. I've also had cases around discriminated unions, custom collections, or other functorial types where I just want to fall back to the framework converters when some pattern match or condition is hit. Most of the time this is on the read path, as going from a reader to arbitrary .NET objects is a pain I try to avoid (at least without the public metadata APIs that were talked about).
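
For illustration, here is a rough sketch of that kind of write-only optimization; this is my own reconstruction (not the converter described above), and it assumes reaching the backing array via ImmutableArray<T>.AsMemory() plus MemoryMarshal.TryGetArray:

using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Runtime.InteropServices;
using System.Text.Json;
using System.Text.Json.Serialization;

class FastWriteImmutableArrayConverter<T> : JsonConverter<ImmutableArray<T>>
{
    public override ImmutableArray<T> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        // Reads are not optimized here: deserialize a plain array and wrap it (one copy).
        => ImmutableArray.Create(JsonSerializer.Deserialize<T[]>(ref reader, options) ?? Array.Empty<T>());

    public override void Write(Utf8JsonWriter writer, ImmutableArray<T> value, JsonSerializerOptions options)
    {
        if (value.IsDefaultOrEmpty)
        {
            writer.WriteStartArray();
            writer.WriteEndArray();
            return;
        }

        // Try to reach the backing array without copying. The declared type T[] doesn't
        // match this converter, so the framework array converter handles the elements.
        if (MemoryMarshal.TryGetArray(value.AsMemory(), out ArraySegment<T> segment)
            && segment.Array is T[] array && segment.Offset == 0 && segment.Count == array.Length)
        {
            JsonSerializer.Serialize(writer, array, options);
        }
        else
        {
            // Extremely unlikely, but fall back to the IEnumerable path.
            JsonSerializer.Serialize<IEnumerable<T>>(writer, value, options);
        }
    }
}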

Anyway, I have at least somewhat optimized the 'hacky fix' from my original comment by caching the diffed options in an instance field. We can then do an Object.ReferenceEquals check against the options passed into Read; if they don't match we diff the passed-in instance, but otherwise we can move on straight away.
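
Roughly like the following, inside the JsonImmutableArrayConverter<T> from the repro above; this is a sketch of that caching (the field names are mine, and it ignores thread-safety for brevity):

        JsonSerializerOptions _lastSeenOptions;
        JsonSerializerOptions _diffedOptions;

        public override ImmutableArray<T> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        {
            // Most calls reuse the same options instance, so only re-diff when it changes.
            if (!ReferenceEquals(options, _lastSeenOptions))
            {
                var diffed = new JsonSerializerOptions(options);
                for (int i = diffed.Converters.Count - 1; i >= 0; i--)
                {
                    // Remove only the factory for this converter; the value converters
                    // (e.g. JsonStringEnumConverter) stay in place.
                    if (diffed.Converters[i] is JsonImmutableArrayConverter)
                        diffed.Converters.RemoveAt(i);
                }

                _diffedOptions = diffed;
                _lastSeenOptions = options;
            }

            return _originalConverter.Read(ref reader, typeToConvert, _diffedOptions);
        }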

layomia commented 3 years ago

The value converter model based on directly deriving from JsonConverter<T> was really intended for simple/primitive types. The limitation with the JsonConverter<T>.Read method is that it doesn't have a ReadStack state parameter to properly flow type metadata (same issue with writing). We've talked a few times about a richer converter model for objects and collections which is based on passing state around (e.g. https://github.com/dotnet/runtime/pull/2259). This is the ideal direction for composability, and it would be great to get to this soon.

A couple of workarounds have been mentioned, so I hope your scenario is unblocked for now. In the meantime, we're evaluating adding a safeguard in the code that bridges the JsonConverter<T>.Read method call with the internal logic for objects/collections, to ensure that the state metadata we initialized is correct:

https://github.com/dotnet/runtime/blob/e503be0f9c37611bc03363751c67e660cd60b26a/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonResumableConverterOfT.cs#L23-L24

It may look like this:

ReadStack state = default;
state.Initialize(typeToConvert, options, supportContinuation: false);
if (state.Current.JsonPropertyInfo.ConverterBase != this)
{
    throw new InvalidOperationException();
}
TryRead(ref reader, typeToConvert, options, ref state, out T? value);

The original repro would fail with this exception. This exception may help uncover more interesting patterns/dependencies that our callers have.

cc @steveharter


We'll look into improving the performance of (de)serializing immutable collections. They are currently based on the CreateRange pattern which involves allocating a temporary collection. Using the builder pattern has better perf (https://dotnetfiddle.net/d5n55f).
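
For context, a minimal illustration of the difference between the two construction patterns (not the code from the linked fiddle):

using System;
using System.Collections.Generic;
using System.Collections.Immutable;

class BuilderVsCreateRange
{
    static ImmutableArray<int> ViaCreateRange(List<int> temp)
        // CreateRange copies from a temporary collection that had to be materialized first.
        => ImmutableArray.CreateRange(temp);

    static ImmutableArray<int> ViaBuilder(int count)
    {
        // The builder fills a buffer in place; MoveToImmutable hands that buffer over
        // without another copy when the capacity matches the final count.
        var builder = ImmutableArray.CreateBuilder<int>(initialCapacity: count);
        for (int i = 0; i < count; i++)
            builder.Add(i);
        return builder.MoveToImmutable();
    }
}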

NinoFloris commented 3 years ago

Thanks!

We've talked a few times about a richer converter model for objects and collections which is based on passing state around

I'll be sure to subscribe to #36785 :)

We'll look into improving the performance of (de)serializing immutable collections. They are currently based on the CreateRange pattern which involves allocating a temporary collection. Using the builder pattern has better perf (https://dotnetfiddle.net/d5n55f).

Nice to see some love for deserializing into immutable collections! For us, writing them fast is important, as we have HTTP read-heavy workloads. Hopefully you could look into using MemoryMarshal as well for ImmutableArray; it's 100% binary compatible as long as you error out or fall back to IEnumerable when TryGetArray returns false (in the highly unlikely event that ImmutableArray suddenly starts using native memory ;)).

layomia commented 3 years ago

We don't have work planned to address this for .NET 6.0; we should consider it for 7.

eiriktsarpalis commented 2 years ago

One possible fix is to have the internal converters encapsulate their corresponding JsonTypeInfo metadata. We don't do it currently since converters are meant to be decoupled from JsonSerializerOptions and any metadata it may carry. Our built-in converters however are special, since they are instantiated using internal converter factories. As such, every instance is tightly coupled to the JsonSerializerOptions instance that originated it, and this bug is a consequence of us not observing this invariant.

Tagging @krwq who might be interested in this topic.

eiriktsarpalis commented 2 years ago

We won't have time to work on this for .NET 7, moving to Future.

krwq commented 1 year ago

We should tackle this in conjunction with https://github.com/dotnet/runtime/issues/63791 and https://github.com/dotnet/runtime/issues/54189

gregsdennis commented 7 months ago

As part of a recent AOT-compatibility update, Json.More.Net now includes several JsonSerializerOptions extensions that help work around this issue.