dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.14k stars 4.71k forks source link

[API Proposal]: Add a `JsonUnknownTypeHandling` setting for deserializing `object` to .NET primitive values #98038

Open eiriktsarpalis opened 8 months ago

eiriktsarpalis commented 8 months ago

Background and motivation

Branching off from the conversation in #29960 and #97801 to consider a potential built-in object deserializer that targets .NET primitive values as opposed to targeting the DOM types: JsonNode or JsonElement. The background is enabling users migrating off of Json.NET needing a quick way to support object deserialization, provided that the deserialized object is "simple enough". This approach is known to create problems w.r.t. loss of fidelity when roundtripping, which is why it was explicitly ruled out when STJ was initially being designed. It is still something we might want to consider as an opt-in accelerator for users that do depend on that behaviour.

This proposal would map JSON to .NET types using the following recursive schema:

Here's a reference implementation of the above:

public class NaturalObjectConverter : JsonConverter<object>
{
    public override object? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => ReadObjectCore(ref reader);

    public override void Write(Utf8JsonWriter writer, object value, JsonSerializerOptions options)
    {
        Type runtimeType = value.GetType();
        if (runtimeType == typeof(object))
        {
            writer.WriteStartObject();
            writer.WriteEndObject();
        }
        else
        {
            JsonSerializer.Serialize(writer, value, runtimeType, options);
        }
    }

    private static object? ReadObjectCore(ref Utf8JsonReader reader)
    {
        switch (reader.TokenType)
        {
            case JsonTokenType.Null:
                return null;

            case JsonTokenType.False or JsonTokenType.True:
                return reader.GetBoolean();

            case JsonTokenType.Number:
                if (reader.TryGetInt32(out int intValue))
                {
                    return intValue;
                }
                if (reader.TryGetInt64(out long longValue))
                {
                    return longValue;
                }

                // TODO decimal handling?
                return reader.GetDouble();

            case JsonTokenType.String:
                return reader.GetString();

            case JsonTokenType.StartArray:
                var list = new List<object?>();
                while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
                {
                    object? element = ReadObjectCore(ref reader);
                    list.Add(element);
                }
                return list;

            case JsonTokenType.StartObject:
                var dict = new Dictionary<string, object?>();
                while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
                {
                    Debug.Assert(reader.TokenType is JsonTokenType.PropertyName);
                    string propertyName = reader.GetString()!;

                    if (!reader.Read()) throw new JsonException();
                    object? propertyValue = ReadObjectCore(ref reader);
                    dict[propertyName] = propertyValue;
                }
                return dict;

            default:
                throw new JsonException();
        }
    }
}

The reference implementation is intentionally simplistic and necessarily loses fidelity when it comes to its roundtripping abilities. A few noteworthy examples:

API Proposal

namespace System.Text.Json.Serialization;

public enum JsonUnknownTypeHandling
{
    JsonElement,
    JsonNode,
+    DotNetPrimitives,
}

API Usage

var options = new JsonSerializerOptions { UnknownTypeHandling = JsonUnknownTypeHandling.DotNetPrimitives };

var result = JsonSerializer.Deserialize<object>("""[null, 1, 3.14, true]""", options);
Console.WriteLine(result is List<object>); // True
foreach (object? value in (List<object>)result) Console.WriteLine(value?.GetType()); // null, int, double, bool

Alternative Designs

Do nothing, have users write their own custom converters.

Risks

There is no one way in which such a "natural" converter could be implemented and there also is no way in which the implementation could be extended by users. There is a good risk that users will not be able to use the feature because they require that the converter is able to roundtrip DateOnly or Uri instances, in which case they would still need to write a custom converter from scratch.

cc @stephentoub @bartonjs @tannergooding who might have thoughts on how primitives get roundtripped.

ghost commented 8 months ago

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation Branching off from the conversation in #29960 and #97801 to consider a potential built-in `object` deserializer that targets .NET primitive values as opposed to targeting the DOM types: `JsonNode` or `JsonElement`. The background is enabling users migrating off of `Json.NET` needing a quick way to support `object` deserialization, provided that the deserialized object is "simple enough". This approach is known to create problems w.r.t. loss of fidelity when roundtripping, which is why it was explicitly ruled out when STJ was initially being designed. It is still something we might want to consider as an opt-in accelerator for users that do depend on that behaviour. This proposal would map JSON to .NET types using the following recursive schema: * JSON null maps to .NET `null`. * JSON booleans map to .NET `bool` values. * JSON numbers map to `int`, `long` or `double`. * JSON arrays map to `List`. * JSON objects map to `Dictionary`. Here's a reference implementation of the above: ```C# public class NaturalObjectConverter : JsonConverter { public override object? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => ReadObjectCore(ref reader); public override void Write(Utf8JsonWriter writer, object value, JsonSerializerOptions options) { Type runtimeType = value.GetType(); if (runtimeType == typeof(object)) { writer.WriteStartObject(); writer.WriteEndObject(); } else { JsonSerializer.Serialize(writer, value, runtimeType, options); } } private static object? ReadObjectCore(ref Utf8JsonReader reader) { switch (reader.TokenType) { case JsonTokenType.Null: return null; case JsonTokenType.False or JsonTokenType.True: return reader.GetBoolean(); case JsonTokenType.Number: if (reader.TryGetInt32(out int intValue)) { return intValue; } if (reader.TryGetInt64(out long longValue)) { return longValue; } // TODO decimal handling? return reader.GetDouble(); case JsonTokenType.String: return reader.GetString(); case JsonTokenType.StartArray: var list = new List(); while (reader.Read() && reader.TokenType != JsonTokenType.EndArray) { object? element = ReadObjectCore(ref reader); list.Add(element); } return list; case JsonTokenType.StartObject: var dict = new Dictionary(); while (reader.Read() && reader.TokenType != JsonTokenType.EndObject) { Debug.Assert(reader.TokenType is JsonTokenType.PropertyName); string propertyName = reader.GetString()!; if (!reader.Read()) throw new JsonException(); object? propertyValue = ReadObjectCore(ref reader); dict[propertyName] = propertyValue; } return dict; default: throw new JsonException(); } } } ``` The reference implementation is intentionally simplistic and necessarily loses fidelity when it comes to its roundtripping abilities. A few noteworthy examples: * Values such as `DateTimeOffset`, `TimeSpan` and `Guid` are not roundtripped, instead users get back the string representation of these values. This is done intentionally for consistency, since such a deserialization scheme _cannot_ support all possible types that serialize to string. * Non-standard numeric representations such as `NaN`, `PositiveInfinity` and `NegativeInfinity` currently serialized as strings using the opt-in `JsonNumberHandling.AllowNamedFloatingPointLiterals` flag are not roundtripped and are instead returned as strings. * Numeric values can lose fidelity (e.g. `decimal.MaxValue` gets fit into a `double` representation). ### API Proposal ```diff namespace System.Text.Json.Serialization; public enum JsonUnknownTypeHandling { JsonElement, JsonNode, + DotNetPrimitives, } ``` ### API Usage ```csharp var options = new JsonSerializerOptions { UnknownTypeHandling = JsonUnknownTypeHandling.DotNetPrimitives }; var result = JsonSerializer.Deserialize("""[null, 1, 3.14, true]""", options); Console.WriteLine(result is List); // True foreach (object? value in (List)result) Console.WriteLine(value?.GetType()); // null, int, double, bool ``` ### Alternative Designs Do nothing, have users write their own custom converters. ### Risks There is no one way in which such a "natural" converter could be implemented and there also is no way in which the implementation could be extended by users. There is a good risk that users will not be able to use the feature because they require that the converter is able to roundtrip `DateOnly` or `Uri` instances, in which case they would still need to write a custom converter from scratch. cc @stephentoub @bartonjs @tannergooding who might have thoughts on how primitives get roundtripped.
Author: eiriktsarpalis
Assignees: -
Labels: `api-suggestion`, `area-System.Text.Json`, `untriaged`
Milestone: -
RenderMichael commented 8 months ago

This would be great. I’ve had to define methods like

static object ToNormalObject(this JsonNode? value)
{
    // basically the same implementation as OP
}

In my migrations, I’ve had types with properties of Dictionary<string, object?> and due to the limitations described in this issue, a lot of the operations I was doing (mostly casting) went from “just working” to InvalidCastExceptions.

Another workaround I did was making the property type JsonObject and change all the uses of this property to use the JSON type. I didn’t like doing this because it exposed the implementation detail of serialization. But it’s the hardiest solution to the problem, and actually exposed some bugs 😄.

eiriktsarpalis commented 8 months ago

I didn’t like doing this because it exposed the implementation detail of serialization. But it’s the hardiest solution to the problem, and actually exposed some bugs 😄.

I think that comment is spot on. This functionality is popular simply because it's what Json.NET was doing but it is fundamentally compromised when it comes to round-tripping capability.

RenderMichael commented 8 months ago

The biggest problem with fidelity I encountered was numbers being non-roundtrippable. In my opinion the best solution would be to subtype JsonValue further for numbers, and add some numeric methods to this JsonNumber type, ideally even some generic math.

As a JSON number, it would have to follow JavaScript rules, but I think it can be done.

That’s a separate issue from deserializing to an object though.

eiriktsarpalis commented 8 months ago

I agree that a JsonNumber type would be useful, however substituting it in the existing JsonNode hierarchy would be very much a breaking change.

RenderMichael commented 8 months ago

substituting it in the existing JsonNode hierarchy would be very much a breaking change.

How so? If JsonNumber is a subtype of JsonValue, then I don't see the problem unless there's some GetType() == typeof(JsonValue) shenanigans somewhere.

eiriktsarpalis commented 8 months ago

I was incorrectly assuming that JsonValue<T> is part of the public API surface, but it seems like it isn't.