dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.14k stars 4.71k forks source link

JsonSerializer polymorphic serialization and deserialization support #30083

Closed Symbai closed 2 years ago

Symbai commented 5 years ago

EDIT see https://github.com/dotnet/runtime/issues/30083#issuecomment-861524767 for a finalized proposal.

Original proposal by @Symbai (click to view) ``` public class FooBase public class FooA : FooBase public class FooB : FooBase List SerializationObject = new List { new FooA(), new FooB() } ``` Serialized with Newtonsoft.JSON ``` JsonSerializerSettings Settings = new JsonSerializerSettings { SerializationBinder = new KnownTypesBinder { KnownTypes = new List { typeof(FooA), typeof(FooB) } }, TypeNameHandling = TypeNameHandling.Objects }; List FooBases = new List() { new FooA(), new FooB() }; var json = JsonConvert.SerializeObject(FooBases, Settings); ``` will be: `[{"$type":"FooA","NameA":"FooA","Name":"FooBase"},{"$type":"FooB","NameB":"FooB","Name":"FooBase"}]` When using System.Text.Json.JsonSerializer to deserialize the json, FooA and FooB are both type of FooBase. Is there a way that JsonSerializer supports inheirited classes? How can I make sure the type will be the same and not the base class it inherits from?
Symbai commented 5 years ago

Figured it's somehow possible using JsonConverter, which is for some reason completely undocumented https://docs.microsoft.com/en-us/dotnet/api/system.text.json?view=netcore-3.0 but I have no idea how to deserialize without copying each property by hand like it's shown in this example https://github.com/dotnet/corefx/issues/36639. Copying properties by hand seems extremely odd as classes may have lots of them and maintainability isn't given in such a case anymore. Is there a way I can simply deserialize into a specific type in the converter?

Or am I missing something and there is a simple solution like in Newtonsoft.Json ?

public class FooAConverter : System.Text.Json.Serialization.JsonConverter<FooBase>
{
    public override bool CanConvert(Type type)
    {
        return typeof(FooBase).IsAssignableFrom(type);
    }
    public override FooBase Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        reader.Read();
        var typeProperty = reader.GetString();
        if (typeProperty != "$type")
            throw new NotSupportedException();
        reader.Read();
        var typeValue = reader.GetString();
        FooBase value;
        switch (typeValue)
        {
            case "FooA":
                value = new FooA(); //<-- return deserialized FooA?
                break;
            case "FooB":
                value = new FooB(); //<-- return deserialized FooB?
                break;
            default:
                throw new System.Text.Json.JsonException();
        }
        while (reader.Read())
        {
            //How to deserialize without copying every property at this point?
            if (reader.TokenType == JsonTokenType.EndObject)
            {
                return value;
            }
        }
        throw new NotSupportedException();
    }

    public override void Write(Utf8JsonWriter writer, FooBase value, JsonSerializerOptions options)    {    }
}
ahsonkhan commented 5 years ago

cc @steveharter

Timovzl commented 5 years ago

It's pretty awkward that JsonConverter doesn't give you a way to recurse back into the "regular" flow (i.e. "not handled by this particular converter") for deserializing subcomponents.

This seems to make the whole design only useful for very simple cases, with few properties and no nested custom types.

steveharter commented 5 years ago

Figured it's somehow possible using JsonConverter, which is for some reason completely undocumented https://docs.microsoft.com/en-us/dotnet/api/system.text.json?view=netcore-3.0

It's in the sub-namespace Serialization: https://docs.microsoft.com/en-us/dotnet/api/system.text.json.serialization?view=netcore-3.0

steveharter commented 5 years ago

I have no idea how to deserialize without copying each property by hand like it's shown in this example

and

t's pretty awkward that JsonConverter doesn't give you a way to recurse back into the "regular" flow (i.e. "not handled by this particular converter") for deserializing subcomponents

Yes that is known - the existing converters are bare-bones converters that let you do anything you want, but also require you do essentially do everything for a given type (but you can recurse to handle other types). They were primarily intended for data-type converters, not for objects and collections.

We are working on making this better in 5.0 timeframe by providing object- and collection- based specific converters that would make this much easier, as well as improve performance in certain cases.

Timovzl commented 5 years ago

but you can recurse to handle other types

@steveharter That seems like it could help. Can you give an example of how to do this?

ANahr commented 4 years ago

We are using custom converters like the following for inherited class support. I'm pretty sure deserialization is abysmal from a perf point of view, but it works and doesn't need much code and serialization works as it should. So as a Workaround you can use somthing like this:

public class ParameterTransferObjectConverter : JsonConverter<ParameterTransferObject>
{
    public override ParameterTransferObject Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        using (var doc = JsonDocument.ParseValue(ref reader))
        {
            ParamType typeDiscriminator = (ParamType)doc.RootElement.GetProperty(@"parameterType").GetInt32();
            var type = ParameterManager.GetTransferObjectType(typeDiscriminator);
            // Enhance: doc.RootElement.GetRawText() has likely bad perf characteristics
            return (ParameterTransferObject)JsonSerializer.Deserialize(doc.RootElement.GetRawText(), type, options);
        }
    }

    public override void Write(Utf8JsonWriter writer, ParameterTransferObject value, JsonSerializerOptions options)
    {
        var type = ParameterManager.GetTransferObjectType(value.ParameterType);
        JsonSerializer.Serialize(writer, value, type, options);
    }
}
lfr commented 4 years ago

Thanks @ANahr, indeed performance must be abysmal in comparison, but this allowed me to at least get deserialization to work till System.Text.Json matures, in my opinion it's a mistake for Microsoft to encourage people to use it, it should still be flagged as preview

layomia commented 4 years ago

Marking this as the issue to track polymorphic deserialization for 5.0. Serialization is tracked in https://github.com/dotnet/corefx/issues/38650.

layomia commented 4 years ago

From @Milkitic in dotnet/corefx#41305

For some reasons that Json for configuration use, I did a test on how the System.Text.Json handles the interface deserialization. I read the API and can't find anything about this function. First I tried IEnumerable<T>. Here is the code:

using System.Collections.Generic;
using System.Text.Json;

namespace dotnet30test
{
    class Program2
    {
        static void Main(string[] args)
        {
            var obj = new TestSerializationObj2
            {
                Names = new HashSet<string> { "1", "2" }
            };
            var sysContent = JsonSerializer.Serialize(obj, new JsonSerializerOptions { WriteIndented = true, });
            var sysNewObj = JsonSerializer.Deserialize<TestSerializationObj>(sysContent);
        }
    }

    public class TestSerializationObj2
    {
        public IEnumerable<string> Names { get; set; }
    }
}

The Names property is forced to convert from Hashset<string> to List<string>, but it works successfully indeedly. And then I try the custom interface, it throws JsonException. What I think that needed is something like Json.NET's auto type name handling. Here is the code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using NewtonJson = Newtonsoft.Json;
using SysJson = System.Text.Json;

namespace dotnet30test
{
    class Program
    {
        static void Main(string[] args)
        {
            var obj = new TestSerializationObj
            {
                CustomObj = new DefaultCustom()
            };

            var newtonContent = NewtonJson.JsonConvert.SerializeObject(obj,
                new NewtonJson.JsonSerializerSettings
                {
                    TypeNameHandling = NewtonJson.TypeNameHandling.Auto,
                    Formatting = NewtonJson.Formatting.Indented
                });
            var newtonNewObj = NewtonJson.JsonConvert.DeserializeObject<TestSerializationObj>(newtonContent,
                new NewtonJson.JsonSerializerSettings { TypeNameHandling = NewtonJson.TypeNameHandling.Auto }); // works

            var sysContent = SysJson.JsonSerializer.Serialize(obj, new SysJson.JsonSerializerOptions { WriteIndented = true, });
            var sysNewObj = SysJson.JsonSerializer.Deserialize<TestSerializationObj>(sysContent); // throws exception
        }
    }

    public sealed class TestSerializationObj
    {
        public Dictionary<string, int> Dictionary { get; set; } = new Dictionary<string, int>() { ["a"] = 0, ["b"] = 1 };
        public ICustom CustomObj { get; set; }
    }

    public class DefaultCustom : ICustom
    {
        public string Name { get; set; } = "Default Implementation";
        public int Id { get; set; } = 300;
    }

    public interface ICustom
    {
        int Id { get; set; }
    }
}

Thanks!

mcatanzariti commented 4 years ago

Please try this library I wrote as an extension to System.Text.Json to offer polymorphism: https://github.com/dahomey-technologies/Dahomey.Json

Symbai commented 4 years ago

Why has this been removed from 5.0 roadmap?! We can currently not deserialize any POCO class, as well as any class which has an interface. This is definitely not a minor issue as both is widely used. Its a blocking issue for anyone currently using Newtonsoft. And its a known issue since about a year, so there was plenty of time. I'm sorry to say but I'm disappointed if this is something we need to wait another whole year when it ships with .NET 6 instead of .NET 5. And the bad taste also comes from the fact that serialization of these classes is supported already. So we have JSON support which only works into one direction only, it does not feel finished and ready to be shipped with .NET 5.

lwardzala commented 4 years ago

I've just discovered this thread and I decided to publish a solution I've been using in my projects. It's an implementation of abstraction converter factory very easy to introduce. It might be helpful till the issue will be closed :) https://github.com/lwardzala/Json.Abstraction

cocowalla commented 4 years ago

I appreciate that the community are trying to backfill this missing functionality as a stopgap measure, but I have to join the chorus of voices here: it's crazy to release a JSON serialiser without polymorphic serialisation/deserialisation support, for an object orientated language, and then push people towards using it instead of Newtonsoft. I get it brings perf/allocation improvements, and that really is very much appreciated, but it also needs to actually be capable of real-world serialisation! :zany_face:

bklooste commented 4 years ago

Just because the language is polymorphic doesn't mean messages should be, json and XML are NOT polymorphic . At best you use a tagged union which some languages may interpret as data for a potentially polymorphic object and should be more in the app domain than the plumbing eg model the simple message and build the object from it. If your real OO where is your SOLID/ separation of concerns ? . Many consumers of messages are not OO and its pretty bad practice IMHO to put this on the wire and i have seen it cause tons of time wasting and issues. Its even better when you need to pass it to some other part of an organization or a partner who likely wont be using the same lib /language.

So my 2c / opinion for what its worth is I consider it bit like immutable strings, stop developers doing potentially bad things you really don't need in most cases. A community / extra lib is about the right level in terms of you can pull it in but you are forced to ask do i really need this or is it for legacy sake / i don't have time / money to change it now.

KISS

vpaulino commented 4 years ago

So If I have an API based on contracts that share common properties I cannot make use of the language features is that it? I will need to replicate code? potencially increasing Technical Debt?

C# is ( until now ) an OOP language, so why are we building libraries that remove from us the usage of the basic language features?

If the language nativelly has feature why are we creating that limitation?

Simple example:


public class ApiResult<T>  {

    public T Payload {get; set;}

}
public class PaginationApiResult<T> : ApiResult<IEnumerable<T>>
{
   public int Take {get; set; }
   public int Skip {get; set; }
  public long TotalRecords {get; set;}
}

Some other examples like Having a contract on my Frontend Api where I agregate data from multiple backends ( each with their own responsabilty) I want to have the capacity to minimize the number of requests to my Frontend API so reusing models helps me where I return a General Description of my Api Model and another contract where I return some detailed description of that model:


public class PersonGeneralInfo {

public string CitizenId {get; set;}
public string FirstName {get; set;}
public string LastName {get; set;}
public DateTime BirthDate {get; set;}

}

public class EmployedPersonInformation : PersonGeneralInfo {

public DateTime StartedAt {get; set;}
public string CompanyName {get; set;}

}

public class PersonHealthInfo : PersonGeneralInfo {

public double Height {get; set; }
public double Weigth {get; set; }
public double BloodPresure {get; set; }

}

public class PersonLocationsInfo : PersonGeneralInfo {

public Address Home {get; set; }
public Address Work {get; set; }
public IEnumerable<Address> FavoritesRestaurants {get; set; }

}  

Many of these consumers are generated from swagger to typescript or C# applications or even Java Applications.

Andrzej-W commented 4 years ago

@bklooste you are only partially right. There are a lot of projects where people intentionally use C# on the server and on the client (for example Blazor) and these projects don't have to interact with components written in other languages. If it is necessary programmers can design non polymorphic interfaces but if it is not necessary we shouldn't have artificial limitations.

Every real business application uses decimal data type for monetary calculations. This data type is not supported in JavaScript and plenty of other languages. Does that mean we shouldn't use it in our programs? No, we need this data type and we don't care that some decimal numbers cannot be deserialized properly in JavaScript.

We use C# because it is an object oriented language and we also want to use polymorphism in JSON serialization/deserialization.

Symbai commented 4 years ago

Honestly, I'm getting tired of this discussion about the language and polymorphic. I totally got the point of "OOP is C# only" and "In the internet JSON isn't meant to support polymorphic". But here is the fact: If this JsonSerializer is meant to be a SERVER/INTERNET thing where OOP is out of scope.. then it should have been moved into System.Web, same as the existing JavascriptSerializer. Then nobody here would complain about missing polymorphic support. Because its god damn clear that this is a SERVER/INTERNET thing only.

But... it isn't in System.Web, so people complain about missing polymorphic support and they complain for a good reason. Even the old XmlSerializer supports polymorphic without too much additional code.

So please stop and close this tab if you aren't interested in polymorphic support and simply accept the fact that many developers are and the way Microsoft has announced and design this JsonSerializer (Its not in System.Web and its many times compared against Newtonsoft.JSON which DOES support polymorphic) gives us the right to complain and ask for this missing feature.

Thanks for your attention and have a nice day, no matter if you agree with me or not.

vpaulino commented 4 years ago

Im not even talking about http ... durable functions for example use newtonsoft to store state between activities i think... probably due to some of the current limitations such as this one. I 100% agree that serialization should be considered to be used not only on the comunication world, but also to store state for example, ( REST is transfer state and databases is to store state) and for that use case i would need to have a persistence model without OOP polymorphic feature? I think without Polymorphic support on Text.Json we will be reducing the usecases instead of being a more generic json serialization available to any use case.

taspeotis commented 3 years ago

I am publishing and consuming Kafka messages with their bodies in JSON format and I feel like the security concerns that blocked System.Text.Json from implementing polymorphic deserialization don't apply here.

I'm always happy to be educated about security but the way I see it is that the interactions between the Kafka cluster and my applications are closed to outside actors and the JSON is always going to be trusted.

So my preference would be for System.Text.Json to support this, and have it off by default to prevent people from shooting themselves in the foot if they do have untrusted input. But for trusted input scenarios I'm allowed to turn it on and use it.

weitzhandler commented 3 years ago

Here's my attempt:

public class TypeMappingConverter<TType, TImplementation> : JsonConverter<TType>
  where TImplementation : TType
{
  [return: MaybeNull]
  public override TType Read(
    ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
      JsonSerializer.Deserialize<TImplementation>(ref reader, options);

  public override void Write(
    Utf8JsonWriter writer, TType value, JsonSerializerOptions options) =>
      JsonSerializer.Serialize(writer, (TImplementation)value!, options);
}

Usage:

var options =
   new JsonSerializerOptions 
   {
     Converters = 
     {
       new TypeMappingConverter<BaseType, ImplementationType>() 
     }
   };

JsonSerializer.Deserialize<Wrapper>(value, options);

Tests:

[Fact]
public void Should_serialize_references()
{
  // arrange
  var inputEntity = new Entity
  {
    References =
            {
                new Reference
                {
                    MyProperty = "abcd"
                },
                new Reference
                {
                    MyProperty = "abcd"
                }
            }
  };

  var options = new JsonSerializerOptions
  {
    WriteIndented = true,
    Converters =
    {
        new TypeMappingConverter<IReference, Reference>()
    }
  };

      var expectedOutput =
@"{
  ""References"": [
    {
      ""MyProperty"": ""abcd""
    },
    {
      ""MyProperty"": ""abcd""
    }
  ]
}";

  // act
  var actualOutput = JsonSerializer.Serialize(inputEntity, options);

  // assert
  Assert.Equal(expectedOutput, actualOutput);
}

[Fact]
public void Should_deserialize_references()
{
  // arrange

  var inputJson =
@"{
  ""References"": [
    {
      ""MyProperty"": ""abcd""
    },
    {
      ""MyProperty"": ""abcd""
    }
  ]
}";

  var expectedOutput = new Entity
  {
    References =
    {
      new Reference
      {
        MyProperty = "abcd"
      },
      new Reference
      {
        MyProperty = "abcd"
      }
    }
  };

  var options = new JsonSerializerOptions
  {
    WriteIndented = true
  };

  options.Converters.AddTypeMapping<IReference, Reference>();

  // act
  var actualOutput = JsonSerializer.Deserialize<Entity>(inputJson, options);

  // assert
  actualOutput
      .Should()
      .BeEquivalentTo(expectedOutput);
}

public class Entity
{
  ICollection<IReference>? _References;
  public ICollection<IReference> References
  {
    get => _References ??= new HashSet<IReference>();
    set => _References = value;
  }
}

public interface IReference
{
  public string? MyProperty { get; set; }
}

public class Reference : IReference
{
  public string? MyProperty { get; set; }
}
vpaulino commented 3 years ago

I have implemented something about that available here on one of my github repos. Im writing something about that on my blog ( I do not write that much but the important stuff I think is in that repo )

SCLDGit commented 3 years ago

Something that has yet to have been brought up is people with no web expectations at all who are simply using the JSON format as the excellent human-readable, compact interchange format that it is for use cases totally outside of web traffic. Limiting our ability to handle polymorphic data, like that from classes inherited from interfaces, is silly and shortsighted.

robkeimig commented 3 years ago

This is the only thing stopping us from moving to System.Text.Json. We really want to, but we have to have polymorphic support both ways. It might not be "as intended" or "best practice", but we use JSON for serializing most of our business state to/from durable storage. When you are working with incredibly dense object graphs that can change on a daily basis (over 1000 properties across 100+ types), you don't have the luxury of mapping one-property-per-column to your persistence layers and rolling migrators each time. At least not at our scale. JSON serialization of our business models has proven to be, by orders of magnitude, the most sustainable way to manage a huge aspect of our codebase. As a result, we prefer to have full control over serialization at all levels. We have inheritance hierarchies 4-5 types deep that are serializing like a dream w/ Newtonsoft's TypeNameHandling.Auto. No special attributes, helper properties, interfaces, or converter nonsense required.

Due to the level of integration we have with Newtonsoft and the terabytes of existing JSON state we have on customer servers, we will probably have to roll something in-house that is 1:1 compatible. We will probably never be able to use this library. Despite this, I like to bring my case to bear because I am sure there are others who use JSON as a storage format for complex data sets, and of these there are probably some who would love to be able to use polymorphism for better modeling in code.

Certainly for web-based usage, a polymorphic deserialization proposal is suspect at best. I would say even for an internal application, allowing the client to drive type selection is not a good idea. We do not want to use any form of polymorphic serialization on our AspNetCore hosts. Only for private or other trusted back-end code that reads/writes business entities to/from various byte streams.

IngemarNilsson commented 3 years ago

I've just discovered this thread and I decided to publish a solution I've been using in my projects. It's an implementation of abstraction converter factory very easy to introduce. It might be helpful till the issue will be closed :) https://github.com/lwardzala/Json.Abstraction

Thanks, this one did it for me!

GrabYourPitchforks commented 3 years ago

[security hat on]

Friendly reminder - unrestricted polymorphic deserialization is a remote code execution vector. Using Newtonsoft.Json with TypeNameHandling set to any value other than TypeNameHandling.None is insecure unless you have also configured a binder. We hold ourselves to that same bar: Microsoft first-party code which uses Newtonsoft.Json as a serializer is forbidden from touching the TypeNameHandling property. The exception process is intentionally onerous, even with a binder in place.

The safe way to support this is to have a fixed mapping of allowed types. In order to be secure, this mapping must be provided before deserialization takes place. For example:

{
    "$type": "Dog",
    "Name": "Fido",
    "Age": 8,
    "Breed": "Terrier"
}

You can imagine the deserializer having access to a Dictionary<string, Type>, where the value of the $type property is used as a lookup key into this dictionary, and the Type value is used as the actual type to be instantiated and to have its members populated.

// assume the allowed mapping is specified via the 'options' parameter below

Options options = new Options();
options.AllowedMappings["Dog"] = typeof(Contoso.Models.Dog);

Animal animal = Deserialize<Animal>(payload, options);
Console.WriteLine(animal.GetType()); // prints 'Contoso.Models.Dog'

Importantly:

These guidelines will drive any implementation of a polymorphism-enabled deserializer provided by the library. This is required by the security review process Microsoft follows during development.

[security hat off - please have a pleasant day]

lwardzala commented 3 years ago

[security hat on]

Friendly reminder - unrestricted polymorphic deserialization is a remote code execution vector. Using Newtonsoft.Json with TypeNameHandling set to any value other than TypeNameHandling.None is insecure unless you have also configured a binder. We hold ourselves to that same bar: Microsoft first-party code which uses Newtonsoft.Json as a serializer is forbidden from touching the TypeNameHandling property. The exception process is intentionally onerous, even with a binder in place.

The safe way to support this is to have a fixed mapping of allowed types. In order to be secure, this mapping must be provided before deserialization takes place. For example:

{
    "$type": "Dog",
    "Name": "Fido",
    "Age": 8,
    "Breed": "Terrier"
}

You can imagine the deserializer having access to a Dictionary<string, Type>, where the value of the $type property is used as a lookup key into this dictionary, and the Type value is used as the actual type to be instantiated and to have its members populated.

// assume the allowed mapping is specified via the 'options' parameter below

Options options = new Options();
options.AllowedMappings["Dog"] = typeof(Contoso.Models.Dog);

Animal animal = Deserialize<Animal>(payload, options);
Console.WriteLine(animal.GetType()); // prints 'Contoso.Models.Dog'

Importantly:

  • Type.GetType(string) and similar APIs must never under any circumstance be called with untrusted input. This implies that feeding the value of the $type field into Type.GetType is disallowed.
  • The caller must pre-populate the mapping dictionary before the call to deserialize, or there must be some other deterministic and documented way for the deserializer to discover the set of allowed types based solely on statically available type information from the caller. (This can also be provided by a source analyzer or other AOT mechanism.)
  • The mapping dictionary should avoid being clever in attempting to support things like open generics or collections (e.g., List<>, T[]). This may allow unintended recursive nesting, which can re-introduce security holes into the application.

These guidelines will drive any implementation of a polymorphism-enabled deserializer provided by the library. This is required by the security review process Microsoft follows during development.

[security hat off - please have a pleasant day]

Thanks for the guideline! I'll try to review and adapt that in my solution to make it easier to implement and safer as well.

BreyerW commented 3 years ago

@GrabYourPitchforks would be ok to provide polymorphism support via something like TypeResolver that mimics ReferenceResolver but for $type metadata? This way different polymorphism mechanism could be provided depending on security requirements (eg not every app holds sensitive data or some live in isolated enviroments meaning data is always trusted)

jeremyVignelles commented 3 years ago

I think that @Timovzl is really spot on here : https://github.com/dotnet/runtime/issues/30083#issuecomment-534078476

Unless I'm missing something, there is a big prerequisite that is currently missing: Utf8JsonReader is a forward-only reader, and the $type or whatever discrimination property must be the first to be read, otherwise you'll be missing some properties that you can't catch later, unless you buffered them. There is no guarantee that it will be the case for every use case, especially when consuming APIs, like OpenAPI web services.

It's commonn in other languages like typescript to have type unions (like A | B) instead of proper inheritance, and the OpenAPI specification itself offers various mechanisms to deal with composition and inheritance. There's no strong rule however that the discriminator is always first, and we should assume that it might not be the case, and fixing the input json is likely not an option.

What I mean is : Yes, $type deserialization should be possible for compatibility with Newtonsoft APIs or other things, but it is a use case inside a bigger feature, and I'm not even sure it should be supported out of the box. If the tools are there, one could, with the help of samples, write their own converters for the case/create a library that could be reused by everyone.

I've been diving in the A | B issue which is cructial for proper OpenAPI spec support. I created a TypeUnion<A, B> type with a JsonConverter that would try to deserialize as either A or B. Obviously, I faced the same issue as some here, and had to deserialize/reserialize as others have pointed here.

You can follow my work in progress in the Ooak repository, and especially this piece of code in the converter.

So now that the issue is described what options would we have to solve it?

  1. Make the Utf8Reader "rewindable" or save/restore its internal state somehow, like a snapshot.
  2. Create a new Utf8Reader each time we want to "try" a branch.
  3. Any other idea?

Make the Utf8Reader "rewindable" or save/restore its internal state somehow, like a snapshot.

This doesn't sound like a good idea to mess with internal state, but that might be a straightforward approach. I'm sure there are good reasons not to do it that way...

Create a new Utf8Reader each time we want to "try" a branch.

There is already a JsonReaderState available, that represents the internal state of the reader, but that requires the input ReadOnlySpan<byte> which we don't have access to, if I'm not missing anything. Maybe if we had access to that input, we could create another reader with the current state and data from the current position each time we need to "branch" the code.

I don't see any hard blockers since the source is ReadOnly, and it doesn't affect the current reader, but it's harder to implement a serializer since we must advance the original reader at the end.

EDIT: Seems like there are already internal properties named OriginalSequence and OriginalSpan

Closing

I'm pretty sure I don't have the full picture, and specifically, I'm not sure about how this plays with "chunked" json (if the json data comes in multiple ReadOnlySpan<byte>, which is possible with pipes...

GrabYourPitchforks commented 3 years ago

Make the Utf8Reader "rewindable" or save/restore its internal state somehow, like a snapshot.

This is already possible today. It's a struct with all relevant state data present as instance fields, so you can copy the struct by value. Example:

Utf8JsonReader reader = GetReader();

/* do some work (A) with 'reader' */

Utf8JsonReader snapshot = reader; // assign a copy to a local

/* do more work (B) with 'reader' */

reader = snapshot; // rewinds the reader back to what it was when the snapshot was taken, before (B) was performed
jeremyVignelles commented 3 years ago

Thanks for your answer @GrabYourPitchforks . I tried the solution of breaking the internal to get the OriginalSpan to see if there's something we could do in the converter. Unfortunately, it seems that the JsonConverter receives a reader that has already consumed the first token to have its Value fields initialized. This means I can't use JsonSerializer.Deserialize on my new reader initialized with the state as it's already past the start of the object. Am I missing something?

jeremyVignelles commented 3 years ago

So based on @GrabYourPitchforks , I modified my converter and it seems pretty neat to just copy the reader. See this code for example: https://github.com/jeremyVignelles/ooak/blob/3cb644e002f8f58fd91b1144d99c9b58194594f2/Ooak/SystemTextJson/OoakSystemTextJsonConverter.cs#L86

Based on that attempt, I quickly hacked a custom converter to attempt a polymorphic deserialization, and it turns out it works pretty easily. I hosted the code here

Here is the converter:

using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

namespace SystemTextJsonPolymorphicDeserialization
{
    public class PolymorphicConverter : JsonConverter<BaseModel>
    {
        public readonly Dictionary<string, Type> TypeMapping = new Dictionary<string, Type> {
            { "Bar", typeof(BarModel) },
            { "Foo", typeof(FooModel) }
        };

        public override BaseModel? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        {
            if(reader.TokenType != JsonTokenType.StartObject)
            {
                throw new JsonException("Expected start of object");
            }
            var typeReader = reader;
            Type? deserializedType = null;
            while(typeReader.Read())
            {
                if(typeReader.TokenType == JsonTokenType.PropertyName && typeReader.GetString() == "Type")
                {
                    if(!typeReader.Read())
                    {
                        throw new JsonException($"Unable to read.");
                    }
                    var typeName = typeReader.GetString();
                    if (typeName is null)
                    {
                        throw new JsonException($"Unable to read the type name.");
                    }
                    if(!this.TypeMapping.TryGetValue(typeName, out deserializedType))
                    {
                        throw new JsonException($"Invalid type in json : {typeName}");
                    }
                    break;
                }
                else if(typeReader.TokenType == JsonTokenType.StartArray || typeReader.TokenType == JsonTokenType.StartObject)
                {
                    typeReader.Skip();
                }
            }

            if(deserializedType == null)
            {
                throw new JsonException($"Key 'Type' not found.");
            }

            return (BaseModel?)JsonSerializer.Deserialize(ref reader, deserializedType, options);
        }

        public override void Write(Utf8JsonWriter writer, BaseModel value, JsonSerializerOptions options)
        {
            throw new NotImplementedException();
        }
    }
}

I really like the way I could branch the reader to detect a given propertyName Type in my example, but might as well be $type or whatever. It's way better than parsing twice in a JsonDocument or other hacks I've seen.

But I'm under the impression that I'm missing something really simple. If it's that simple, how come I'm the first one to discover this? Isn't there any caveat I'm ignoring here, like chunked json (ReadOnlySegments...) or something ?

If it really works, it could be pretty easy to code an API around that to register known subclasses of a given type, but defining the API shape might be tricky

ahsonkhan commented 3 years ago

It's way better than parsing twice in a JsonDocument or other hacks I've seen.

:)

But I'm under the impression that I'm missing something really simple. If it's that simple, how come I'm the first one to discover this?

Others have posed similar questions. It is understandable that some folks miss that Utf8JsonReader is a struct with value semantics and not a class, but as you observed, it makes resetting or taking a snapshot quite easy. The serializer and JsonDocument internally use this approach sometimes too: https://stackoverflow.com/a/59744873/12509023 image https://github.com/dotnet/runtime/blob/6dd7b2df5098a61514b13114cb5bd682dff5c71c/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs#L353-L354

Isn't there any caveat I'm ignoring here, like chunked json (ReadOnlySegments...) or something ?

No, I don't think so. You don't seem to be doing anything with the reader that would break if the input was chunked as a ReadOnlySequence. In most cases, as long as you are using standard public methods on the Utf8JsonReader, whether the input was a contiguous ROSpan or split in chunks as a ROSequence shouldn't be a concern you need to worry about. The reader deals with that for you since almost all the APIs on it are agnostic to that detail. You could try adding some test cases to verify for yourself. Outside of the context of the serializer (i.e. when directly using the reader), you would have to be careful about one other fact, which is processing partial or incomplete JSON data, since the reader is re-entrant and supports incomplete input. However, in the context of JsonSerializer and within the JsonConverter you write, that is not a concern because the serializer reads ahead to ensure the JSON you are deserializing contains at least enough data to complete the current object successfully, before calling your JsonConverter. So, the reader.Read() method within a JsonConverter will not return false. https://github.com/dotnet/runtime/blob/6dd7b2df5098a61514b13114cb5bd682dff5c71c/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonConverter.ReadAhead.cs#L43-L48

https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-converters-how-to?pivots=dotnet-5-0#steps-to-follow-the-basic-pattern from https://github.com/dotnet/docs/issues/19654

Override the Read method to deserialize the incoming JSON and convert it to type T. Use the Utf8JsonReader that is passed to the method to read the JSON. You don't have to worry about handling partial data, as the serializer passes all the data for the current JSON scope. So it isn't necessary to call Skip or TrySkip or to validate that Read returns true.

jeremyVignelles commented 3 years ago

However, in the context of JsonSerializer and within the JsonConverter you write, that is not a concern because the serializer reads ahead to ensure the JSON you are deserializing contains at least enough data to complete the current object successfully, before calling your JsonConverter. So, the reader.Read() method within a JsonConverter will not return false.

This is what I needed, thanks for the explanation. That makes sense, because I never saw any converter example dealing with incomplete data.

I didn't know there was an example for polymorphic deserialization, but as pointed in your stackoverflow thread, the discriminator is outside the data being read, which is not the case for the objects serialized with Newtonsoft for example, so the converter example I posted above could be really useful.

bklooste commented 3 years ago

@bklooste you are only partially right. There are a lot of projects where people intentionally use C# on the server and on the client (for example Blazor) and these projects don't have to interact with components written in other languages. If it is necessary programmers can design non polymorphic interfaces but if it is not necessary we shouldn't have artificial limitations.

I have lost count of how often i have heard this argument .. im ancient now and every time management or some new dev team changes the back end to Java / go or front end to js this assumption has ended up costing a mint . Its hard language based coupling. ( Why are you using an interface again... OO is dying it doesnt need to die but dont distribute it)

Not to mention the expensive C# cost when the deserializer breaks because some rarely used thing breaks and these are EXPENSIVE, painful to find and fix bugs often needing contracts changed across many services.. What do you gain you save a handful of lines in a SIMPLE converter at a cost of hidden complexity and expensive bugs ? Note it is NOT datetime / decimal conversion that is the issue all serializes do that which is fine ( as it does not change on wire representation) its only hacking polymorphism into json when its not designed for it ( and XML was the same) and it is a hack. Typeof has also been used, its better but again its not worth it leading to bugs because it gets omitted and create cross project dependency issues. The problem were trying to solve convert and convert back your self is NOT a big problem so why do we need magic in libs.

You still occasionally need polymorphism sometimes on the wire but do it like every other modern language eg a dev managed tagged union.

I admit there are a few corner cases internally within a small app but on the wire/ in messages for larger things its bad practice that said there is a strong argument a general lib should hence support it. However this is my big concern as i have seen so often 9 in 10 will use it when they should not ie they think its convenient but they create a support trap / something that can be hard to change - especially enterprise messages . Its a bit like mutable char[] strings its too easy to make multi threading mistakes so you make them immutable even though there are cases where its more performant.

Anyway each to his own but i thought i would share my experience.

eiriktsarpalis commented 3 years ago

Propoposed Design (polymorphism using type discriminators)

This feature allows users to opt in to polymorphic serialization for a given type by associating string identifiers with particular subtypes in the hierarchy. These identifiers are written to the wire so this brand of polymorphism is roundtrippable.

At the core of the design is the introduction of JsonKnownType attribute that can be applied to type hierarchies like so:

[JsonKnownType(typeof(Derived1), "derived1")]
[JsonKnownType(typeof(Derived2), "derived2")]
public class Base
{
    public int X { get; set; }
}

public class Derived1 : Base
{
    public int Y { get; set; }
}

public class Derived2 : Base
{
    public int Z { get; set; }
}

This allows roundtrippable polymorphic serialization using the following schema:

var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1", "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2", "X" : 0, "Z" : 0 }

JsonSerializer.Deserialize<Base>(json1); // uses Derived1 as runtime type
JsonSerializer.Deserialize<Base>(json2); // uses Derived2 as runtime type

Alternatively, users can specify known type configuration using the JsonSerializerOptions.TypeDiscriminatorConfigurations property:

public class JsonSerializerOptions
{
    public IList<TypeDiscriminatorConfiguration> TypeDiscriminatorConfigurations { get; }
}

which can be used as follows:

var options = new JsonSerializerOptions
{
    TypeDiscriminatorConfigurations =
    {
        new TypeDiscriminatorConfiguration<Base>()
          .WithKnownType<Derived1>("derived1")
          .WithKnownType<Derived2>("derived2")
    }
};

or alternatively

var options = new JsonSerializerOptions
{
    TypeDiscriminatorConfigurations =
    {
        new TypeDiscriminatorConfiguration(typeof(Base))
          .WithKnownType(typeof(Derived1), "derived1")
          .WithKnownType(typeof(Derived2), "derived2")
    }
};

Proposed API

namespace System.Text.Json
{
+    [AttributeUsageAttribute(AttributeTargets.Class | AttributeTargets.Interface, AllowMultiple = true, Inherited = false)]
+    public partial class JsonKnownTypeAttribute : JsonAttribute
+    {
+        public JsonKnownTypeAttribute(Type subtype, string identifier) { }
+        public string Identifier { get { throw null; } }
+        public System.Type Subtype { get { throw null; } }
+    }

+    public class TypeDiscriminatorConfiguration : IReadOnlyCollection<KeyValuePair<System, string>>
+    {
+        public TypeDiscriminatorConfiguration(Type baseType);
+        public System BaseType { get; }
+        public TypeDiscriminatorConfiguration WithKnownType(Type derivedType, string identifier) { }
+    }

+    public class TypeDiscriminatorConfiguration<TBaseType> : TypeDiscriminatorConfiguration 
+        where TBaseType : class
+    {
+        public TypeDiscriminatorConfiguration();
+        public TypeDiscriminatorConfiguration<TBaseType> WithKnownType<TDerivedType>(string identifier) 
+            where TDerivedType : TBaseType { }
+    }

    public class JsonSerializerOptions
    {
+        public IList<TypeDiscriminatorConfiguration> TypeDiscriminatorConfigurations { get; }
    }
}

Open Questions

Customizing the type discriminator property name

The property name is currently hardcoded to $type but it stands to reason that users might need to customize this, potentially using a special attribute:

[JsonTypeDiscriminatorPropertyName("_case")]
[JsonKnownType(typeof(MyDerivedType), "derived")]
public class MyPoco { }
public class MyDerivedtype : MyPoco { }

JsonSerializer.Serialize(new MyDerivedType()); // { "_case" : "derived" }

Using type discriminators that are not strings

We might want to allow specifying non-string type identifiers:

[JsonKnownType(typeof(MyDerivedtype), 0)]
[JsonKnownType(typeof(MyOtherDerivedtype), 1)]
public class MyPoco { }
public class MyDerivedtype : MyPoco { }
public class MyOtherDerivedtype : MyPoco { }

JsonSerializer.Serialize(new MyDerivedType()); // { "_case" : 0 }

The only valid identifiers would be strings and integers and it should not be possible for a given type hierarchy to mix and match identifier types.

Semantics

The type discriminator semantics could be implemented following two possible alternatives, which for the purposes of this document I will be calling "strict mode" and "lax mode". Each approach comes with its own sets of trade-offs.

Strict mode

"Strict mode" requires that any runtime type used during serialization must explicitly specify a type discriminator. For example:

[JsonKnownType(typeof(Derived1),"derived1")]
[JsonKnownType(typeof(Derived2),"derived2")]
public class Base { }

public class Derived1 : Base { }
public class Derived2 : Base { }
public class Derived3 : Base { }

public class OtherDerived1 : Derived1 { }

JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1" }
JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2" }
JsonSerializer.Serialize<Base>(new Derived3()); // throws NotSupportedException
JsonSerializer.Serialize<Base>(new OtherDerived1()); // throws NotSupportedException
JsonSerializer.Serialize<Base>(new Base()); // throws NotSupportedException

Any runtime type that is not associated with a type discriminator will be rejected, including instances of the base type itself. This approach has a few drawbacks:

Lax mode

"Lax mode" as the name suggests is more permissive, and runtime types without discriminators are serialized using the nearest type ancestor that does specify a discriminator. Using the previous example:

[JsonKnownType(typeof(Derived1),"derived1")]
[JsonKnownType(typeof(Derived2),"derived2")]
public class Base { }

public class Derived1 : Base { }
public class Derived2 : Base { }
public class Derived3 : Base { }

public class OtherDerived1 : Derived1 { }

JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1" }
JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2" }
JsonSerializer.Serialize<Base>(new Derived3()); // { } serialized as `Base`
JsonSerializer.Serialize<Base>(new OtherDerived1()); // { "$type" : "derived1" } inherits schema from `Derived1`
JsonSerializer.Serialize<Base>(new Base()); // { } serialized as `Base`

This approach is more flexible and supports interface and abstract type hierarchies:

[JsonKnownType(typeof(Foo), "foo")]
[JsonKnownType(typeof(IBar), "bar")]
public interface IFoo { }
public abstract class Foo : IFoo { }
public interface IBar : IFoo { }

public class FooImpl : Foo {}

JsonSerializer.Serialize<IFoo>(new FooImpl()); // { "$type" : "foo" }

However it does come with its own set of problems:

[JsonKnownType(typeof(Foo), "foo")]
[JsonKnownType(typeof(IBar), "bar")]
public interface IFoo { }
public class Foo : IFoo { }
public interface IBar : IFoo { }

public Baz : Foo, IBar { }

JsonSerializer.Serialize<IFoo>(new Baz()); // diamond ambiguity, could either be "foo" or "bar",
                                           // throws NotSupportedException.

Polymorphism Draft Design Document Prototype Implementation

eiriktsarpalis commented 3 years ago

We have decided to move this feature to .NET 7, since implementing it requires substantial infrastructural refactorings in the core serialization logic. Since we are fairly late in the .NET 6 development cycle, introducing the changes at this point carries risk of regressions.

BreyerW commented 3 years ago

@eiriktsarpalis

I would like to ask if mirroring ReferenceResolver API for polymorphic support with appropiate tweaks was considered? This would allow providing both newtonsoft compatible and secure with type discriminators implementations of polymorphic support and also other flavours. Secure implementation could be provided by default in bcl and newtonsoft provided in guide to migrate but not in bcl to discourage wide use of it

eiriktsarpalis commented 3 years ago

@BreyerW you mean abstracting the polymorphism resolution logic behind an abstract class? What specific scenario do you have in mind that would be unlocked by such a design versus what is being proposed? Would you be able to sketch an API design for the resolver type?

Note that we are intentionally not making this design compatible with (or as capable as) Json.NET polymorphism, out of security considerations related to reading type names from the wire.

lwardzala commented 3 years ago

@eiriktsarpalis So what in case of layered architecture when we have a strict hierarchy like in Sitecore framework or just in some applications? Couldn't the relations be registered as it is in MongoDB driver or as I implemented here (https://github.com/lwardzala/Json.Abstraction)? I'm aware that the performance would be much better with your proposed design but still I would like to avoid two-direction dependency flow.

eiriktsarpalis commented 3 years ago

I'm aware that the performance would be much better with your proposed design but still I would like to avoid two-direction dependency flow.

I wouldn't say it's more of a security consideration rather than a performance one: the feature will not automatically support open hierarchies (since that requires serializing and deserializing the type name from the wire). Individual derived types will have to be opted in manually, akin to the KnownTypes attribute in DataContractSerializer.

If circular dependency between base and derived type is not desirable (or indeed possible, in cases where the hierarchy spans multiple assemblies) you can register an equivalent TypeDiscriminatorConfiguration instance in your JsonSerializerOptions, as described in the proposal.

lwardzala commented 3 years ago

Ah yes, JsonSerializerOptions.TypeDiscriminatorConfigurations looks great. I've just misread that part :)

BreyerW commented 3 years ago

@eiriktsarpalis

i was thinking something like this:

abstract class TypeResolver{

//called when $type is read
public abstract Type ResolveType(string);

//called during write, bool indicates if it should be written at all to support type discriminators. 
//Alternatively null on return could be used to indicate that type metadata should NOT be emitted. Then bool isnt needed. 
public abstract string GetTypeMetadata(Type, out bool);

}

Since TypeResolver isnt supposed to be factory for real objects (i think) smaller abstract class than ReferenceResolver like the one presented should suffice.

As for scenarios unblocked it is more about future proofing. I have only skimmed current draft but current design seems to be a bit rigid here and there, mostly around type discriminators (i may be wrong on that though) which causes me a bit of anxiety.

Eg. what if some obscure security hole is discovered in type discriminator? Not necessarily in general/pattern idea - although that is also possible even if unlikely - but in your specific algorithm? Then we have to wait until you fix hole. With abstract approach it would be fixable on our end if necessary. Also possibly someone may wanted different trade offs in perf (eg cpu vs memory)? Also possible bugs in threaded scenarios...

They are all very abstract and probably not very likely but still it is something im slighty worried about

FiniteReality commented 3 years ago
abstract class TypeResolver{

//called when $type is read
public abstract Type ResolveType(string);

//called during write, bool indicates if it should be written at all to support type discriminators. 
//Alternatively null on return could be used to indicate that type metadata should NOT be emitted. Then bool isnt needed. 
public abstract string GetTypeMetadata(Type, out bool);

}

I think this API is only good surface deep. If anything, I think it'll encourage users to just write:

class MyVeryInsecureTypeResolver : TypeResolver
{
    public override Type ResolveType(string type) => Type.GetType(type);
    public override string GetTypeMetadata(Type type) => type.AssemblyQualifiedName;
}

Which, as we know from BinaryFormatter, is very insecure :sweat_smile:

If this is the approach that gets chosen, it needs to come with an analyzer that screams as hard as it can if people do something like that, or it needs to come with a lot of default implementations which people are encouraged to use.

FiniteReality commented 3 years ago

The only valid identifiers would be strings and integers and it should not be possible for a given type hierarchy to mix and match identifier types.

I still think supporting types inheriting from Enum would be advantageous here. As a workaround, I presume the advice would be to cast to int? Something along the lines of this:

public enum MyJsonTypes
{
    TypeOne,
    TypeTwo
    // etc.
}

[JsonKnownType(typeof(TypeOne), (int)MyJsonTypes.TypeOne)]
[JsonKnownType(typeof(TypeTwo), (int)MyJsonTypes.TypeTwo)]
public class BaseClass { }

public class TypeOne : BaseClass { }
public class TypeTwo : BaseClass { }
BreyerW commented 3 years ago

@FiniteReality Of course. For this reason i suggest creating one concrete implementation, default with type discriminator in BCL and no other (and would be automatically supplied if developer didnt do it explicitly). This should nugde a bit more ppl to type discriminator. As for insecure example i feel like team sometimes is trying too hard to guardrail developers from themselves or - in other words - hold our hands.

It is our responsibility to make working software not theirs (their is to ensure working runtime and provide basic building blocks). At least thats what i think - no offense intended.

Also since TypeResolver is abstract we arent limited to insecure implementations unlike BinaryFormatter - just a small reminder.

eiriktsarpalis commented 3 years ago

I still think supporting types inheriting from Enum would be advantageous here. As a workaround, I presume the advice would be to cast to int?

We could support accepting enums directly, but that would require changing the attribute constructor signature to accept object identifiers:

public partial class JsonKnownTypeAttribute : JsonAttribute
{
     public JsonKnownTypeAttribute(Type subtype, object identifier) { }
     public object Identifier { get { throw null; } }
     public System.Type Subtype { get { throw null; } }
}

but that would enable passing invalid identifier types at compile time such as flots or Type instances. I was thinking we could instead enforce a degree of type safety by exposing two constructors:

public partial class JsonKnownTypeAttribute : JsonAttribute
{
     public JsonKnownTypeAttribute(Type subtype, string identifier) { }
     public JsonKnownTypeAttribute(Type subtype, int identifier) { }
     public object Identifier { get { throw null; } }
     public System.Type Subtype { get { throw null; } }
}
BreyerW commented 3 years ago

@eiriktsarpalis do you think generics in attributes would enable enums in safe manner? Afaik generics in attributes are already implemented just not merged for c# due to bug in net framework but c# team recently admitted they no longer consider it a showstopper because enough time lapsed since net framework stopped being developed further. They probably just need a nudge and this scenario might be that nudge (also polymorphism is scheduled for net 7 now meaning timeframe for generic in attributes is up to c# 11 not just c# 10 which would be rather short timeframe from now)

eiriktsarpalis commented 3 years ago

@BreyerW a generic attribute would not prevent invalid specializations, e.g. new JsonKnownTypeAttribute<Type>(typeof(int), typeof(int)) so it would be largely equivalent to accepting an object identifier.

TylerBrinkley commented 3 years ago

I think @BreyerW was thinking of something like this but I'm not sure it's a great experience even if generic attributes were already available.

public partial class JsonKnownTypeEnumAttribute<TEnum> : JsonKnownTypeAttribute
    where TEnum : struct, Enum
{
     public JsonKnownTypeEnumAttribute(Type subtype, TEnum identifier) { }
}

public partial class JsonKnownTypeAttribute : JsonAttribute
{
     public JsonKnownTypeAttribute(Type subtype, string identifier) { }
     public JsonKnownTypeAttribute(Type subtype, int identifier) { }
     public object Identifier { get { throw null; } }
     public System.Type Subtype { get { throw null; } }
}