dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.28k stars 4.73k forks source link

Object (de)serialization support with Utf8JsonReader\Writer - entry point and options #28325

Closed steveharter closed 4 years ago

steveharter commented 5 years ago

This is the API proposal and feature set for object (de)serialization. It covers the main API entry point (JsonSerializer), the options (JsonSerializerOptions) and the ability to ignore properties during (de)serialization.

Review process update: due to the size of this issue and since new API issues are being added for new features, the overview and forward-looking information has been moved to https://github.com/dotnet/corefx/blob/master/src/System.Text.Json/docs/SerializerProgrammingModel.md. Future API additions will have their own independent API issue created instead of re-using this issue.

Current Status

The reviewed portion of the API is in corefx master and Preview 4. It consists of the JsonSerializer and the JsonSerializerOptions class and provides no extensibility features.

There is a previous prototype at CoreFxLab in the package System.Text.JsonLab.Serialization. It contains non-reviewed APIs including extensibility features like using attributes to define custom value converters, property name policies (like camel-casing), etc. It is build against .NET Core 3.0 Preview 2.

The last status of the API review is shown below.

API

JsonSerializer

This static class is the main entry point.

Let's start with coding examples before the formal API is provided.

Using a simple POCO class:

    public class Person
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public DateTime? BirthDay { get; set; }
    }

To deserialize JSON bytes into a POCO instance:

    ReadOnlySpan<byte> utf8= ...
    Person person = JsonSerializer.Parse<Person>(utf8);

To serialize an object to JSON bytes:

    Person person = ...
    byte[] utf8 = JsonSerializer.ToBytes(person);

The string-based Parse() and ToString() are convenience methods for strings, but slower than using the <byte> flavors because UTF8 must be converted to\from UTF16.

namespace System.Text.Json.Serialization
{
    public static class JsonSerializer
    {
        public static object Parse(ReadOnlySpan<byte> utf8Json, Type returnType, JsonSerializerOptions options = null);
        public static TValue Parse<TValue>(ReadOnlySpan<byte> utf8Json, JsonSerializerOptions options = null);

        public static object Parse(string json, Type returnType, JsonSerializerOptions options = null);
        public static TValue Parse<TValue>(string json, JsonSerializerOptions options = null);

        public static ValueTask<object> ReadAsync(Stream utf8Json, Type returnType, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));
        public static ValueTask<TValue> ReadAsync<TValue>(Stream utf8Json, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));

        public static byte[] ToBytes(object value, Type type, JsonSerializerOptions options = null);
        public static byte[] ToBytes<TValue>(TValue value, JsonSerializerOptions options = null);

        public static string ToString(object value, Type type, JsonSerializerOptions options = null);
        public static string ToString<TValue>(TValue value, JsonSerializerOptions options = null);

        public static Task WriteAsync(object value, Type type, Stream utf8Json, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));
        public static Task WriteAsync<TValue>(TValue value, Stream utf8Json, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));
    }
}

JsonSerializerOptions

This class contains the options that are used during (de)serialization.

If an instance of JsonSerializerOptions is not specified when calling read\write then a default instance is used which is immutable and private. Having a global\static instance is not a viable feature because of unintended side effects when more than one area of code changes the same settings. Having a instance specified per thread\context mechanism is possible, but will only be added pending feedback. It is expected that ASP.NET and other consumers that have non-default settings maintain their own global, thread or stack variable and pass that in on every call. ASP.NET and others may also want to read a .config file at startup in order to initialize the options instance.

An instance of this class and exposed objects will be immutable once (de)serialization has occurred. This allows the instance to be shared globally with the same settings without the worry of side effects. The immutability is also desired with a future code-generation feature. Due to the immutable nature and fine-grain control over options through Attributes, it is expected that the instance is shared across users and applications. We may provide a Clone() method pending feedback.

For performance, when a JsonSerializerOptions instance is used, it should be cached or re-used especially when run-time attributes are added because when that occurs, caches are held by the instance instead of being global.


namespace System.Text.Json.Serialization
{
    /// <summary>
    /// Provides options to be used with <see cref="JsonSerializer"/>.
    /// </summary>
    public class JsonSerializerOptions
    {
        public JsonSerializerOptions();

        /// <summary>
        /// Defines whether an extra comma at the end of a list of JSON values in an object or array
        /// are allowed (and ignored) within the JSON payload being read.
        /// By default, it's set to false, and the reader will throw a <exception cref="JsonReaderException"/> if it encounters a trailing comma.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool AllowTrailingCommas {get; set;}

        /// <summary>
        /// The default buffer size in bytes used when creating temporary buffers.
        /// </summary>
        /// <remarks>The default size is 16K.</remarks>
        /// <exception cref="System.ArgumentException">Thrown when the buffer size is less than 1.</exception>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public int DefaultBufferSize { get; set; }

        /// <summary>
        /// Determines whether null values are ignored during serialization and deserialization.
        /// The default value is false.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool IgnoreNullValues { get; set; }

        /// <summary>
        /// Determines whether readonly properties are ignored during serialization and deserialization.
        /// A property is readonly if it contains a public getter but not a public setter.
        /// The default value is false.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool IgnoreReadOnlyProperties { get; set; }

        /// <summary>
        /// Gets or sets the maximum depth allowed when reading or writing JSON, with the default (i.e. 0) indicating a max depth of 64.
        /// Reading past this depth will throw a <exception cref="JsonReaderException"/>.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public int MaxDepth { get; set; }

        /// <summary>
        /// Defines how the <see cref="Utf8JsonReader"/> should handle comments when reading through the JSON.
        /// By default the reader will throw a <exception cref="JsonReaderException"/> if it encounters a comment.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public JsonCommentHandling ReadCommentHandling { get; set; }

        /// <summary>
        /// Defines whether the <see cref="Utf8JsonWriter"/> should pretty print the JSON which includes:
        /// indenting nested JSON tokens, adding new lines, and adding white space between property names and values.
        /// By default, the JSON is written without any extra white space.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool WriteIndented { get; set; }

        // several more options here for other features not covered here
    }

    /// <summary>
    /// The base class of serialization attributes.
    /// </summary>
    public abstract class JsonAttribute : Attribute
    {
    }

    /// <summary>
    /// Prevents a property from being serialized or deserialized.
    /// </summary>
    [AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
    public sealed class JsonIgnoreAttribute : JsonAttribute
    {
        public JsonIgnoreAttribute();
    }
}
Tornhoof commented 5 years ago

Thank you for describing your API and performance concepts. API

Performance

No objects created on the heap during (de)serialization after warm-up, with the exception of the POCO class itself and related properties during serialization.

This probably won't be true if you want to support things like ReadOnlyCollection, I probably would simply say "related properties and types".

Optimized hashtable lookup for property names using byte[] as the key.

This will be slow, I recommend talking to @rynowak about his work in ASP.NET Core Routing and the IL Matcher (the code is pretty much what you need, technically matching a route and matching a property name is quite similar), this is by far the fastest approach, in both the Routing and all the experiments I did for SpanJson https://github.com/Tornhoof/SpanJson/wiki/Technology-and-Internals#3-comparison-against-integers (which is pretty much identical to the approach in Routing). I expect that you could probably reuse 75%+ of the Routing code (an estimation based on how long it took to port my Expression Tree Code from SpanJson to do the Routing API, which was trivial).

davidfowl commented 5 years ago

Where’s the async API?

ahsonkhan commented 5 years ago

Where’s the async API?

In what scenario would you require async (de)serialize APIs? Can you please provide a sample usage?

davidfowl commented 5 years ago

I'm going to be a bit obtuse here. If we're not making these APIs async then we need to have a long discussion again about ASP.NET Core usage of these APIs.

Input:

https://github.com/aspnet/AspNetCore/blob/master/src/Mvc/src/Microsoft.AspNetCore.Mvc.Formatters.Json/JsonInputFormatter.cs#L318

Output:

https://github.com/aspnet/AspNetCore/blob/ca7c48c520f2baae032e0429f19552abed77e009/src/Mvc/src/Microsoft.AspNetCore.Mvc.Formatters.Json/JsonOutputFormatter.cs#L147-L152

Just to throw out some more examples here's the HttpClient JSON formatter that's used today in most applications:

https://github.com/aspnet/AspNetWebStack/blob/39d3064baf13181f5718ec5d85ba644b47d0704b/src/System.Net.Http.Formatting/Formatting/BaseJsonMediaTypeFormatter.cs#L189

To mitigate sync reading, today we buffer the entire Stream into memory or disk (depending on the size):

https://github.com/aspnet/AspNetCore/blob/ca7c48c520f2baae032e0429f19552abed77e009/src/Mvc/src/Microsoft.AspNetCore.Mvc.Formatters.Json/JsonInputFormatter.cs#L240-L249

Which is unfortunate and we'd really love to avoid that. The plan was to have to get that benefit from the new JSON reader/writer/serializer. We don't have a good solution for output today. You end up doing synchronous IO once you hit a buffer threshold which ends up doing sync over async and leans towards causing thread pool starvation.

See https://github.com/aspnet/AspNetCore/issues/6397 for more details on output. ASP.NET Core is going to disable all sync IO on the request and response by default in 3.0.

Tornhoof commented 5 years ago

For the ASP.NET Core formatter you additionally need to support a bag of json properties, which is either serialized into the parent object or deserialized into the bag for RFC 7807 Error Details. See https://github.com/aspnet/Mvc/pull/8529 for more details, that one uses JSON.NET's https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_JsonExtensionDataAttribute.htm

ahsonkhan commented 5 years ago

cc @KrzysztofCwalina, @jkotas, @glennc, @terrajobst as an FYI (note: this is still WIP)

rynowak commented 5 years ago

My feedback in a rough stream-of-consciousness format.

On JsonConverter, we will need the ability to specify the type dynamically as an argument - in addition to the generic overloads. Without this, it will really cause a lot of folks to use a workaround to call the generic version with MakeGenericMethod. It's OK to box in this case.

JsonConverterSettings seems like it has a mashup of settings that generally global to an operation, and don't pertain to specific object or type (indent size for example). Is it expected that I can specify a global indent size, but then override the indent size for a specific type?

I'm really intrigued by how JsonConverterSettings tries to maintain a loose coupling with the specific types of settings/options we provide support for. This is similar to a number of things we do in MVC and it scales horizontally pretty nicely.

I think it would be really neat to see a recipe guide or some side-by-side examples: "this is how you'd set the json property name with an attribute vs with code".

Another natural thing that users will want from JsonConverterSettings is a copy constructor and a way to merge them.

The JsonClassMaterializer seems like a god-object. This is where you control all object creation and property get/set operations for all objects and all types. If the intention is to use these as a replacement for JSON.Net's JsonConverter then I think it's far off. This seems like something that would be intentionally hard to replace. Maybe that's OK depending on what other things we do.


Async is a higher-level bugaboo (I think that's a SFW replacement for what I was thinking 😆). I still think C# and .NET do async better than any other language out there, but we're still struggling to pull together an end-to-end here for the web that actually avoids synchronous I/O at all levels.

From my perspective, I'm still excited to see all of the work that's here. I think we're solving a very critical problem and these are all steps in the right direction.

steveharter commented 5 years ago

@Tornhoof

I'd like to see non-generic FromJson methods, e.g. for the ASP.NET Core TextFormatter classes, assuming this will not change that much

Yes that makes sense. The API is still being defined, and this issue wasn't meant to be complete at this point.

What about global settings in JsonConverterSettings, i.e. NoNulls, Int/String Enums etc?

Currently those simple settings are in JsonPropertyAttribute which can be "added" at run-time. If too verbose we may just place those right on JsonConverterSettings.

Performance

No objects created on the heap during (de)serialization after warm-up, with the exception of the POCO class itself and related properties during serialization.

This probably won't be true if you want to support things like ReadOnlyCollection, I probably would simply say "related properties and types".

Yes, of course.

Optimized hashtable lookup for property names using byte[] as the key.

This will be slow, I recommend talking to @rynowak about his work in ASP.NET Core Routing and the IL Matcher (the code is pretty much what you need, technically matching a route and matching a property name is quite similar), this is by far the fastest approach, in both the Routing and all the experiments I did for SpanJson https://github.com/Tornhoof/SpanJson/wiki/Technology-and-Internals#3-comparison-against-integers (which is pretty much identical to the approach in Routing).

Thanks. I updated the description and implementation to use an interger-based array which is looking very promising at this point in time.

steveharter commented 5 years ago

@rynowak

On JsonConverter, we will need the ability to specify the type dynamically as an argument - in addition to the generic overloads. Without this, it will really cause a lot of folks to use a workaround to call the generic version with MakeGenericMethod. It's OK to box in this case.

Yes that will be added. Thanks

JsonConverterSettings seems like it has a mashup of settings that generally global to an operation, and don't pertain to specific object or type (indent size for example). Is it expected that I can specify a global indent size, but then override the indent size for a specific type

As it is now, JsonConverterSettings is fairly clean and only contains settings that would need to be specified at run-time like MaxDepth and DefaultBufferSize (I'll update the API doc soon with new members). The mashup is JsonPropertyAttribute for now.

I'm really intrigued by how JsonConverterSettings tries to maintain a loose coupling with the specific types of settings/options we provide support for. This is similar to a number of things we do in MVC and it scales horizontally pretty nicely.

Yes there is some loose coupling with the extension points like the property casing where you can provide the implementation. Other simple settings will not really be extensible in that way but the values can be specified via attributes at the assembly, class and property level plus run-time overrides to each of these -- i.e. the values for these simple settings can be changed in many flexible ways, but not necessarily the code that uses them.

I think it would be really neat to see a recipe guide or some side-by-side examples: "this is how you'd set the json property name with an attribute vs with code".

Yes I will do this in the next update.

The JsonClassMaterializer seems like a god-object. This is where you control all object creation and property get/set operations for all objects and all types. If the intention is to use these as a replacement for JSON.Net's JsonConverter then I think it's far off. This seems like something that would be intentionally hard to replace. Maybe that's OK depending on what other things we do.

JsonClassMaterializer is just an enum. The implementation is internal. We may not even need the enum if we decide the default is ok. Currently the default will attempt to use the most efficient mechanism (currently IL gen for get,set,ctor) but if IL gen is not supported then use standard reflection. However, the enum is there currently for someone to change the default if for example they have a single object to deserialize and don't want any additional memory overhead (and initial perf hit) of IL so they want to use standard reflection instead.

steveharter commented 5 years ago

@davidfowl

I'm going to be a bit obtuse here. If we're not making these APIs async then we need to have a long discussion again about ASP.NET Core usage of these APIs.

I added the async apis along with a brief description. Currently they only support a Stream, more if necessary. I have a prototype that verifies the feasibility.

KrzysztofCwalina commented 5 years ago

Some quick feedback:

  1. As @davidfowl have said, we need async APIs that can work with Stream and Pipelines
  2. I don't think the serializer/converter should have a public dependency on JsonReader. The reader should be an implementation detail. The reasons are: a) the reader is a very tricky type that most our users should not be concerned with, b) we should preserve the ability change the implementation to other than the reader.
  3. The deserailizer/converter should be in a separate namespace from the low level reader/writer types. I don't think we should expose the average .NET developer to the complexity of the readers (e.g. to by-ref types).
  4. We should decide if cancellation tokes are optional or not.
  5. I would love to have the ability to generate deserialization code at build time and include it in my projects. In Azure SDK we almost always know the shape of the deserialized schema and so emiting the deserialization code at design time would:
    • make it more efficient
    • be more AOT friendly
    • allow the deserialized types to appear immutable (as the mutators could be internal visibility).
Tornhoof commented 5 years ago

@steveharter If you care about integer tricks, you can optimize the json member writing for utf8 with the same trick. Assuming you will bake precalculated byte-arrays for the property names into the IL, you can use the same concept as for deserialization. Instead of copying the precalculated buffer to the output, just write appropriate integer values to the output. Instead of writing Encoding.UTF8.GetBytes("\"message\":"), you could also write

jsonWriter.WriteUtf8Verbatim(7306916068917079330UL /*"message*/, 14882 /*":*/);

Atleast in my SpanJson tests that was the last main difference in serialization speed between UTF16 and UTF8 as, atleast for expression trees, there is a large difference if I have Expression.Constant("\"message:\""); which is fast for utf16 or Expression.Constant(Encoding.UTF8.GetBytes("\"message\":")) for UTF8 which was ~20% slower than the UTF16 version. In SpanJson properties up to ~30 chars length are written via integers, for longer properties the byte[] copy (incl. expression tree) overhead is pretty much the same speed or faster This might not be necessary if you can leverage those shiny new static byte arrays with public ReadOnlySpan<byte> property => new byte[]{0xA,0xB,0xC}, but it might be worth a try.

That's more or less the last larger performance trick in SpanJson as now the UTF8 and UTF16 serialization speeds are the same, still deserialization of UTF8 is slower, but that's mostly due to the utf8 to utf16 overhead for strings., especially since @stephentoub did all those nice TryFormat``TryParse`` optimizations for both UTF8 and UTF16.

KrzysztofCwalina commented 5 years ago

An additional thing we would need to support Azure SDK: version resilient serialization, including round-tripping.

We need to be able to add a property bag field (dictionary) to deserialized types. If deserialized payload has a property that does not exist in the type, the property would be saved in the dictionary. The serializer would serialize not only actual properties of the type, but also the dictionary (so all payload round trips).

cc: @SchaabS

pranavkm commented 5 years ago

@KrzysztofCwalina @Tornhoof's comment from here - https://github.com/dotnet/corefx/issues/34372#issuecomment-451928379 - goes in to this. JSON.NET supports it using ExtensionDataAttribute, the data contract serializers support it using IExtensibleDataObject.

pranavkm commented 5 years ago

I spent a little bit of time plugging in the deserializer in to MVC to see what comes out of it. This is based on version 0.1.1-e190205-1 of the package.

var settings = new JsonConverterSettings();
settings.AddAttribute(modelType, new JsonCamelCasingConverterAttribute());

This works for the immediate properties of modelType, but not sub-properties (e.g. Person.PersonAddress.StreetName). While we're here, would it make sense to have the method name indicate this was a policy being applied? AddAttribute really doesn't convey that information.

steveharter commented 5 years ago

@pranavkm

Thanks for the feedback!

The attributes can be specified at an Assembly level for all types in a given Assembly. By using design-time attributes like this it helps to isolate the settings for a particular consumer's types assuming they belong to the same Assembly. I can revisit this and\or make it easier to specify "globally" per the settings class if we want to do that.

settings.AddAttribute(modelType.Assembly, new JsonCamelCasingConverterAttribute());

Is deserialization meant to perform case sensitive property name matches

Not by default (current plan). I relialize this is different than Json.Net. However this helps with performance slightly and is the "right" json-spec thing to do. I haven't added the case-insensitive switch yet.

We spoke about this in the offline sync up, but it would be nice if FromJsonAsync supported PipeReader

Yes that is the current plan. I will be working on it this week.

Json.NET supports Javascript literals which makes both { Id: 10 } and { 'Id': 10 } legal inputs.

I'll look into this. @ahsonkhan has this been considered?

Deserializing some types don't work. For instance, Dictionary<,> didn't work.

Dictionary is not implemented yet. The next iteration (and current PR in corefxlab) support the Nullable types.

There's a mix of InvalidOperationException and JsonReaderException thrown by the converter for the same kinds of exceptions

Yes I plan on having the exceptions cleaned up and have the information regarding line number etc. The internal reader throws FormatException as well and I'll switch to using the TryParse* methods instead to avoid that.

ahsonkhan commented 5 years ago

has this been considered?

Yes, and given single quote strings are not JSON RFC compliant, we decided against supporting it at the lowest layer. Adding too many of these switches/options results in perf costs for the most common cases of using the Utf8JsonReader, where people have valid, spec-complaint JSON payloads. If there is a strong requirement for this, especially if its affecting a lot more customers, we can reconsider adding an option for it at the reader layer.

That said, as an alternative solution, we should evaluate if this is something that occurs frequently at the higher layer (i.e. Deserializer) and consider only supporting it there.

Optionally including the path inJsonReaderException errors would be very useful. When working with large JSON payloads, narrowing down what failed can be very useful in diagnosing the error. Knowing there's an error in (Line 1, BytePosition: 864), is not always useful.

Unfortunately, this has performance implications since it requires potentially storing the path in some re-sizable data structure which ends up allocating. We have made an effort to keep the reader non-allocating with emphasis on performance which made providing Path info difficult.

BrennanConroy commented 5 years ago

Similarly I've tried out the deserializer on the SignalR side, version 0.1.1-e190201-1.

One of the biggest issues is that we use the Utf8JsonReader up until we find an array of objects that we want to parse. Then we grab each object individually and pass them into the converter. The problem is that in order to get a usable chunk of data into the converter from the reader, you need to do a lot of manual work. For example to serialize a string like type (String, DateTime, etc.):

var bytes = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
if (reader.TokenType == JsonTokenType.String)
{
    var b = new byte[bytes.Length + 2];
    b[0] = 34; // "
    bytes.CopyTo(b.AsSpan().Slice(1, bytes.Length));
    b[bytes.Length + 1] = 34; // "
    bytes = b.AsSpan();
}
var obj = Serialization.JsonConverter.FromJson(bytes, type);

(Yes I know this is unoptimal, but ignore that) And to deserialize an array you'd need to do a similar thing and add the brackets around the payload.

Another thing is that the DateTime parsing is far too strict. When testing I had to write my own deserialization for DateTime so I wasn't even using the converter anymore. I know there is https://github.com/dotnet/corefx/issues/34576 for the writer doing "smallest output that round trips", and https://github.com/dotnet/corefx/issues/34690 for the reader getting DateTime directly, so it will probably be fixed around the same time as those.

Because Json properties can be in any order we sometimes need to store a "JArray" like object to then be used later in the deserialzer. For now I store the BytesConsumed from the reader, then run Skip() and store the BytesConsumed afterwards and use both of those later on the original ValueSequence to get the bytes needed to pass into the deserializer. Some kind of first class feature for this would be amazing.

Looking to the future where we'll be using the Utf8JsonWriter alongside the serializer, there needs to be a way of writing to the writer either from the serializer, or get the json out of the serializer and write raw bytes to the writer. I know @ahsonkhan has proposed some APIs in that area for exposing writing raw bytes to the writer. And was recently working on a change for writing JsonElement to the writer.

On a brigher note, when I did some microbenchmarks with the Utf8JsonReader + JsonConverter they turned out ~77% faster than using the low level types of Newtonsoft.Json.

cc @ahsonkhan I probably missed some feedback from when I was talking with you.

pranavkm commented 5 years ago

The attributes can be specified at an Assembly level for all types in a given Assembly.

That wouldn't necessarily work for MVC since complex sub-properties could come multiple different assemblies.

I can revisit this and\or make it easier to specify "globally" per the settings class if we want to do that.

That sounds perfect.

steveharter commented 5 years ago

@pranavkm

That wouldn't necessarily work for MVC since complex sub-properties could come multiple different assemblies.

I see three way to add global settings: 1) Duplicate the properties from JsonPropertyAttribute (and perhaps others) to JsonConverterOptions. 2) Support a null for the first parameter in AddAttribute() to represent "all" assemblies. Somewhat unconventional. 3) Add a property to JsonConverterOptions to return a JsonPropertyAttribute. Somewhat unconventional.

Basically options 2 and 3 prevent duplicating 10+ eventual properties from JsonPropertyAttribute.

However if there is a need to use JsonCamelCasingConverterAttribute or having a different DateTime converter, etc. then option 2 looks more appealing. Even if we do option 2 we could still do option 1 for JsonPropertyAttribute for better discoverability of common options.

Currently the API here is taking option 1 but currently just for JsonPropertyAttribute, not for camel-casing and others (pending API discussion).

rynowak commented 5 years ago

Is there really a case to be made that configuring a setting per-assembly is more common that configuring a setting globally?

Put another way, I think there's a really strong case to be made for configuring a setting per-type, per-property and globally, and a weak case to be made for configuring settings per-assembly. Do you have use-cases in mind for that?

steveharter commented 5 years ago

@rynowak

configuring a setting per-assembly... Do you have use-cases in mind for that?

Here you go: 1) My organization requires a couple hundred POCO Types to be used internally. 2) Various developers add POCO Types over time. 3) Various teams use these POCO Types in different settings so they want to use design-time attributes for all possible settings to avoid mismatches in applying the same run-time settings. 4) The organization requires all POCO types to default to camel-casing by default, and require the use of a custom DateTime converter for interop with other code. Since it is easy to forget to apply the attributes to evey POCO type (or property) the attribute can be applied to the assembly that contains the POCO types.

rynowak commented 5 years ago

That seems like you'd just apply the defaults globally though not per-assembly...

steveharter commented 5 years ago

@BrennanConroy

Because Json properties can be in any order we sometimes need to store a "JArray" like object to then be used later in the deserialzer... Some kind of first class feature for this would be amazing.

Is the feature you are asking for controlling the deserialization order of properties? And secondary I suppose specify the serialization order although that's not important.

steveharter commented 5 years ago

That seems like you'd just apply the defaults globally though not per-assembly...

Perhaps if you don't use types from other assemblies that have their own contract\schema.

However, currently there is not a way to specify design-time attributes globally (across assemblies). You could use the run-time AddAttribute(null, myattribute) however if we decide that will work.

Do you have something else in mind (config file support maybe)?

BrennanConroy commented 5 years ago

@steveharter We need a way of storing a json array for later use in the deserializer.

For example, the following two json payloads are equivalent and the target value is used to determine what types are in the arguments array, however, when using the forward reading only Utf8JsonReader we don't know the target until later, so we need to store the array until we have the target.

{ "target": "name", "arguments": [ 1, "example", 2.3 ] }
{ "arguments": [ 1, "example", 2.3 ], "target": "name" }

With Newtonsoft.Json we can do:

argumentsToken = JArray.Load(reader);
// ... later
for (var i = 0; i < length; i++)
{
    argumentsToken[i].ToObject(type, serializer);
}
rynowak commented 5 years ago

However, currently there is not a way to specify design-time attributes globally (across assemblies). You could use the run-time AddAttribute(null, myattribute) however if we decide that will work.

Right, I'm questioning why we need the ability to specify settings per-assembly. Existing JSON serializers don't have it and I've never see a user ask for it.

Specifying settings for the application as a whole (regardless of what assembly the data types live in) is exceedingly common for a web project.

terrajobst commented 5 years ago

Notes from today are here.

jtillman commented 5 years ago

IMO, the deserialization calls should implement 'TryRead'. Since the json attempted to be deserialized typically comes from external sources, serialization exceptions are expected to be a common occurrence. If we place deserialization at the edge of a protocol, then the try/catch pattern will cause a noticeable amount of performance (time) to serve the request due to exception catching.

Waleed-KH commented 5 years ago

The APIs naming is so long and not clear

System.Text.Json.Serialization.JsonSerializer.Read<TValue>(ReadOnlySpan<byte> utf8Json);
System.Text.Json.Serialization.JsonSerializer.ReadString<TValue>(string json);
System.Text.Json.Serialization.JsonSerializer.Write<TValue>(TValue value);
System.Text.Json.Serialization.JsonSerializer.WriteString<TValue>(TValue value);

So, I would recommend more simple naming that looks like this:

namespace System.Json
{
    public static class JsonConvert
    {
        public static TValue Parse<TValue>(ReadOnlySpan<byte> utf8Json, JsonSerializerOptions options = null);
        public static TValue Parse<TValue>(string json, JsonSerializerOptions options = null);
        public static byte[] ToBytes<TValue>(TValue value, JsonSerializerOptions options = null);
        public static string ToString<TValue>(TValue value, JsonSerializerOptions options = null);
        ...
    }
}

Also would be more symmetrical with System.Convert and Newtonsoft.Json.JsonConvert

steveharter commented 5 years ago

@jtillman

IMO, the deserialization calls should implement 'TryRead`

A simple bool TryRead(...) won't include error information. For the read case, the exception will include a message, line number, byte offset and property path. Is this important for your scenario, or is a bool sufficient? We could also provide an overload that takes an out parameter for the error information.

steveharter commented 5 years ago

@Waleed-KH thanks for your feedback.

The APIs naming is so long and not clear

There was an update based on an API review a few hours before you commented - the "Read\Write" is now only for the async stream\pipe APIs and we use "ToXXX\Parse" terminology for the rest.

It would be more natural if these APIs under System.Json, that's would make it more symmetrical with System.Xml

@ahsonkhan can you comment on the history of using System.Text.Json instead of System.Json -- did we not use System.Json because it already contained older\existing APIs?

jtillman commented 5 years ago

@steveharter The information provided by the exception would be beneficial, but throwing an exception on a hot code path is undesirable. Ideally, I would have a single root call that packages error information as a part of the return value.

public class JsonConverterResult<T>
{
  public T Value { get; }
  public JsonError? Error { get; }
  public bool HasError => Error.HasValue;
}

public static class JsonConvert
{
  public static JsonConverterResult<T> ToObject<T>(...);

  public static T ToObject<T>(...)
  {
    JsonConverterResult<T> result = ToObject(...);
    if (result.HasError)
      throw new JsonException(result.Error);
    return result.Value
  }
}

Since this approach isn't typical in most of the existing serialization apis, I would settle for either of the solutions you suggested (boolean or out parameter).

Waleed-KH commented 5 years ago

@steveharter Thanks for your response.

There was an update based on an API review a few hours before you commented - the "Read\Write" is now only for the async stream\pipe APIs and we use "ToXXX\Parse" terminology for the rest.

Yeah I thought of that, that's why I only mentioned the functions that work withReadOnlySpan<byte>, byte[] & string. But I would like to suggest to name the class JsonConvert and put it in the main namespace because this class will be the most used one in these APIs so it would be better if it is in the main namespace not in Serialization.

I also hope you consider using the namespace System.Json instead of System.Text.Json for the reasons that I mentioned before, and because it will be sad not given this good APIs the namespace that it deserves, for an old obsolete APIs that no one use.

steveharter commented 5 years ago

@BrennanConroy

We need a way of storing a json array for later use in the deserializer.

It seems you require three pending features: 1) Ability to (de)serialize a heterogeneous array: your arguments array has multiple types 2) Ability to (de)serialize into temporary state for "overflow" \ "hidden" properties: your arguments array is likely not a public property 3) Assign temporary state once deserialization is finished for the object: your arguments array needs to be passed to the appropriate logic based upon the value of the target property

For (1) we have had discussions on using the existing JsonElement in some fashion, although currently that is coupled tightly to JsonDocument (cc @bartonjs @ahsonkhan ) so we need to decide whether to extend JsonElement or create a new type to handle this.

For (2) and (3) I am working on the explicit class (de)serialize feature where you can implement before- and after-serialize methods along with per-property callbacks including "overflow" properties that exist in the json but not the class.

Here's a mock-up using JsonElement + "overflow property" + callbacks.

// Either through an attribute or code, register your callback\converter:
settings.AddConverter<MyPocoType>(typeof(MyPocoTypeConverter));
...

// Implement your callbacks (an instance of this struct is created per POCO instance):
public struct MyPocoTypeConverter : ITypeConverterOnMissingPropertyDeserialized, ITypeConverterOnDeserialized
{
    IList<JsonElement> _arguments;

    void ITypeConverterOnMissingPropertyDeserialized.OnMissingPropertyDeserialized(
        object obj,
        JsonClassInfo jsonClassInfo,
        string propertyName,
        object propertyValue,
        JsonTokenType tokenType,
        JsonSerializerOptions options)
    {
        if (jsonPropertyInfo.PropertyName == "arguments")
        {
            Debug.Assert(tokenType == JsonTokenType.Array);
            _arguments = (IList<JsonElement>)propertyValue;
        }
    }

    void ITypeConverterOnDeserialized.OnDeserialized(
        object obj,
        JsonClassInfo jsonClassInfo,
        JsonSerializerOptions options)
    {
        MyPocoType poco = (MyPocoType)obj;
        if (poco.Target == "CallMethod2")
        {
            poco.Method2(
                _arguments[1].GetString(),
                _arguments[0].GetInt32(),
                _arguments[2].GetDecimal()
        }
    }
}
BrennanConroy commented 5 years ago

@steveharter I think there is some misunderstanding of our scenario.

We use the Utf8JsonReader to read all the properties both for perf and for custom logic while walking through the payload. One thing we need is the ability to store the "arguments" property (which is an array) for later use in the deserializer. Currently I have a bunch of hacks to try and store the position of the Utf8JsonReader for the "arguments" property and then try to pass in each element to the deserializer one by one.

I showed the Newtonsoft example which is what we already use for the current Json parsing and works perfectly. I think what you're showing is assuming we just call the deserializer with the whole payload which is definitely not what we're doing.

We need interoperability with the Utf8JsonReader and deserializer (similarly with the Utf8JsonWriter and the serializer).

if (utf8JsonReader.ValueSpan.SequenceEqual("arguments"))
{
    argumentsToken = JArray.Load(utf8JsonReader);
}

// ... later
for (var i = 0; i < argumentsToken.Size(); i++)
{
    argumentsToken[i].ToObject(type, serializer);
}
steveharter commented 5 years ago

@BrennanConroy thanks for the clarification. Yes I was assuming you wanted to use the (de)serializer for the entire scenario, at least for the flow-of-control because the (de)serializer will allow access to the lower-level reader\writer in specific callbacks.

We currently have JsonDocument \ JsonElement but since it doesn't currently accept an existing reader you can't do something like JArray.Load. However, those types do seem to be the natural fit to hold the raw values, which then could be passed to the deserializer.

BrennanConroy commented 5 years ago

Should I open a new issue for that feature?

steveharter commented 5 years ago

@BrennanConroy I'll discuss the feature offline with @bartonjs and @ahsonkhan and then reply back here.

One of the cool things about the reader (and writer) is that it supports a streaming model where the buffer doesn't have to contain all of the data. When the reader runs out of buffer for the current property, it returns false and expects to be called again with a fresh buffer (and to be passed the previous JsonReaderState object back in so it can pick up where it left off).

However this complicates scenarios where a more than one property must be returned like in a "JArray" case because every property in theory may have to ask for a fresh buffer.

The deserializer handles this scenario and returns the whole object\array when it is finished including when it spans multiple buffer refreshes (which occurs when using the async pipe\stream APIs). However, the reader doesn't support this scenario internally (it leaves it up to the caller, like the serializer) and the document\element supports this scenario but requires all of the data up-front in basically one large buffer (for stream cases it reads to the end up-front).

So we have to decide who should support this multi-property functionality - the reader\writer, the document\element, the (de)serializer, and\or the caller.

TylerBrinkley commented 5 years ago

Is it really necessary for implementers of JsonValueConverter<TValue> to implement both Write overloads? The logic in them are going to be identical except for the call to Utf8JsonWriter's WriteXXXValue vs WriteXXX method.

Could we add a method to Utf8JsonWriter to prep it with a property name so that the next WriteXXX call writes the value as a property instead of value?

Or alternatively, could we have a single Write(bool writeProperty, Span<byte> escapedPropertyName, TValue value, ref Utf8JsonWriter writer) method where the implementer would then make the appropriate call to the Utf8JsonWriter based on the writeProperty flag?

steveharter commented 5 years ago

@TylerBrinkley

Is it really necessary for implementers of JsonValueConverter to implement both Write overloads?

I've been working with @ahsonkhan on addressing this by either pre-populating the property name and just calling WriteValue methods, wrapping the writer somehow, or by having the Write methods in the writer detect whether the property name is empty and call WriteValue if so (however unfortunately json allows empty property names so that may not work).

steveharter commented 5 years ago

@BrennanConroy update: we are working on extending JsonElement and JsonDocument to make them more usable in your scenario and for scenarios where the (de)serializer has to support "overflow" properties and cases where a property is of type object or JsonElement.

TylerBrinkley commented 5 years ago

Would it make sense to add a JsonSerializerReadOptions class that either inherits or is composed with JsonReaderOptions and include IgnoreNullPropertyValue in the new class? Same applies for writer options.

It wouldn't surprise me if we would want to add more options related to reading and writing later and having some options in JsonSerializerOptions' Reader/WriterOptions property and others directly in JsonSerializerOptions will be confusing to consumers.

AlexeiScherbakov commented 5 years ago

Also it will be usefull to have something like custom Exression serializer builder

LambdaJsonClassConverterBuilder<TestClass2> builder2 = new LambdaJsonClassConverterBuilder<TestClass2>();
builder2.AddObjectProperty(x => x.A, "a");
builder2.AddIntegerProperty(x => x.B, "b");
builder2.AddObjectProperty(x=>x.C,"c");

It will be usefull when class is separated from json serializer logic

TylerBrinkley commented 5 years ago

A big gap I see in the proposal as specified is constructor handling. For a domain-driven design I think it is common to instantiate required parameters in constructors so that the object is always in a valid state.

Since this current proposal only looks for a constructor without parameters the current state of deserialization is unfortunately limited to simple types like data transfer objects. Json.NET invokes the default constructor or else the constructor specified with the JsonConstructorAttribute applied however there's no equivalent option here.

Is more advanced constructor handling being considered?

fredrikhr commented 5 years ago

~Please include a generic overload to JsonSerializerOptions.GetClassInfo(Type) like this:~

namespace System.Text.Json.Serialization
{
    partial class JsonSerializerOptions
    {
        public JsonClassInfo GetClassInfo<T>() => GetClassInfo(typeof(T));
    }
}

Update: Retracted suggestion based on comment by @khellang (https://github.com/dotnet/corefx/issues/34372#issuecomment-475721641)

khellang commented 5 years ago

It was just covered in the review; https://www.youtube.com/watch?v=_CdV75tEsVk. TLDR; There's no need for it and it's easy enough for you to add it yourself as an extension method.

scottdorman commented 5 years ago

@ahsonkhan can you comment on the history of using System.Text.Json instead of System.Json -- did we not use System.Json because it already contained older\existing APIs?

@steveharter Have there been any updates/consideration to using System.Json as the namespace rather than System.Text.Json?

The types in System.Json are, if I remember correctly, a throwback to Silverlight 3. They constitute a legacy API and don't conflict (in naming) to the types in the new JSON API. It seems like the new JSON APIs should be in a System.Json namespace as well, since that seems (to me at least) to be the more natural container and is also consistent with the use of the System.Xml namespace to contain all types that are XML related.

steveharter commented 5 years ago

Have there been any updates/consideration to using System.Json as the namespace rather than System.Text.Json?

I believe that decision is closed. cc @terrajobst

However other feedback is whether System.Text.Json.Serialization should be a sub-namespace of the reader\writer\document that is in System.Text.Json based on how to best factor higher-level functionality (serializer) vs lower-level (reader\writer).

khellang commented 5 years ago

Has there been considered or is it possible for the serializer to provide an additional "report" of deserialized properties? I.e. which properties were set by the serializer and which properties were missing from the JSON payload. This would be really interesting information for web frameworks that do model binding.

Consider the following POCO:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDay { get; set; }
}

Since no property is nullable (assuming nullable reference types are enabled and considered), all properties are required by default, but if you call

ReadOnlySpan<byte> utf8= ...
Person person = JsonSerializer.Parse<Person>(utf8);

Where utf8 contains the following JSON payload

{
  "firstName": "John",
  "lastName": "Doe"
}

You'll end up with person.BirthDay == default(DateTime).

At that point, you have no clue whether the BirthDay property was actually specified as 0001-01-01T00:00:00.0000000, or if it's just the default value. That's why you normally define all non-nullable properties as nullable and use validation (and additional metadata, like attributes) to make sure all required values are present.

The problem with that, is that now you have a DTO full of nullable properties, which all contain valid values (verified by validation). Every time you want pull a value from a property, you have to either guard against null (or use HasValue + Value) or supress warnings by static analysis tools (including the new null-flow analysis by Roslyn). This really sucks.

@rynowak Is this something that MVC could use?