dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.93k stars 4.64k forks source link

[API Proposal]: Utf8JsonReader should read from JsonNode #106047

Open ygoe opened 1 month ago

ygoe commented 1 month ago

Background and motivation

I have deserialisation code that reads data from JSON text, usually files on disk. It's using the Utf8JsonReader to read the source. It already exists and is somewhat complex so I don't want to duplicate the whole thing to also read from JsonNode. Another API that receives JSON-formatted messages from a network connection gives me the payload part as JsonNode because it has already read the whole message and needed to process parts of it.

So what I'm doing now is this:

string dataJson = message.PayloadData.ToJsonString();
byte[] dataJsonBytes = Encoding.UTF8.GetBytes(dataJson);
var reader = new Utf8JsonReader(dataJsonBytes);

This converts me the already parsed payload from JsonNode to serialised text, copies it over a few times for the encoding and finally I have something that Utf8JsonReader can read from.

Since Utf8JsonReader is a ref struct, I can't simply derive from it and change the behaviour. It needs to be the exact thing that can read from JsonNode as well internally.

API Proposal

using System.Text.Json.Nodes;

namespace System.Text.Json;

public ref struct Utf8JsonReader
{
    public Utf8JsonReader(JsonNode source);
}

API Usage

Save the text work and read tokens from the nodes directly. It should be easy since the tokens are already known. The nodes can simply be iterated and returned in the reader API.

var reader = new Utf8JsonReader(message.PayloadData);

Alternative Designs

Extra processing and memory allocation through the intermediate serialisation step. This is the current workaround.

Risks

I don't see any. It's a new API that has never existed before and nobody needs to use it if they don't need to.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

am11 commented 1 month ago

So what I'm doing now is this:

It can be improved with something like:

using System.Buffers;
using System.Text.Json;
using System.Text.Json.Nodes;

JsonNode node = GetNode();
Utf8JsonReader reader = node.AsUtf8JsonReader();

Use(reader);

public static class JsonNodeExtensions
{
    public static Utf8JsonReader AsUtf8JsonReader(this JsonNode payloadData)
    {
        ArrayBufferWriter<byte> bufferWriter = new();

        using Utf8JsonWriter writer = new(bufferWriter);
        payloadData.WriteTo(writer);

        return new(bufferWriter.WrittenMemory.Span);
    }
}
eiriktsarpalis commented 1 month ago

So what I'm doing now is this:

string dataJson = message.PayloadData.ToJsonString();
byte[] dataJsonBytes = Encoding.UTF8.GetBytes(dataJson);
var reader = new Utf8JsonReader(dataJsonBytes);

This converts me the already parsed payload from JsonNode to serialised text, copies it over a few times for the encoding and finally I have something that Utf8JsonReader can read from.

This is a known restriction emanating from Utf8JsonReader being a ref struct. In fact, STJ itself is forced to perform this inefficient conversion in many places in its own implementation.

I don't believe extending the Utf8JsonReader type itself to support all possible sources of JSON (UTF-16, JsonNode, JsonDocument, etc.) is a sustainable solution: it isn't pay-for-play and even the name of the type itself suggests that it's restricted to UTF-8 JSON decoding responsibilities. Longer-term, we need a reader abstraction, e.g. an IJsonReader interface akin to the abstract JsonReader class in Json.NET.

Note that adding such an interface is dependent on the language adding support for ref struct interface implementations, but even then this would require duplicating most of the serialization API surface to support polymorphic readers. For starters, JsonConverter<T> would need to add a new method:

public partial class JsonConverter<T>
{
    public virtual T? Read<TReader>(ref TReader reader, Type type, JsonSerializerOptions options) where TReader : allow ref struct, IJsonReader => throw new NotImplementedException();
}

which converter authors would then need to implement. Needless to say, this has cost and complexity comparable to implementing a serializer from scratch.

cc @stephentoub @davidfowl

stephentoub commented 1 month ago

Note that adding such an interface is dependent on the language https://github.com/dotnet/csharplang/issues/7608

For completeness, this exists now in C# 13 and .NET 9.

ygoe commented 1 month ago

It can be improved with something like:

@am11 FYI, a writer.Flush() call is still needed after the WriteTo call, or the written bytes will be incomplete. I know this type of problem already. ;-) But thanks for the suggestion, it looks better than mine and I can live with it for now.