dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

JsonSerializer reading from Utf8JsonReader #29208

Closed BrennanConroy closed 4 years ago

BrennanConroy commented 5 years ago

Currently if you want to get an object out of a Utf8JsonReader you need to first use the reader with a temporary JsonDocument, then pass the JsonElement.GetRawText() from that document into the JsonSerializer.

There should be a way to do this directly from the serializer and avoid the temporary JsonDocument and GetRawText() call.

var reader = new Utf8JsonReader(someBytes, isFinalBlock: true, state: default);
using var token = JsonDocument.ParseValue(ref reader);
var obj = JsonSerializer.Parse(token.RootElement.GetRawText(), someType);

The requested API would look something like:

var reader = new Utf8JsonReader(someBytes, isFinalBlock: true, state: default);
var obj = JsonSerializer.Parse(ref reader, someType);

I did an perf test with the current approach vs. something similar to the proposed change

TLDR

Using something similar to JsonSerializer.Parse(ref reader, someType); could improve Op/s from 22,587 to 59,662 and allocations from 50.28KB to 20.23KB in the scenario I was testing.

Current Approach:

Method Input HubProtocol Mean Error StdDev Op/s Gen 0 Allocated
ReadSingleMessage LargeArguments Json 44.27 us 1.3590 us 4.007 us 22,587.6 1.4648 50.28 KB
ReadSingleMessage LargeArguments NewtonsoftJson 51.58 us 1.0247 us 2.552 us 19,386.1 0.5493 20.68 KB

Using the original someBytes and reader.BytesConsumed to strategically slice data off and pass it directly to JsonSerializer.Parse(...):

Method Input HubProtocol Mean Error StdDev Op/s Gen 0 Allocated
ReadSingleMessage LargeArguments Json 16.76 us 0.4949 us 1.459 us 59,662.9 0.5493 20.23 KB
ReadSingleMessage LargeArguments NewtonsoftJson 51.86 us 1.0326 us 1.725 us 19,282.0 0.5493 20.68 KB
ahsonkhan commented 5 years ago

NewtonsoftJson 51.58 -> 39.31. Why did that one change?

BrennanConroy commented 5 years ago

I'm not sure, that one confused me too. I ran it twice and got the same result.

Maybe benchmark dotnet isn't as sandboxed as it could be


From: Ahson Khan notifications@github.com Sent: Monday, April 8, 2019 6:01:24 PM To: dotnet/corefx Cc: Brennan Conroy; Author Subject: Re: [dotnet/corefx] JsonSerializer reading from Utf8JsonReader (#36717)

NewtonsoftJson 51.58 -> 39.31. Why did that one change?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdotnet%2Fcorefx%2Fissues%2F36717%23issuecomment-481061469&data=02%7C01%7Cbrecon%40microsoft.com%7Ca680792d94134157a92508d6bc86e34e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636903684858807240&sdata=WgvpAyb7HD%2BYKIbF29%2BZKEa%2B63upsIGS6m%2BSJxechSo%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAHOVEXFKSOC5081vHYd1QR_4THSqszwWks5ve-ZkgaJpZM4cjWJe&data=02%7C01%7Cbrecon%40microsoft.com%7Ca680792d94134157a92508d6bc86e34e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636903684858817248&sdata=0EHeQf9%2BRqjPAanhFG7SgMd5vcCo0bVzPnTk9s7VDjM%3D&reserved=0.

BrennanConroy commented 5 years ago

Tried it again and got more reasonable numbers. Updated inline

stephan-tolksdorf commented 5 years ago

Such a Parse overload would allow to e.g. deserialize a params object in a JSON RPC request without having to read (and validate) the JSON representation of the params object twice.

rynowak commented 5 years ago

We have a use case for this when writing custom converters. Frequently you want to use the JsonSerializer to read a value from the reader so that the serializer can do the right thing.

public static partial class JsonSerializer
{
    public static object ReadValue(ref Utf8JsonReader reader, Type returnType, JsonSerializerOptions options = null) { throw null; }
    public static TValue ReadValue<TValue>(ref Utf8JsonReader reader, JsonSerializerOptions options = null) { throw null; }
}
steveharter commented 5 years ago

The ReadValue terminology -- does just Read work? We have ReadAsync already and that would be consistent.

Design notes:

ahsonkhan commented 5 years ago

The ReadValue terminology -- does just Read work? We have ReadAsync already and that would be consistent.

The ReadAsync tries to read as much as possible. ReadValue would be consistent with JsonDocument.ParseValue which reads just one value (whether it be an object/array or primitive).

terrajobst commented 5 years ago

Video

Approved because we approved the combined one dotnet/runtime#29205.

BrennanConroy commented 5 years ago

Did a before and after for this change: Before:

Method Input HubProtocol Mean Error StdDev Op/s Gen 0 Gen 1 Allocated
ReadSingleMessage LargeArguments Json 21.11 us 1.6846 us 4.967 us 47,381.8 2.0752 0.0610 50.29 KB
ReadSingleMessage LargeArguments NewtonsoftJson 19.43 us 0.4236 us 1.236 us 51,469.2 0.7019 - 20.68 KB

After:

Method Input HubProtocol Mean Error StdDev Op/s Gen 0 Allocated
ReadSingleMessage LargeArguments Json 9.199 us 0.5671 us 1.672 us 108,709.4 0.6714 20.15 KB
ReadSingleMessage LargeArguments NewtonsoftJson 19.627 us 0.3924 us 1.126 us 50,950.9 0.7019 20.68 KB

With custom slicing code (near optimal perf):

Method Input HubProtocol Mean Error StdDev Op/s Gen 0 Allocated
ReadSingleMessage LargeArguments Json 8.244 us 0.5701 us 1.672 us 121,294.4 0.6866 20.23 KB
ReadSingleMessage LargeArguments NewtonsoftJson 20.251 us 0.3925 us 1.088 us 49,381.2 0.7019 20.68 KB