dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

[API Proposal]: Custom [Try]GetValue<T> methods on JsonElement for efficient projection of string values to dotnet types #74028

Closed mwadams closed 1 month ago

mwadams commented 2 years ago

Background and motivation

There have been a number of discussions about possible APIs for efficient access to the underlying UTF8 bytes for JSON string values represented by a JsonElement.

So far, there has been no consensus on a suitable approach.

In general, the criticism of previous proposals has been either that:

  1. they require too much supporting scaffolding (e.g. an approach relying on various flavours of Utf8String), or
  2. they are "too raw" - just exposing the underlying UTF8 bytes is inflexible and prone to errors, especially with respect to encoding.

This API proposal focuses instead on a specific use-case - projecting from a JSON string value to a user-defined type, without an intermediate string allocation.

The basic shape of the API should be familiar. JsonElement already has a number of such methods - projecting to DateTime, DateTimeOffset and Guid, for example, with the GetXXX() and TryGetXXX() APIs.

These proposed APIs offer developers the ability to project to user defined types, without requiring an intermediate conversion to string. They will handle decoding and transcoding (e.g. to a decoded ReadOnlySpan<byte> or ReadOnlySpan<char>), managing the memory on behalf of the developer.

Importantly, they do not leak any information about the underlying backing for the JSON string value.

Common use cases might include:

They use a similar delegate-and-state pattern to String.Create() to encourage efficient usage, and to avoid accidental allocation of closure-capturing objects.

API Proposal

namespace System.Text.Json;

public readonly struct JsonElement
{
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();

        return _parent.TryGetValue(_idx, parser, state, decode: true, out value);
    }

    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="decode">Indicates whether the UTF8 JSON string should be decoded.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, bool decode, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();

        return _parent.TryGetValue(_idx, parser, state, decode, out value);
    }

    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();

        return _parent.TryGetValue(_idx, parser, state, out value);
    }

    /// <summary>
    ///   Gets the value of the element as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="parser">The parser for the result.</param>
    /// <param name="state">The state for the parser.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>The value of the element as the given type.</returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="FormatException">
    ///   The value cannot be represented as the given type.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public TResult GetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state)
    {
        if (!TryGetValue(parser, state, out TResult? value))
        {
            ThrowHelper.ThrowFormatException();
        }

        return value;
    }

    /// <summary>
    ///   Gets the value of the element as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="parser">The parser for the result.</param>
    /// <param name="state">The state for the parser.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>The value of the element as the given type.</returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="FormatException">
    ///   The value cannot be represented as the given type.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public TResult GetValue<TState, TResult>(Parser<TState, TResult> parser, TState state)
    {
        if (!TryGetValue(parser, state, out TResult? value))
        {
            ThrowHelper.ThrowFormatException();
        }

        return value;
    }
}

/// <summary>
///   A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The UTF8-encoded JSON string. This may be encoded or decoded depending on context.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
///   This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
///   <see langword="true"/> if the string can be represented as the given type,
///   <see langword="false"/> otherwise.
/// </returns>
public delegate bool Utf8Parser<TState, TResult>(ReadOnlySpan<byte> span, TState state, [NotNullWhen(true)] out TResult? value);

/// <summary>
///   A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The JSON string. This will always be in its decoded form.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
///   This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
///   <see langword="true"/> if the string can be represented as the given type,
///   <see langword="false"/> otherwise.
/// </returns>
public delegate bool Parser<TState, TResult>(ReadOnlySpan<char> span, TState state, [NotNullWhen(true)] out TResult? value);

Rough implementation of the additional internal methods on JsonDocument can be found here.

https://github.com/mwadams/runtime/blob/55372e6b8e93a123513cfd6e6415cfd80c0630a4/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.cs#L660

API Usage

JsonElement element;

if (element.TryParseValue(ParseLength, state: default(object?), decode: true, out int value))
{
    Console.WriteLine($"Length = {value}");
}
else
{
    Console.WriteLine($"Unable to parse the value.");
}

if (element.TryParseValue(ParseLengthAsDecodedChars, state: new ParseConfiguration(true), out int value))
{
    Console.WriteLine($"Length = {value}");
}
else
{
    Console.WriteLine($"Unable to parse the value.");
}

static bool ParseLength(ReadOnlySpan<byte> input, object? state, out int result)
{
    result = input.Length; // Or whatever processing you require
    return true;
}

static bool ParseLengthAsDecodedChars(ReadOnlySpan<char> input, ParseConfiguration state, out int result)
{
    result = state.SomeSwitch ? input.Length : input.Length + 1; // Or whatever processing you require
    return true;
}

internal readonly record struct ParseConfiguration(bool SomeSwitch);

Alternative Designs

Risks

This should have no impact on the existing API surface area.

It should be possible to implement largely using the existing methods in JsonDocument and JsonReaderHelper.

There are comments in JsonDocument suggesting that there is a plan to move to common UTF8 encoding/decoding code; this would increase the surface area of methods in JsonDocument that would need to switch to using those common APIs if/when that change occurs.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation There have been a number of discussions about possible APIs for efficient access to the underlying UTF8 bytes for JSON string values represented by a `JsonElement`. So far, there has been no consensus on a suitable approach. In general, the criticism of previous proposals has been either that: 1. they require too much supporting scaffolding (e.g. an approach relying on various flavours of `Utf8String`), or 1. they are "too raw" - just exposing the underlying UTF8 bytes is inflexible and prone to errors, especially with respect to encoding. This API proposal focuses instead on a specific use-case - projecting from a JSON string value to a user-defined type, without an intermediate `string` allocation. The basic shape of the API should be familiar. `JsonElement` already has a number of such methods - projecting to `DateTime`, `DateTimeOffset` and `Guid`, for example, with the `GetXXX()` and `TryGetXXX()` APIs. These proposed APIs offer developers the ability to project to user defined types, without requiring an intermediate conversion to string. They will handle decoding and transcoding (e.g. to a decoded `ReadOnlySpan` or `ReadOnlySpan`), managing the memory on behalf of the developer. Importantly, they do not leak any information about the underlying backing for the JSON string value. Common use cases might include: - projections to 3rd party types that are unlikely ever to be directly supported by in-the-box libraries, such as those from NodaTime types - projections to in-the-box types introduced in the libraries but not yet supported by System.Text.Json. - custom projections (e.g. transforms of the JSON string value itself) They use a similar delegate-and-state pattern to the `String.Create()` to encourage efficient usage, and to avoid accidental allocation of closure-capturing objects. ### API Proposal ```csharp namespace System.Text.Json; public readonly struct JsonElement { /// /// Attempts to represent the current JSON string as the given type. /// /// The type of the parser state. /// The type with which to represent the JSON string. /// A delegate to the method that parses the JSON string. /// The state for the parser. /// Receives the value. /// /// This method does not create a representation of values other than JSON strings. /// /// /// if the string can be represented as the given type, /// otherwise. /// /// /// This value's is not . /// /// /// The parent has been disposed. /// public bool TryGetValue(Utf8Parser parser, TState state, [NotNullWhen(true)] out TResult? value) { CheckValidInstance(); return _parent.TryGetValue(_idx, parser, state, decode: true, out value); } /// /// Attempts to represent the current JSON string as the given type. /// /// The type of the parser state. /// The type with which to represent the JSON string. /// A delegate to the method that parses the JSON string. /// The state for the parser. /// Indicates whether the UTF8 JSON string should be decoded. /// Receives the value. /// /// This method does not create a representation of values other than JSON strings. /// /// /// if the string can be represented as the given type, /// otherwise. /// /// /// This value's is not . /// /// /// The parent has been disposed. /// public bool TryGetValue(Utf8Parser parser, TState state, bool decode, [NotNullWhen(true)] out TResult? value) { CheckValidInstance(); return _parent.TryGetValue(_idx, parser, state, decode, out value); } /// /// Attempts to represent the current JSON string as the given type. /// /// The type of the parser state. /// The type with which to represent the JSON string. /// A delegate to the method that parses the JSON string. /// The state for the parser. /// Receives the value. /// /// This method does not create a representation of values other than JSON strings. /// /// /// if the string can be represented as the given type, /// otherwise. /// /// /// This value's is not . /// /// /// The parent has been disposed. /// public bool TryGetValue(Parser parser, TState state, [NotNullWhen(true)] out TResult? value) { CheckValidInstance(); return _parent.TryGetValue(_idx, parser, state, out value); } /// /// Gets the value of the element as the given type. /// /// The type of the parser state. /// The type of the result. /// The parser for the result. /// The state for the parser. /// /// This method does not create a representation of values other than JSON strings. /// /// The value of the element as the given type. /// /// This value's is not . /// /// /// The value cannot be represented as the given type. /// /// /// The parent has been disposed. /// public TResult GetValue(Utf8Parser parser, TState state) { if (!TryGetValue(parser, state, out TResult? value)) { ThrowHelper.ThrowFormatException(); } return value; } /// /// Gets the value of the element as the given type. /// /// The type of the parser state. /// The type of the result. /// The parser for the result. /// The state for the parser. /// /// This method does not create a representation of values other than JSON strings. /// /// The value of the element as the given type. /// /// This value's is not . /// /// /// The value cannot be represented as the given type. /// /// /// The parent has been disposed. /// public TResult GetValue(Parser parser, TState state) { if (!TryGetValue(parser, state, out TResult? value)) { ThrowHelper.ThrowFormatException(); } return value; } } /// /// A delegate to a method that attempts to represent a JSON string as a given type. /// /// The type of the state for the parser. /// The type of the resulting value. /// The UTF8-encoded JSON string. This may be encoded or decoded depending on context. /// The state for the parser. /// The resulting value. /// /// This method does not create a representation of values other than JSON strings. /// /// /// if the string can be represented as the given type, /// otherwise. /// public delegate bool Utf8Parser(ReadOnlySpan span, TState state, [NotNullWhen(true)] out TResult? value); /// /// A delegate to a method that attempts to represent a JSON string as a given type. /// /// The type of the state for the parser. /// The type of the resulting value. /// The JSON string. This will always be in its decoded form. /// The state for the parser. /// The resulting value. /// /// This method does not create a representation of values other than JSON strings. /// /// /// if the string can be represented as the given type, /// otherwise. /// public delegate bool Parser(ReadOnlySpan span, TState state, [NotNullWhen(true)] out TResult? value); ``` Rough implementation of the additional internal methods on `JsonDocument` can be found here. https://github.com/mwadams/runtime/blob/55372e6b8e93a123513cfd6e6415cfd80c0630a4/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.cs#L660 ### API Usage ```csharp JsonElement element; if (element.TryParseValue(ParseLength, state: default(object?), decode: true, out int value)) { Console.WriteLine($"Length = {value}"); } else { Console.WriteLine($"Unable to parse the value."); } if (element.TryParseValue(ParseLengthAsDecodedChars, state: new ParseConfiguration(true), out int value)) { Console.WriteLine($"Length = {value}"); } else { Console.WriteLine($"Unable to parse the value."); } static bool ParseLength(ReadOnlySpan input, object? state, out int result) { result = input.Length; // Or whatever processing you require return true; } static bool ParseLengthAsDecodedChars(ReadOnlySpan input, ParseConfiguration state, out int result) { result = state.SomeSwitch ? input.Length : input.Length + 1; // Or whatever processing you require return true; } internal readonly record struct ParseConfiguration(bool SomeSwitch); ``` ### Alternative Designs - We could provide additional overloads that allow you _not_ to provide a `TState` value (e.g. defaulting to a `null` value for an `object?`) - I think this is in danger of trapping people into closure-capturing - It would diverge from the spirit of `string.Create()`, although this is perhaps a minor consideration - We could require `TState` to be a value type, and declare it as an `in` parameter, which would minimise copying in the case that your state was a `readonly struct`. ### Risks This should have no impact on the existing API surface area. It should be possible to implement largely using the existing methods in `JsonDocument` and `JsonReaderHelper`. There are comments in `JsonDocument` suggesting that there is a plan to move to common UTF8 encoding/decoding code; this would increase the surface are of methods in `JsonDocument` that would need to switch to using those common APIs if/when that change occurs.
Author: mwadams
Assignees: -
Labels: `api-suggestion`, `area-System.Text.Json`
Milestone: -
mwadams commented 2 years ago

One usage observation: developers may wish to implement extension methods that hide the underlying parsers from their ultimate clients.

public static bool TryGetCustomValue(this JsonElement element, out CustomValue value) {}

This would give these custom value converters the same signature for end users as the built-in APIs.

mwadams commented 2 years ago

For a quantifiable benefit: if implemented, these APIs would enable me to eliminate allocations in three hot-paths in my JsonSchema validation.

In one of our benchmarks, we validate an array of 10,000 "Person" objects.

The Person has a FirstName, LastName, and a DateOfBirth.

The DateOfBirth is a JSON string in date format which we project to a NodaTime local date type.

The name elements are constrained to a maxLength.

Validation causes 40,000 string allocations at a cost of ~3MB (which are otherwise unused because we write the values directly into the output stream - a very common use case).

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FindWholeArrayLoopSchemaGenValidateDeserialize 20.050 ms 0.0598 ms 0.0530 ms 312.5000 - - 2,792,162 B

With this API this reduces to zero.

mwadams commented 2 years ago

As an example of the flexibility of this API, here's a projection that would allow us to do Rune counting of the decoded JSON string, avoiding a string allocation (another requirement for json-schema validation).

public static class JsonElementExtensions
{
    public static int CountRunes(this JsonElement element)
    {
        if (element.TryGetValue(RuneCounter, default(object?), out int result))
        {
            return result;
        }

        throw new InvalidOperationException();
    }

    private static bool RuneCounter(ReadOnlySpan<char> input, object? state, out int result)
    {
        int runeCount = 0;
        SpanRuneEnumerator enumerator = input.EnumerateRunes();
        while (enumerator.MoveNext())
        {
            runeCount++;
        }

        result = runeCount;
        return true;
    }
}

In our actual implementation we would create a whole 'string validator' type that would also do e.g. pattern matching using the new net7 Regex.IsMatch(ReadOnlySpan<T>). This is a highly simplified example that illustrates the benefits of building over this API.

public static class JsonElementExtensions
{
    public static ValidationContext ValidateString(this JsonElement element, in ValidationContext validationContext)
    {
        if (element.TryGetValue(StringValidator, validationContext, out ValidationContext result))
        {
            return result;
        }

        throw new InvalidOperationException();
    }

    private static Regex SomePattern = new Regex("H(.*)o", RegexOptions.Compiled, TimeSpan.FromSeconds(3));

    private static bool StringValidator(ReadOnlySpan<char> input, in ValidationContext context, out ValidationContext result)
    {
        // Emitted if minLength or maxLength
        int runeCount = 0;
        SpanRuneEnumerator enumerator = input.EnumerateRunes();
        while (enumerator.MoveNext())
        {
            runeCount++;
        }

        result = context.WithResult(runeCount < 10);

        // Emitted if has pattern match validation Net7 Regex
        result = result.WithResult(SomePattern.IsMatch(input));
        return true;
    }
}
mwadams commented 2 years ago

I think I'd like to emphasize the benefit of this API for "in-the-box" types.

By allowing maintainers to implement TryGetXXX() methods without having to update System.Text.Json.JsonElement, and yet maintaining close-to-optimal performance, support for new types (like the Date or Time-onlies) could be provided out-of-band with framework releases, and without polluting the (otherwise ever-growing) JsonElement native API.

If there was a strong reason to then roll those into the JsonElement API itself, that could be done with the next major version, and the Extension method deprecated, and implemented over the new API for backwards compatibility.

mwadams commented 1 year ago

I thought it might be useful to see the practical benefit of this.

We have an internal implementation of this API, along these lines; it offers a good tradeoff with allocation (approx zero in steady state) and performance.

    /// <summary>
    /// Process raw JSON text.
    /// </summary>
    /// <typeparam name="TState">The type of the state for the processor.</typeparam>
    /// <typeparam name="TResult">The type of the result of processing.</typeparam>
    /// <param name="element">The json element to process.</param>
    /// <param name="state">The state passed to the processor.</param>
    /// <param name="callback">The processing callback.</param>
    /// <param name="result">The result of processing.</param>
    /// <returns><c>True</c> if the processing succeeded, otherwise false.</returns>
    public static bool ProcessRawText<TState, TResult>(
        this JsonElement element,
        in TState state,
        in Utf8Parser<TState, TResult> callback,
        [NotNullWhen(true)] out TResult? result)
    {
        if (UseReflection)
        {
            return callback(GetRawValue(element).Span, state, out result);
        }
        else
        {
            PooledWriter? writerPair = null;
            try
            {
                writerPair = WriterPool.Get();
                (Utf8JsonWriter w, ArrayPoolBufferWriter<byte> writer) = writerPair.Get();
                element.WriteTo(w);
                w.Flush();
                return callback(writer.WrittenSpan[1..^1], state, out result);
            }
            finally
            {
                if (writerPair is not null)
                {
                    WriterPool.Return(writerPair);
                }
            }
        }
    }

I have added the "UseReflection" path for benchmarking. If we choose that, it emits some code to get the underlying memory from the JsonDocument, as if this was part of the internals of the TryGetValue() method proposed here.

    private static GetRawValueDelegate BuildGetRawValue()
    {
        Type returnType = typeof(ReadOnlyMemory<byte>);
        Type[] parameterTypes = new[] { typeof(JsonElement) };

        Type jsonElementType = typeof(JsonElement);
        Type jsonDocumentType = typeof(JsonDocument);
        FieldInfo parentField = jsonElementType.GetField("_parent", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonElement._parent field");
        FieldInfo idxField = jsonElementType.GetField("_idx", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonElement._idx field");
        MethodInfo getRawValueMethod = jsonDocumentType.GetMethod("GetRawValue", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonDocument.GetRawValue method");

        var getRawText = new DynamicMethod("GetRawValue", returnType, parameterTypes, typeof(LowAllocJsonUtils).Module, true);

        ILGenerator il = getRawText.GetILGenerator(256);
        LocalBuilder idx = il.DeclareLocal(typeof(int));
        il.Emit(OpCodes.Ldarg_0); // Get the JSON Element argument
        il.Emit(OpCodes.Ldfld, idxField);
        il.Emit(OpCodes.Stloc_0, idx); // Store the index in a local variable
        il.Emit(OpCodes.Ldarg_0); // Get the JSON Element argument again
        il.Emit(OpCodes.Ldfld, parentField); // Put the parent on the stack
        il.Emit(OpCodes.Ldloc_0, idx); // Put the index on the stack from the local variable
        il.Emit(OpCodes.Ldc_I4_0); // Put "false" onto the stack
        il.Emit(OpCodes.Callvirt, getRawValueMethod);
        il.Emit(OpCodes.Ret);

        return getRawText.CreateDelegate<GetRawValueDelegate>();
    }

As you can see, it simply gets the gets the underlying raw value from the parent JsonDocument.

The benchmarks show a 20-30% benefit.

image image image

(As a reminder - this is an underlying means of accessing the raw text - we optionally decode and/or translate to RoS<char> further up the implementation; these benchmarks include that decoding and transcoding - they are performing json-schema document validation.)

mwadams commented 1 year ago

Those benchmarks were on net7.0. It's well worth comparing with net8.0 (this is the last preview, not RC1) We get huge perf improvements from the new runtime - I'm guessing that's a lot of vectorization improvements, and I also notice that our tiny bit of allocation that we need to do in the "everything is fine" case has got smaller, too!

That's pretty impressive for "no work" - and indeed is as much benefit as this proposed change would produce... except as you can see we get it again even on net8.0!

image image image

mwadams commented 1 year ago

I should say that the Corvus.JsonSchema library is ours. JsonEverything is @gregsdennis excellent cornucopia of json goodness, also built over System.Text.Json.

With Corvus we are trying to make JsonSchema validation a "just do it and you don't notice" option for API implementers, with the added bonus of an object model over the document "for free" rather than having to hand build a separate set of POCOs for serialisation.

A validation pass for a "normal" sized document is a couple of microseconds and zero allocations.

For a 1.5Mb document it is a few 10s of milliseconds and a few 10s of bytes of allocation.

We are a bit obsessive about shaving as much as we can off that!

mwadams commented 11 months ago

Blog post about net8 performance wins and potential impact of this change.

https://endjin.com/blog/2023/12/how-dotnet-8-boosted-json-schema-performance-by-20-percent-for-free

mwadams commented 1 month ago

As we now have JsonMarshal I believe that this issue can be closed.

I'm just preparing the next preview build of Corvus.JsonSchema which has support for .NET 9 (and uses the new JsonMarshal type to optimize access to the underlying UTF8 JSON bytes).

Thought you might be interested in the benchmarks. The "Fast" benchmarks exercise a codepath using JsonMarshal v. the old path with the workaround discussed in this issue.

Running on .NET 8 with the .NET9 System.Text.Json library and JsonMarshal is 13% faster. Running on .NET 9 RC2 nightly (from today) it is a whopping 32% faster.

[As usual, those very weird 12B and 17B allocations are a BenchmarkDotNet quirk - it doesn't actually alllocate at all]

This is on Windows 11 with a 13th Gen Intel(R) Core(TM) i7-13800H @ 2.90 GHz

| Method                         | Toolchain            | Mean      | Error     | StdDev    | Ratio | Allocated | Alloc Ratio |
|------------------------------- |--------------------- |----------:|----------:|----------:|------:|----------:|------------:|
| ValidateLargeArrayCorvusV4     | .NET 8.0             | 13.035 ms | 0.0272 ms | 0.0241 ms |  1.00 |      12 B |        1.00 |
| ValidateLargeArrayCorvusV4Fast | .NET 8.0             | 11.311 ms | 0.0536 ms | 0.0448 ms |  0.87 |      12 B |        1.00 |
| ValidateLargeArrayCorvusV4     | .NET 9.0 RC2 Nightly | 10.436 ms | 0.0226 ms | 0.0188 ms |  0.80 |      17 B |        1.42 |
| ValidateLargeArrayCorvusV4Fast | .NET 9.0 RC2 Nightly |  8.813 ms | 0.0526 ms | 0.0492 ms |  0.68 |      12 B |        1.00 |

For those interested, that's validating an array of 10,000 instances of this entity:

 {
     "name": {
       "familyName": "Oldroyd",
       "givenName": "Michael",
       "otherNames": [],
       "email": "michael.oldryoyd@contoso.com"
     },
     "dateOfBirth": "1944-07-14",
     "netWorth": 1234567890.1234567891,
     "height": 1.8
 }

Against this schema (including full format validation for e.g. dates, email, numerics)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "JSON Schema for a Person entity coming back from a 3rd party API (e.g. a storage format in a database)",
  "type": "object",
  "$ref": "#/$defs/Person",
  "$defs": {
    "PersonArray": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/Person"
      }
    },
    "Person": {
      "type": "object",

      "required": [ "name" ],
      "properties": {
        "name": { "$ref": "#/$defs/PersonName" },
        "dateOfBirth": {
            "type": [ "string", "null" ],
          "format": "date"
        },
        "email": {
          "type": "string",
          "format": "email"
        },
        "netWorth": {
          "type": "number",
          "format": "decimal"
        },
        "height": {
          "$ref": "#/%24defs/HeightRangeDouble"
        }
      }
    },
    "HeightRangeDouble": {
      "type": "number",
      "minimum": 0,
      "maximum": 3.0
    },
    "PersonName": {
      "type": "object",
      "description": "A name of a person.",
      "required": [ "familyName" ],
      "properties": {
        "givenName": {
          "$ref": "#/$defs/PersonNameElement",
          "description": "The person's given name."
        },
        "familyName": {
          "$ref": "#/$defs/PersonNameElement",
          "description": "The person's family name."
        },
        "otherNames": {
          "$ref": "#/$defs/OtherNames",
          "description": "Other (middle) names for the person"
        }
      }
    },
    "OtherNames": {
      "oneOf": [
        { "$ref": "#/$defs/PersonNameElement" },
        { "$ref": "#/$defs/PersonNameElementArray" }
      ]
    },
    "PersonNameElementArray": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/PersonNameElement"
      }
    },
    "PersonNameElement": {
      "type": "string",
      "minLength": 1,
      "maxLength": 256
    }
  }
}
mwadams commented 1 month ago

Having said that, JsonProperty remains a conundrum.

Because its Name is either accessed via the preceding DbRow of the Value, or its own string backing we have an anomaly that it may or may not require transcoding, which is, I presume, why we don't have a comparable raw accessor for the name.

Internally, most uses of the property name assume that it will be safe to assume a JsonElement-backed name (e.g. in DeepEquals()) - so it would still be useful to have a simple JsonMarshal method like:


        /// <summary>
        /// Gets a <see cref="ReadOnlySpan{T}"/> view over the raw JSON data of the given <see cref="JsonProperty"/>.
        /// </summary>
        /// <param name="property">The JSON property from which to extract the span.</param>
        /// <returns>The span containing the raw JSON data of <paramref name="property"/>.</returns>
        /// <exception cref="ObjectDisposedException">The underlying <see cref="JsonDocument"/> has been disposed.</exception>
        /// <remarks>
        /// <para>
        /// While the method itself does check for disposal of the underlying <see cref="JsonDocument"/>,
        /// it is possible that it could be disposed after the method returns, which would result in
        /// the span pointing to a buffer that has been returned to the shared pool. Callers should take
        /// extra care to make sure that such a scenario isn't possible to avoid potential data corruption.
        /// </para>
        /// </remarks>
        public static ReadOnlySpan<byte> GetRawUtf8PropertyName(JsonProperty property)
        {
            return property.NameSpan;
        }
mwadams commented 1 month ago

I have opened #107782 to contain this suggestion, and will close this issue.