dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Writing raw JSON values when using Utf8JsonWriter #1784

Closed nahk-ivanov closed 3 years ago

nahk-ivanov commented 4 years ago

Edited by @layomia

Original proposal by @nahk-ivanov:

According to the image in the [roadmap](https://github.com/dotnet/corefx/tree/master/src/System.Text.Json/roadmap), `Utf8JsonWriter` is the lowest-level API we are going to get, but it's missing the `WriteRawValue` functionality from JSON.NET. This seems to be "by design", as the roadmap states:

> Does not allow writing invalid JSON according to the JSON RFC

In that case, it would be nice to have direct access to the output stream in a custom `JsonConverter`, as there are still scenarios where, for performance, we want to write raw JSON. Quite often the application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL). What we used to do with JSON.NET in such cases was keep the branch as a string property and define a custom converter that called `WriteRawValue`, which took a `string` argument.

One example is Azure Table Storage, where we want to extract a few top-level properties to serve as PartitionKey/RowKey and store the other nested properties in their original JSON form. I don't see any reason why the application should unnecessarily deserialize and reserialize these parts of the tree; all we need is to run a forward-only reader to return the value of a given property in its raw form and later write it back as such.

I looked at the current `System.Text.Json` APIs and can't find a good way of doing this. It seems that (for performance reasons) we would now want to maintain the raw data in binary form (compared to a string previously), which is fine, but there is no way to read/write the next value as bytes. The best I could find is to use `JsonDocument.ParseValue` + `document.RootElement.GetRawText()` when reading, and then parse the text into a `JsonDocument` again when writing, just to call `WriteTo`. Am I missing some APIs?

Introduction

There are scenarios where customers want to integrate existing, "raw" JSON payloads when writing new JSON payloads with Utf8JsonWriter. There's no first-class API for providing this functionality today, but there's a workaround using JsonDocument:

// UTF8 representation of {"Hello":"World"}
byte[] rawJson = GetRawPayload();

writer.WriteStartObject();
writer.WritePropertyName("Payload");

// No easy way to write this raw JSON payload. Here's a workaround:
using (JsonDocument document = JsonDocument.Parse(rawJson))
{
    document.RootElement.WriteTo(writer);
}

writer.WriteEndObject();

This workaround parses the payload into a JsonDocument, which builds a metadata database for navigating the document. That work is unnecessary given the desired functionality and carries a performance overhead.

The goal of this feature is to provide performant and safe APIs to write raw JSON values.

API Proposal

namespace System.Text.Json
{
  public sealed partial class Utf8JsonWriter
  {
    // Writes the span of UTF-8 bytes directly as JSON content.
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipValidation = false) { }

    // Writes the span of UTF-16 characters directly as JSON content.
    public void WriteRawValue(ReadOnlySpan<char> json, bool skipValidation = false) { }

    // Writes the string directly as JSON content.
    public void WriteRawValue(string json, bool skipValidation = false) { }
  }
}
Alternate Design:

```cs
namespace System.Text.Json
{
    // Specifies processing options when writing raw JSON values.
    [Flags]
    public enum JsonWriteRawOptions
    {
        // Raw JSON values will be validated for structural correctness and encoded if required.
        Default = 0,
        // Whether to skip validation. Independent of JsonWriterOptions.SkipValidation.
        SkipValidation = 1,
        // Whether to skip encoding specified via JsonWriterOptions.Encoder.
        SkipEncoding = 2,
    }

    public sealed partial class Utf8JsonWriter
    {
        // Writes the span of bytes directly as JSON content, according to the specified options.
        public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
    }
}
```

Potential additions:

```cs
namespace System.Text.Json
{
    [Flags]
    public enum JsonWriteRawOptions
    {
        // Existing
        // Default = 0,
        // SkipValidation = 1,
        // SkipEncoding = 2,

        // Whether to reformat the raw payload according to the JsonWriterOptions, or write it as-is.
        // Reformatting is influenced by JsonWriterOptions.Indented, and may result in whitespace changes.
        Reformat = 4
    }

    public sealed partial class Utf8JsonWriter
    {
        // Existing
        // public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions? options = null) { }

        // Write raw value overload that takes ReadOnlySpan<char>.
        public void WriteRawValue(ReadOnlySpan<char> json, JsonWriteRawOptions options = default) { }

        // Write raw value overload that takes string.
        public void WriteRawValue(string json, JsonWriteRawOptions options = default) { }
    }
}
```

Scenarios

The major considerations for this feature are whether the raw payload should be validated for structural correctness, whether it should be encoded, and whether it should be reformatted.

The answers depend on the scenario. Is the raw JSON payload trusted or arbitrary? Is the enveloping of the raw payload done on behalf of users? This proposal seeks to make these post-processing options fully configurable by users, allowing them to tweak the behavior in a way that satisfies their performance, correctness, and security requirements.

In this proposal, raw JSON values will not be reformatted to align with the whitespace and indentation settings specified on JsonWriterOptions; they are left as-is. A future API could reformat raw JSON values. Values will also not be encoded by default, though an API for that could likewise be added later. If users need all three processing options (validation, encoding, and reformatting) today, the workaround shown above covers that scenario.

Let's go over two of the expected common scenarios, and how the API would be configured to enable them.

I have a blob which I think represents JSON content and which I want to envelop, and I need to make sure the envelope and its inner contents remain well-formed

Consider the Azure SDK team which provides a service protocol where the payload content is JSON. Part of the JSON is internal protocol information, and a nested part of the JSON is user provided data. When serializing protocol information, Azure cares about the raw user JSON being structurally valid, and ultimately that the overall JSON payload is valid. They don't wish to alter the formatting of the raw JSON, or preemptively encode it. Given these considerations, they might write their serialization logic as follows:

JsonWriterOptions writerOptions = new() { WriteIndented = true };

using MemoryStream ms = new();
using Utf8JsonWriter writer = new(ms, writerOptions);

writer.WriteStartObject();
writer.WriteString("id", protocol.Id);
writer.WriteString("eventType", protocol.EventType);
writer.WriteString("eventTime", DateTimeOffset.UtcNow);
// Write user-provided data.
writer.WritePropertyName("data");
writer.WriteRawValue(protocol.UserBlob);
writer.WriteEndObject();

The resulting JSON payload might look like this:

{
    "id": "a20299ee-f239-42ce-8eee-2da562848fbe",
    "eventType": "Microsoft.Resources.ResourceWriteSuccess",
    "eventTime": "2021-06-10T14:29:10.3363984+00:00",
    "data":{"auth":"{az_resource_mgr_auth}","correlationId":"76d8b695-e98b-4fb6-81c4-1edc1aa22dfc","tenantId":"76d8b695-e98b-4fb6-81c4-1edc1aa22dfc"}
}

Notice that no JsonWriteRawOptions had to be specified.

I have a deliberate sequence of bytes I want to write out on the wire, and I know what I'm doing

Consider a user who needs to format double values differently from the Utf8JsonWriter.WriteNumber[X] methods. Those methods write the shortest JSON output required for round-tripping on deserialization, which means that a float or double value that can be expressed simply as 89 would not be written as 89.0. However, our user needs to send numeric JSON content to a service that treats values without decimal points as integers and values with decimal points as doubles, and this distinction matters to them. The WriteRawValue APIs would allow the user to format their numeric values as they wish and write them directly to the destination buffer. In this scenario, the numeric values are trusted, so the user would not want the writer to perform any encoding or structural validation on the raw JSON payloads, giving the fastest possible perf. To satisfy this trusted scenario, they might write serialization logic as follows:

JsonWriterOptions writerOptions = new() { WriteIndented = true, };

using MemoryStream ms = new();
using Utf8JsonWriter writer = new(ms, writerOptions);

writer.WriteStartObject();
writer.WriteString("dataType", "CalculationResults");

writer.WriteStartArray("data");

foreach (CalculationResult result in results)
{
    writer.WriteStartObject();
    writer.WriteString("measurement", result.Measurement);

    writer.WritePropertyName("value");
    // Write raw JSON numeric value
    byte[] formattedValue = FormatNumberValue(result.Value);
    writer.WriteRawValue(formattedValue, skipValidation: true);

    writer.WriteEndObject();
}

writer.WriteEndArray();
writer.WriteEndObject();

The resulting JSON payload might look like this:

{
    "dataType": "CalculationResults",
    "data": [
        {
            "measurement": "Min",
            "value": 50.4
        },
        {
            "measurement": "Max",
            "value": 2
        }
    ]
}

Notes

What's next?

ahsonkhan commented 4 years ago

Seems like it is "by design"

Yep. In the initial design discussions of Utf8JsonWriter, we had considered adding WriteRaw APIs, but decided against it: https://github.com/dotnet/corefx/issues/33552

there are still scenarios where for performance we want to write raw JSON

We could consider revisiting that decision, if there is evidence of a significant performance benefit in doing so. Do you have some benchmarks for your scenario that would potentially show a win or is there a general concern for too much unnecessary work being done?

In this case, it would be nice to have access to the output stream directly in the custom JsonConverter

One problem with trying to expose the destination is that Utf8JsonWriter could have been constructed to write to different output destinations: Stream (like FileStream) or IBufferWriter<byte> (like ArrayBufferWriter<byte>).

Is your use case specific to uses within JsonConverter? If you are passing the output stream to the JsonSerializer, do you still have access to it?

One of the examples would be using Azure Table Storage, where we want to extract few top-level properties to be PartitionKey/RowKey and store other nested properties in their original JSON form.

Generally speaking, how large are these nested JSON payloads that are stored in their original form, on average (looking for order of magnitude such as 1 KB, 1 MB, 1 GB)?

It's quite often that application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL).

What about validation, though? Do they care about making sure the JSON is valid before writing the string directly via WriteRawValue, or is the JSON assumed to have already been validated elsewhere (like in your case, where the JSON retrieved was previously written to storage by the JsonSerializer)?

The best I could find is to use JsonDocument.ParseValue + document.RootElement.GetRawText() when reading it and parse it again into a JsonDocument when writing to just call WriteTo.

Am I missing some APIs?

No, you aren't missing anything. What you found is the only way you could do this within the JsonConverter. However, one thing you could do to avoid calling GetRawText and an extra re-parse is, instead of storing the JSON as a string to write later, store the JsonElement from the JsonDocument and call WriteTo on it. Is that feasible for your scenario? It still won't beat the performance of writing the raw bytes directly, but it should improve things significantly.
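A minimal sketch of that suggestion (hypothetical model and property names), keeping the pass-through branch typed as a JsonElement and writing it back with WriteTo:

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical model: "Payload" is an opaque branch we only pass through.
public class Entity
{
    public string Id { get; set; }
    public JsonElement Payload { get; set; }
}

public static class JsonElementRoundTrip
{
    public static string RoundTrip(string json)
    {
        // Deserialization captures Payload as a JsonElement; no model type needed for the branch.
        Entity entity = JsonSerializer.Deserialize<Entity>(json);

        // Serialization calls WriteTo, avoiding GetRawText plus a re-parse.
        using var ms = new MemoryStream();
        using (var writer = new Utf8JsonWriter(ms))
        {
            writer.WriteStartObject();
            writer.WriteString("Id", entity.Id);
            writer.WritePropertyName("Payload");
            entity.Payload.WriteTo(writer);
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }
}
```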

Not knowing the specifics of your scenario, I explored the scenario a bit with some arbitrary JSON data and these are the results:


BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
  Job-DHBJYG : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  
| Method | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| RoundTripWithString | 95.44 us | 1.467 us | 1.372 us | 95.01 us | 93.23 us | 98.31 us | 1.00 | 0.00 | 6.2256 | 0.8545 | - | 25.59 KB |
| RoundTripWithJsonElement | 67.49 us | 0.543 us | 0.454 us | 67.47 us | 66.65 us | 68.14 us | 0.71 | 0.01 | 4.2725 | 0.3662 | - | 17.95 KB |
| RoundTripRaw | 44.97 us | 0.688 us | 0.610 us | 44.83 us | 44.31 us | 46.29 us | 0.47 | 0.01 | 7.3853 | 0.0610 | - | 30.37 KB |
| RoundTripWithBytes | 92.23 us | 1.715 us | 1.432 us | 91.68 us | 90.31 us | 94.93 us | 0.97 | 0.02 | 3.1738 | 0.2441 | - | 13.02 KB |
| RoundTripWithBytesRaw | 44.31 us | 0.871 us | 0.932 us | 44.08 us | 43.32 us | 46.20 us | 0.47 | 0.01 | 3.1128 | 0.2441 | - | 12.95 KB |
nahk-ivanov commented 4 years ago

Thanks for taking a stab at this and doing some benchmarking!

At this point it is a general concern about too much unnecessary work being done. This is part of an ASP.NET Core web app hosted in Azure App Service, and I'd like to host on as few nodes as possible using the smallest SKU possible, so that we scale horizontally rather than vertically when needed. This looked like a low-hanging fruit to optimize, if all the necessary APIs existed, of course.

Payloads are not big in size (limited by the Azure Table Storage entity size anyway), ranging from 1 to 32 KB. But it is an OLTP workload (think online gaming), so we get thousands of requests per second, and this is not a heavy load yet, of course.

Since serialization/deserialization happens within the ASP.NET infrastructure, the easiest approach would be to hook in through a JsonConverter. I'll need to check the ASP.NET Core APIs to see if I can get hold of the serializer or the request body stream directly to override the standard deserialization behavior. I think it might be possible if I just claim that my action expects raw binary input rather than a JSON payload and use the JSON reader directly myself. It would be more convenient to have it extract some of the properties automatically, though, and only ask me what to do with the subset of properties; that's why I was leaning towards the JsonConverter approach.

Validation-wise - it is there, but we only need to validate properties at the root level, which we do. Think the following json:

{
  "username": "foo",
  "accessToken": "bar",
  "payload": { "complex": "object" }
}

As long as username and accessToken are valid, we don't care if the user wants to "shoot himself in the foot" with a bogus payload. There are, of course, scenarios where even the payload needs to be validated (e.g. when it is shared with other users), but in this particular case it doesn't matter, as long as the overall request size is valid. So what I used to do is map the top-level properties and use a custom converter on the payload, so that it is preserved in raw form within the application code.

I thought about using JsonElement for the payload property, yeah. This was possible even with JSON.NET, as it had a similar type, but I don't like serialization-specific types "leaking" into the data model. I'm not happy with using custom attributes, but I can live with them (even though I may not be sleeping well). If I have to use serialization-framework-specific types for the properties in the model classes, then it will be even harder to switch from one framework to another. With attributes I can add custom wrapper types and define converters for them externally to the model, but with public JsonElement Payload { get; set; } it's hard to do anything about it.
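One way to keep serializer types out of the model, along the lines described above, is a thin wrapper type with an externally registered converter. A sketch with hypothetical names:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical wrapper: the model exposes RawJson instead of JsonElement.
public readonly struct RawJson
{
    public RawJson(JsonElement element) => Element = element;
    public JsonElement Element { get; }
    public override string ToString() => Element.GetRawText();
}

// Registered via JsonSerializerOptions.Converters, not via attributes on the model.
public class RawJsonConverter : JsonConverter<RawJson>
{
    public override RawJson Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        using JsonDocument doc = JsonDocument.ParseValue(ref reader);
        // Clone so the element outlives the document's pooled buffer.
        return new RawJson(doc.RootElement.Clone());
    }

    public override void Write(Utf8JsonWriter writer, RawJson value, JsonSerializerOptions options)
        => value.Element.WriteTo(writer);
}
```

Registering the converter once on the options keeps the model free of serialization attributes, at the cost of still materializing a JsonDocument during reads.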

Based on your benchmarks I'll probably switch to JsonElement (even though it affects my sleep), but it would be nice if we could easily get performance of RoundTripWithBytesRaw using simple public APIs.

layomia commented 4 years ago

From @declspec in https://github.com/dotnet/runtime/issues/32849:

I would like to propose an addition to the System.Text.Json.Utf8JsonWriter API which would allow writing existing UTF8-encoded JSON fragments into the current JSON document.

Something like the following:

public sealed class Utf8JsonWriter {
    public void WriteUtf8JsonValue(ReadOnlySpan<byte> utf8json);

    public void WriteUtf8Json(string propertyName, ReadOnlySpan<byte> utf8json);
    // ... and the rest of the standard overloads for the different types of `propertyName`
}

An example of usage would be:

var utf8json = Encoding.UTF8.GetBytes("{ \"foo\": 42, \"bar\": \"baz\" }").AsSpan();

using (var ms = new MemoryStream()) {
    using (var writer = new Utf8JsonWriter(ms)) {
        writer.WriteStartObject();
        writer.WriteUtf8Json("fragment", utf8json);
        writer.WriteEndObject();
    }

    Debug.WriteLine(Encoding.UTF8.GetString(ms.ToArray()));

    // Expected output:
    // { "fragment": { "foo": 42, "bar": "baz" } }
}

I believe this would be useful when you are trying to augment or wrap existing JSON payloads (i.e. from external services) in a parent document without needing to deserialize and reserialize them.

I have read through all the documentation I can find and as far as I can see there doesn't currently seem to be any way of doing what I'm proposing. If there is already a better way of doing this please let me know.

mikemcdougall commented 4 years ago

What is the workaround?

george-chakhidze commented 4 years ago

What is the workaround?

@mikemcdougall Going back to good ol' Newtonsoft.Json and its JsonWriter.WriteRawValue.

layomia commented 4 years ago

Work-around as described above:

No, you aren't missing anything. What you found here is the only way you could do this within the JsonConverter. However, one thing you could do to avoid calling GetRawText and an extra re-parse is instead of storing the JSON as string to write later, store the JsonElement from the JsonDocument and call WriteTo on it.

string rawJson = ...; // Raw value can also be `byte`s

using (JsonDocument document = JsonDocument.Parse(rawJson))
{
    document.RootElement.WriteTo(writer);
}
NinoFloris commented 4 years ago

We're in a similar situation: we heavily use PG json(b) columns for user-defined metadata. Our backend reads intentionally know nothing about their contents, and we'd ideally not pay the price of parsing and serializing something we only pass through.

The API proposal seems solid; good to see it's not string-based.

For large payloads (LOH allocations are easier to avoid by deserializing, which is exactly what we're trying to prevent here), a ReadOnlySequence<byte> overload could be of use too.

We're looking at introducing ReadOnlySequence<byte> in Npgsql (the PG driver) too, meaning we could remove all LOH allocations in an end-to-end scenario involving potentially large column values.

atlefren commented 4 years ago

A bit late to the party here, but this issue is what stops me from converting to system.text.json from newtonsoft.json.

My issue is related to the GeoJSON format, which is a subset of JSON, used to describe geospatial features.

Support for this format and mapping it to C# classes is provided by the NetTopologySuite package. The reading and writing of GeoJson to corresponding C# classes is complex, and not something I want to re-implement. Neither is it something I wish to be included in the standard library.

The problem is that a GeoJSON geometry is a complex structure.

In order to handle this with Newtonsoft I've done the following: https://gist.github.com/atlefren/4ec30d46af1dde48c4a461abc5b0b3c5

However, I cannot find a way to handle this in system.text.json. As far as I'm concerned, validation and such is no issue here, as the NTS reader/writer is responsible for that part.

EDIT: I see that NTS has made this possible on their side, as documented here: https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/wiki/Using-NTS.IO.GeoJSON4STJ-with-ASP.NET-Core-MVC#basic-usage

layomia commented 4 years ago

@atlefren did you take a look at the workaround in https://github.com/dotnet/runtime/issues/1784#issuecomment-608331125? Does it help?

olivier-spinelli commented 3 years ago

Sorry for the noob question, but @StephenBonikowsky, since you moved this issue to ".NET 6 Committed", does that mean it is planned for .NET 6? (The milestone on the card is "Future", so I doubt it.)

(If it is, it's cool! I'd very like to see WriteRaw available method(s).)

StephenBonikowsky commented 3 years ago

@olivier-spinelli thank you for pointing out that discrepancy. @ericstj is this still planned for .NET 6?

ericstj commented 3 years ago

This is part of the https://github.com/dotnet/runtime/issues/43620 epic -> https://github.com/dotnet/runtime/issues/45190 user story. We expect to get to this in 6.0. I've adjusted the milestone.

pbiggar commented 3 years ago

I'm having a similar problem using System.Text.Json. I want to write the double 81.0. The docs say:

Writes the Double using the default StandardFormat (that is, 'G') on .NET Core 3.0 or later versions.

As a result, 81.0 will only print as 81, and I don't appear to have any way to make the text 81.0 appear in my JSON. This means I can't tell the difference between an int and a float in my JSON.

I would like the ability to control the output of a double, and one way to solve that would be to allow me to write a raw value into the json.
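For illustration, a sketch of how the proposed WriteRawValue could address this. The "append .0" formatting rule below is an assumption for this example, not part of the proposal:

```csharp
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

public static class DoubleDemo
{
    public static string WriteDouble(double value)
    {
        using var ms = new MemoryStream();
        using (var writer = new Utf8JsonWriter(ms))
        {
            writer.WriteStartObject();
            writer.WritePropertyName("value");

            // Shortest round-trippable form, then force a decimal point so the
            // receiving service sees a double rather than an integer.
            string formatted = value.ToString(CultureInfo.InvariantCulture);
            if (formatted.IndexOfAny(new[] { '.', 'e', 'E' }) < 0)
                formatted += ".0";

            // Proposed API; the value is trusted, so validation is skipped.
            writer.WriteRawValue(formatted, true);
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }
}
```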

pbiggar commented 3 years ago

I've hit another use case of needing a raw value. I need to interoperate exactly with the output of OCaml's yojson. Yojson incorrectly outputs Infinity, -Infinity, and NaN. Though they violate the spec, I still need to read them and write them.

I had a full implementation of my serialization formats, but once I got to these edge cases (and also this one), I realized that I needed to switch to JSON.NET, which meant throwing away a lot of work; JSON.NET is more challenging to use and, I believe, slower as well.

~~Seems like it's supported already in https://docs.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonnumberhandling?view=net-5.0~~ Turns out this only supports them as strings, e.g. "Infinity", not Infinity.
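A quick check of that behavior. JsonNumberHandling.AllowNamedFloatingPointLiterals is the relevant option, and it writes the quoted string form, not the bare literal yojson emits:

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public static class LiteralsDemo
{
    public static string SerializeInfinity()
    {
        var options = new JsonSerializerOptions
        {
            NumberHandling = JsonNumberHandling.AllowNamedFloatingPointLiterals
        };
        // Produces the JSON string "Infinity" (with quotes), not a bare Infinity token.
        return JsonSerializer.Serialize(double.PositiveInfinity, options);
    }
}
```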

lezzi commented 3 years ago

I want to share a couple of blockers we faced because of missing raw write support. We are developing a high-performance DynamoDb library and there are two use-cases that require writing a raw JSON:

  1. Sometimes the database returns JSON that needs to be passed back to another request. This JSON is already validated and used as a string pagination token. We don't want to parse it; we just need to send it back as part of a JSON request. We had to develop a hack to flush the Utf8JsonWriter, then append raw bytes and advance the IBufferWriter<byte>.
  2. The database stores and returns numbers as strings. We can't always parse them back to numbers because sometimes we don't know the exact number type. When generating JSON we need to write numbers as actual JSON numbers (not strings). The same flushing hack has a much bigger cost for this use case because the JSON can contain hundreds of numbers.
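The flush-and-append hack described in (1) looks roughly like this. A sketch against IBufferWriter&lt;byte&gt; with illustrative names; validation must be off because the writer never sees the spliced bytes:

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class RawAppend
{
    public static string Demo(byte[] rawJson)
    {
        var buffer = new ArrayBufferWriter<byte>();
        var writer = new Utf8JsonWriter(buffer, new JsonWriterOptions { SkipValidation = true });

        writer.WriteStartObject();
        writer.WritePropertyName("token");
        writer.Flush();                              // commit {"token": to the buffer

        // Splice raw bytes in behind the writer's back.
        rawJson.AsSpan().CopyTo(buffer.GetSpan(rawJson.Length));
        buffer.Advance(rawJson.Length);

        writer.WriteEndObject();                     // writer resumes after the raw bytes
        writer.Flush();
        return Encoding.UTF8.GetString(buffer.WrittenSpan);
    }
}
```

This works because the writer requests fresh memory from the IBufferWriter after each flush, but it is fragile: the writer's internal state never accounts for the raw value.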
phil-w commented 3 years ago

I have a very simple scenario where my C# server running the latest stuff acquires some JSON from another source. It's a standard published API, the contents of which my C# server doesn't know about and cares even less.

egorpavlikhin commented 3 years ago

Without this it's impossible to format numerical values in the output. There should either be a way to control how values are formatted or the ability to write unquoted numerical values to the stream.

sep2 commented 3 years ago

I've got a scenario where I need to make a request JSON like this:

{
    "signature": "TWkJfGOKExKoIOjQef6fv5LuP/mT/YwYAtuQOI09bv39YPU=", // hash of serialized form of body field
    "body": {
        "params1": 1234567890,
        "params2": "xxxyyyzzz"
    }
}

I need to pre-serialize the body field to calculate the signature field, and then, when constructing the real request JSON, I want to drop the already-serialized body string into the JSON without re-serializing it again.
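A sketch of that flow with the proposed ReadOnlySpan&lt;byte&gt; overload. The HMAC-over-body-bytes scheme and field names follow the example above; the actual signature algorithm is an assumption:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class SignedRequest
{
    public static string Build(string bodyJson, byte[] key)
    {
        // Signature is computed over the already-serialized body bytes.
        byte[] bodyBytes = Encoding.UTF8.GetBytes(bodyJson);
        using var hmac = new HMACSHA256(key);
        string signature = Convert.ToBase64String(hmac.ComputeHash(bodyBytes));

        using var ms = new MemoryStream();
        using (var writer = new Utf8JsonWriter(ms))
        {
            writer.WriteStartObject();
            writer.WriteString("signature", signature);
            writer.WritePropertyName("body");
            writer.WriteRawValue(bodyBytes);  // proposed API: no re-serialization of the body
            writer.WriteEndObject();
        }
        return Encoding.UTF8.GetString(ms.ToArray());
    }
}
```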

Tornhoof commented 3 years ago

As linked by @layomia, for the source-gen code, having raw-write functionality could be useful to replace the JsonEncodedText members with ReadOnlySpan<byte> and pre-encoded UTF-8 member names. At the moment, benchmarks show that JsonEncodedText is faster (10-15%) for writing known property names, as there is a fast path for it; for the ReadOnlySpan<byte> overload, the writer needs to check the validity of the bytes and possibly escape them. With a raw API to bypass the encoding check, the performance should be similar, which would allow the source-gen to use ReadOnlySpan<byte>.

```csharp
public class JsonWriteBenchmark
{
    private static readonly JsonEncodedText JetPropertyName = JsonEncodedText.Encode("message");
    private static ReadOnlySpan<byte> ROSPropertyName => new byte[] { 0x6D, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65 };

    [Benchmark]
    public void WriteReadOnlySpanPropertyName()
    {
        using var ms = new MemoryStream();
        Utf8JsonWriter writer = new Utf8JsonWriter(ms);
        writer.WriteStartObject();
        writer.WriteString(ROSPropertyName, "Hello World");
        writer.WriteEndObject();
        writer.Flush();
    }

    [Benchmark]
    public void WriteJsonEncodedTextPropertyName()
    {
        using var ms = new MemoryStream();
        Utf8JsonWriter writer = new Utf8JsonWriter(ms);
        writer.WriteStartObject();
        writer.WriteString(JetPropertyName, "Hello World");
        writer.WriteEndObject();
        writer.Flush();
    }
}
```
stevejgordon commented 3 years ago

@layomia I have started to run into this myself working on vNext of Elasticsearch.NET. Currently, we use Utf8Json and I'm replacing it with STJ.

Generally, we serialize a request directly and fully control that type. However, we have more complex cases, such as when indexing a document to Elasticsearch. In that case, we defer how the request type is serialized to the type itself. This is because we don't want to serialise our request object, but the user-provided TDocument property it holds. In other requests, the object being serialized may be a different property.

A second requirement is that we allow consumers to provide their own serializer which can handle serializing their types. This may/may not differ from our default. To support this, we have an abstraction over serialization. As such, the abstraction has no notion of Utf8JsonWriter. Due to this design, we need to use the implementation of this abstraction when serialising the TDocument.

I have a custom JsonConverterFactory and JsonConverter which handles this.

public class ProxyRequestConverterFactory : JsonConverterFactory
{
    private readonly IConnectionSettingsValues _settings;

    public ProxyRequestConverterFactory(IConnectionSettingsValues settings) => _settings = settings;

    public override bool CanConvert(Type typeToConvert) => typeToConvert.IsGenericType
        && typeToConvert.GetInterfaces().Any(x => x.UnderlyingSystemType == typeof(IProxyRequest));

    public override JsonConverter? CreateConverter(Type typeToConvert, JsonSerializerOptions options)
    {
        var elementType = typeToConvert.GetGenericArguments()[0];

        var att = typeToConvert.GetCustomAttribute<ConvertAsAttribute>();

        var converter = (JsonConverter)Activator.CreateInstance(
            typeof(ProxyRequestConverter<>).MakeGenericType(att?.ConvertType.MakeGenericType(elementType) ?? elementType),
            BindingFlags.Instance | BindingFlags.Public,
            args: new object[] { _settings },
            binder: null,
            culture: null)!;

        return converter;
    }
}

public class ProxyRequestConverter<TRequest> : JsonConverter<TRequest>
{
    private readonly IConnectionSettingsValues _settings;

    public ProxyRequestConverter(IConnectionSettingsValues settings) => _settings = settings;

    public override TRequest? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => 
        throw new NotImplementedException();

    public override void Write(Utf8JsonWriter writer, TRequest value, JsonSerializerOptions options)
    {
        if (value is IProxyRequest proxyRequest) proxyRequest.WriteJson(writer, _settings.SourceSerializer);
    }
}

An (abbreviated) example of a request using this mechanism:

public partial class IndexRequest<TDocument> : IProxyRequest
{
    public TDocument Document { get; set; }

    void IProxyRequest.WriteJson(Utf8JsonWriter writer, ITransportSerializer sourceSerializer)
    {        
        using var ms = new MemoryStream();
        sourceSerializer.Serialize(Document, ms);
        ms.Position = 0;
        using var document = JsonDocument.Parse(ms);
        document.RootElement.WriteTo(writer);
    }
}

This uses a variation on the workaround you described above. My concern though is efficiency when TDocument is a large object. Ideally, we'd skip the JsonDocument.Parse() call and write our raw JSON from the MemoryStream into the writer.

I am still prototyping this and a few other designs which avoid this but have their own downsides, so I may yet land on a different solution. I wanted to share the scenario in case it helps shape the future of STJ. Happy to expand on what I've shown here.

layomia commented 3 years ago

I've hit another use case of needing a raw value. I need to interoperate exactly with the output of OCaml's yojson. Yojson incorrectly outputs Infinity, -Infinity, and NaN. Though they violate the spec, I still need to read them and write them.

@pbiggar we are likely to perform validation on the raw payload based on the configured JsonWriterOptions.SkipValidation value. When that property is false, we'd perform a pass over the raw payload with Utf8JsonReader to make sure it is structurally valid. The reader will reject unquoted text literals like Infinity, -Infinity, undefined, NaN, etc. (null is valid), and we don't have plans to extend it. To write values like these, you would need to configure the writer to skip validation.

If you do manage to write these unquoted literals, how would you read them back given that Utf8JsonReader will throw an exception if it encounters them? Just wanted to call out a potential limitation given your statement above:

Though they violate the spec, I still need to read them and write them.

pbiggar commented 3 years ago

If you do manage to write these unquoted literals, how would you read them back given that Utf8JsonReader will throw an exception if it encounters them? Just wanted to call out a potential limitation given your statement above:

You might be right. I was able to make this work with JSON.net, and I think perhaps in the future the solution is to move away from these invalid formats.

layomia commented 3 years ago

I updated the description above with an API proposal for this feature. @Tornhoof I opened https://github.com/dotnet/runtime/issues/54005 to address writing raw property names.

olivier-spinelli commented 3 years ago

Any chance to "open" a little bit the Utf8JsonReader/Writer API? I described my issue here: https://github.com/dotnet/runtime/issues/30194#issuecomment-858736293 (Wondering if I open an issue... If you can tell me what you think, it'll be cool :))

layomia commented 3 years ago

@olivier-spinelli per https://github.com/dotnet/runtime/issues/30194#issuecomment-858776492 please feel free to open a new issue addressing your read requirements.

olivier-spinelli commented 3 years ago

Thanks! The new issue is #54016.

stevejgordon commented 3 years ago

@layomia Apologies if I've misread this on my phone, but I'm not sure this would fully support my scenario above. Could it be adapted that way? In that case I want to write complete JSON, not just a property value. Should I perhaps open a specific issue to discuss?

layomia commented 3 years ago

@stevejgordon I'll circle back with more detail, but yes you'd be able to write complete JSON like top-level JSON objects, objects within JSON arrays, etc., just like the existing Utf8JsonWriter.WriteXValue methods.
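Writing a complete raw object as an element of an enclosing array can be sketched as follows. This assumes the API lands in the shape discussed here (it eventually shipped in .NET 6 as WriteRawValue(string, bool)):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Sketch: a raw JSON object written verbatim as an array element, alongside
// values written through the existing typed WriteXValue methods.
static string WriteEnvelope(string rawElement)
{
    using var ms = new MemoryStream();
    using (var writer = new Utf8JsonWriter(ms))
    {
        writer.WriteStartArray();
        writer.WriteRawValue(rawElement); // complete object, passed through as-is
        writer.WriteNumberValue(42);
        writer.WriteEndArray();
    }
    return Encoding.UTF8.GetString(ms.ToArray());
}

Console.WriteLine(WriteEnvelope("{\"cached\":true}")); // [{"cached":true},42]
```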

layomia commented 3 years ago

Update: I made changes to the proposal in the description above: https://github.com/dotnet/runtime/issues/1784#issue-550483170

olivier-spinelli commented 3 years ago

A small remark on the API proposal about the nullable default parameter:

public void WriteRawValue(string json, JsonWriteRawOptions? options = null);

JsonWriteRawOptions.Default exists (and is the default value, 0). Defaulting the parameter to null here seems somewhat redundant and/or ambiguous (what is the final choice for null?), and it is, for sure, 5 bytes (plus padding) on the stack instead of 4...

Maybe the following would be more explicit?

public void WriteRawValue( string json, JsonWriteRawOptions options = JsonWriteRawOptions.Default );
layomia commented 3 years ago

@olivier-spinelli I tried to explain this in the description:

> The options parameter on the new Utf8JsonWriter.WriteRawValue method is nullable for forward compatibility - that is, if we wish to have an option on JsonWriterOptions to configure what the write raw settings for any WriteRawValue method call should be (as shown in the "Potential future API additions" section). Options passed directly to the method would win over the global option. The parameter is nullable so that we can distinguish between users passing default(JsonWriteRawOptions) (and meaning it), and users passing nothing (i.e. null, meaning the global option should apply).

Re:

> what's the final choice for the null?

With the proposed API, passing a null value means we should use default(JsonWriteRawOptions).

Now, say the API we ship now is:

public sealed partial class Utf8JsonWriter
{
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
}

And we want to add API like this in this future:

public struct JsonWriterOptions
{
    // Existing
    // public JavaScriptEncoder? Encoder { readonly get; set; }
    // public bool Indented { get; set; }
    // public bool SkipValidation { get; set; }

    // Specify write-raw options to use for all WriteRawValue calls on the writer instance
    public JsonWriteRawOptions WriteRawOptions { get; set; }
}

What would the writer behavior in this scenario be?

JsonWriterOptions options = new() { WriteRawOptions = JsonWriteRawOptions.SkipValidation };
using Utf8JsonWriter writer = new(ms, options);

// Option a: Writer uses `JsonWriteRawOptions.SkipValidation` as specified globally
// Option b: Writer uses `JsonWriteRawOptions.Default` as specified in method call
writer.WriteRawValue(blob); // Same thing as writer.WriteRawValue(blob, JsonWriteRawOptions.Default)

If we don't think we'd ever want a "global" WriteRawOptions setting to influence all WriteRawValue calls (so that users don't have to pass options manually each time), then my question above doesn't apply and we can simply go with public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }.

olivier-spinelli commented 3 years ago

Ouch! I missed your remark, sorry! But... this may be a sign that the "Least Surprise Principle" has somehow been violated here...

If the "global" exists, wouldn't it be clearer to use overloads rather than default (of default) values? Something like:

public sealed partial class Utf8JsonWriter
{
    /// <summary>
    ///  Writes the value directly according to the specified options.
    /// </summary>
    public void WriteRawValue( ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options ) { }

    /// <summary>
    ///  Writes the value directly using this writer's <see cref="JsonWriterOptions.WriteRawOptions" />.
    /// </summary>
    public void WriteRawValue( ReadOnlySpan<byte> utf8Json ) { }
}

But if you don't include it now, these overloads will be really strange :). IMHO whether the global option exists should be settled right now...

Now, my very personal thought is that this "global" should not exist, and here's why: we're dealing with a low-level API that is meant to handle specific cases. In those kinds of scenarios, there's a high probability that other (very specific) options are useful (for instance: AllowNumberInfinity, NumberInfinityClampedToMax, UseByteArrayForGuid, etc.): the "global" Utf8JsonWriter options will never "be enough", and these specific options will "naturally" be available in the calling context.

layomia commented 3 years ago

@olivier-spinelli yeah, I agree that "global" configuration is not needed for this low-level API. Your API sketch does seem like a better approach if we wanted one. I'll simply revert to the following and not propose a global setting now or in the future:

public sealed partial class Utf8JsonWriter
{
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
}
GrabYourPitchforks commented 3 years ago

Mentioned in internal chat: if the scenario is that you want to write a [potentially untrusted] payload as an object enveloped by some outer wrapper, make sure you're also checking that incoming data is well-formed UTF-8 and doesn't have malformed "\uXXXX" sequences or encoded standalone surrogates. This isn't needed for "put these bytes on the wire as-is" scenarios, which should only ever be run over trusted inputs.
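A pre-check along those lines could look like the following. This is a hypothetical helper, not part of the proposed API, and it only covers the well-formed-UTF-8 part of the concern (catching malformed "\uXXXX" sequences and standalone surrogates would still require a separate pass, e.g. with a JSON reader):

```csharp
using System;
using System.Text;

// Hypothetical pre-check for untrusted raw payloads: reject bytes that are not
// well-formed UTF-8 before passing them through verbatim.
static bool IsWellFormedUtf8(byte[] payload)
{
    // A strict decoder throws instead of substituting U+FFFD for invalid bytes.
    var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try { strict.GetString(payload); return true; }
    catch (DecoderFallbackException) { return false; }
}

Console.WriteLine(IsWellFormedUtf8(new byte[] { 0x22, 0x68, 0x69, 0x22 })); // True  ("hi")
Console.WriteLine(IsWellFormedUtf8(new byte[] { 0x22, 0xFF, 0x22 }));       // False (0xFF is never valid UTF-8)
```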

terrajobst commented 3 years ago

Video

  • We should rename skipValidation to skipInputValidation to make it clear that this is about catching issues with untrusted user input, rather than structural coding issues on the side of the caller.
  • When skipInputValidation is set to false, we should enforce that no comments are present (or, alternatively, strip them).
  • If we need to add options later, we can either add an overload with an enum, a struct, or just more Booleans.
  • The name RawValue is fine, but if we ever add an overload that takes a property name, the convention would be to call it WriteRaw, which is a bit odd.
namespace System.Text.Json
{
    public sealed partial class Utf8JsonWriter
    {
        public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
        public void WriteRawValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
        public void WriteRawValue(string json, bool skipInputValidation = false);
    }
}
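The approved shape above can be sketched in use as follows (it later shipped in this form in .NET 6). The behavior shown for the unquoted NaN literal assumes that skipInputValidation bypasses the reader-based validation pass and writes the payload verbatim:

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Sketch: the unquoted NaN literal fails the input-validation pass, so it can
// only be written raw when skipInputValidation is true.
static string WriteWithNaN()
{
    using var ms = new MemoryStream();
    using (var writer = new Utf8JsonWriter(ms))
    {
        writer.WriteStartObject();
        writer.WritePropertyName("value");
        writer.WriteRawValue("NaN", skipInputValidation: true);
        writer.WriteEndObject();
    }
    return Encoding.UTF8.GetString(ms.ToArray());
}

Console.WriteLine(WriteWithNaN()); // {"value":NaN}
```

Note that the round-trip concern raised earlier in the thread still applies: Utf8JsonReader will reject this output when reading it back.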
olivier-spinelli commented 3 years ago

Calling WriteRawValue is "raw writing". What about:

namespace System.Text.Json
{
    public sealed partial class Utf8JsonWriter
    {
        public void RawWriteValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
        public void RawWriteValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
        public void RawWriteValue(string json, bool skipInputValidation = false);

        public void RawWrite(JsonEncodedText propertyName, string json, bool skipInputValidation = false);
        public void RawWrite(string propertyName, ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
    }
}

Using Raw as a prefix rather than a suffix may be simpler.