Closed nahk-ivanov closed 3 years ago
Seems like it is "by design"
Yep. In the initial design discussions of Utf8JsonWriter, we had considered adding WriteRaw APIs, but decided against it: https://github.com/dotnet/corefx/issues/33552
there are still scenarios where for performance we want to write raw JSON
We could consider revisiting that decision, if there is evidence of a significant performance benefit in doing so. Do you have some benchmarks for your scenario that would potentially show a win or is there a general concern for too much unnecessary work being done?
In this case, it would be nice to have access to the output stream directly in the custom JsonConverter
One problem with trying to expose the destination is that Utf8JsonWriter could have been constructed to write to different output destinations: Stream (like FileStream) or IBufferWriter<byte> (like ArrayBufferWriter<byte>).
Is your use case specific to uses within JsonConverter? If you are passing the output stream to the JsonSerializer, do you still have access to it?
One of the examples would be using Azure Table Storage, where we want to extract a few top-level properties to be PartitionKey/RowKey and store other nested properties in their original JSON form.
Generally speaking, how large are these nested JSON payloads that are stored in their original form, on average (looking for order of magnitude such as 1 KB, 1 MB, 1 GB)?
It's quite often that the application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL).
What about validation though? Do they care about making sure the JSON is valid before writing the string directly with WriteRawValue, or is it assumed the JSON has already been validated elsewhere (like in your case, where the JSON retrieved was previously written to storage by the JsonSerializer)?
The best I could find is to use JsonDocument.ParseValue + document.RootElement.GetRawText() when reading it, and parse it again into a JsonDocument when writing to just call WriteTo. Am I missing some APIs?
No, you aren't missing anything. What you found here is the only way you could do this within the JsonConverter. However, one thing you could do to avoid calling GetRawText and an extra re-parse is, instead of storing the JSON as a string to write later, store the JsonElement from the JsonDocument and call WriteTo on it. Is that feasible for your scenario? It still won't beat the performance of writing the raw bytes directly, but it should improve significantly.
Not knowing the specifics of your scenario, I explored the scenario a bit with some arbitrary JSON data and these are the results:
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-alpha1-015914
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
Job-DHBJYG : .NET Core 5.0.0 (CoreCLR 5.0.19.56303, CoreFX 5.0.19.56306), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000
Method | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|
RoundTripWithString | 95.44 us | 1.467 us | 1.372 us | 95.01 us | 93.23 us | 98.31 us | 1.00 | 0.00 | 6.2256 | 0.8545 | - | 25.59 KB |
RoundTripWithJsonElement | 67.49 us | 0.543 us | 0.454 us | 67.47 us | 66.65 us | 68.14 us | 0.71 | 0.01 | 4.2725 | 0.3662 | - | 17.95 KB |
RoundTripRaw | 44.97 us | 0.688 us | 0.610 us | 44.83 us | 44.31 us | 46.29 us | 0.47 | 0.01 | 7.3853 | 0.0610 | - | 30.37 KB |
RoundTripWithBytes | 92.23 us | 1.715 us | 1.432 us | 91.68 us | 90.31 us | 94.93 us | 0.97 | 0.02 | 3.1738 | 0.2441 | - | 13.02 KB |
RoundTripWithBytesRaw | 44.31 us | 0.871 us | 0.932 us | 44.08 us | 43.32 us | 46.20 us | 0.47 | 0.01 | 3.1128 | 0.2441 | - | 12.95 KB |
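The JsonElement approach suggested above can be sketched roughly as follows; the Entity type and its property names are illustrative, not taken from the benchmarked code:

```csharp
using System;
using System.Text.Json;

// A minimal sketch of the JsonElement approach: the nested JSON is captured
// as a JsonElement on read and written back via WriteTo internally, avoiding
// GetRawText plus a re-parse on write.
public sealed class Entity
{
    public string PartitionKey { get; set; }
    public JsonElement Payload { get; set; } // nested JSON kept as-is
}

public static class Demo
{
    public static string RoundTrip(string json)
    {
        // Deserialize: the serializer clones the payload into a JsonElement.
        Entity entity = JsonSerializer.Deserialize<Entity>(json);

        // Serialize: the element is written back without re-parsing raw text.
        return JsonSerializer.Serialize(entity);
    }
}
```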
Thanks for taking a stab at this and doing some benchmarking!
At this point it is a general concern for too much unnecessary work being done - this is part of an ASP.NET Core web app hosted in Azure App Service and I'd like to host it on as few nodes as possible using the smallest SKU possible, so that we scale horizontally rather than vertically when needed. This looked like a low-hanging fruit to optimize, if all the necessary APIs existed, of course.
Payloads are not big in size (limited by Azure Table Storage entity size anyway), ranging from 1 to 32 KB. But it is an OLTP workload (think online gaming), so we get thousands of requests per second, and this is not a heavy load yet, of course.
Since serialization/deserialization happens within ASP.NET infrastructure, the easiest would be to hook through JsonConverter. I'll need to check ASP.NET Core APIs to see if I can get a hold of the serializer or request body stream directly to override standard deserialization behavior. I think it might be possible if I just claim that my action expects a raw binary input rather than a JSON payload, and use the JSON reader directly myself. It would be more convenient to have it extract some of the properties automatically though, and only ask me what to do with the subset of properties - that's why I was leaning towards the JsonConverter approach.
Validation-wise - it is there, but we only need to validate properties at the root level, which we do. Think the following json:
{
"username": "foo",
"accessToken": "bar",
"payload": { "complex": "object" }
}
As long as username and accessToken are valid, we don't care if the user wants to "shoot himself in the foot" with a bogus payload. There are, of course, scenarios where even the payload needs to be validated (e.g. shared with other users), but in this particular case it doesn't matter, as long as the overall request size is valid. So what I used to do is map top-level properties and use a custom converter on the payload, so that it is preserved in raw form within the application code.
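The pattern described here (top-level properties mapped normally, payload preserved raw via a custom converter) might look roughly like this; the type and converter names are illustrative:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Keeps the "payload" property as a raw JSON string. Reading slices the raw
// text out of the document; writing requires a re-parse because there is no
// raw-write API on Utf8JsonWriter.
public sealed class RawJsonStringConverter : JsonConverter<string>
{
    public override string Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        using JsonDocument document = JsonDocument.ParseValue(ref reader);
        return document.RootElement.GetRawText();
    }

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
    {
        using JsonDocument document = JsonDocument.Parse(value);
        document.RootElement.WriteTo(writer);
    }
}

public sealed class GameRequest
{
    [JsonPropertyName("username")]
    public string Username { get; set; }

    [JsonPropertyName("accessToken")]
    public string AccessToken { get; set; }

    [JsonPropertyName("payload")]
    [JsonConverter(typeof(RawJsonStringConverter))]
    public string Payload { get; set; }
}
```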
I thought about using JsonElement for the payload property, yeah. This was possible even with JSON.NET, as they had a similar type, but I don't like serialization-specific types to "leak" into the data model. I'm not happy with using custom attributes, but I can live with it (even though I may not be sleeping well). If I have to use serialization-framework-specific types for the properties in the model classes, then it will be even harder to switch from one framework to another. With attributes I can add custom wrapper types and define converters for them externally to the model, but with public JsonElement Payload { get; set; } it's hard to do anything about it.
Based on your benchmarks I'll probably switch to JsonElement (even though it affects my sleep), but it would be nice if we could easily get the performance of RoundTripWithBytesRaw using simple public APIs.
From @declspec in https://github.com/dotnet/runtime/issues/32849:
I would like to propose an addition to the System.Text.Json.Utf8JsonWriter API which would allow writing existing UTF8-encoded JSON fragments into the current JSON document. Something like the following:
public sealed class Utf8JsonWriter
{
    public void WriteUtf8JsonValue(ReadOnlySpan<byte> utf8json);
    public void WriteUtf8Json(string propertyName, ReadOnlySpan<byte> utf8json);
    // ... and the rest of the standard overloads for the different types of `propertyName`
}
An example of usage would be:
var utf8json = Encoding.UTF8.GetBytes("{ \"foo\": 42, \"bar\": \"baz\" }").AsSpan();
using (var ms = new MemoryStream())
{
    using (var writer = new Utf8JsonWriter(ms))
    {
        writer.WriteStartObject();
        writer.WriteUtf8Json("fragment", utf8json);
        writer.WriteEndObject();
    }
    Debug.WriteLine(Encoding.UTF8.GetString(ms.ToArray()));
    // Expected output:
    // { "fragment": { "foo": 42, "bar": "baz" } }
}
I believe this would be useful when you are trying to augment or wrap existing JSON payloads (i.e. from external services) in a parent document without needing to deserialize and reserialize them.
I have read through all the documentation I can find and as far as I can see there doesn't currently seem to be any way of doing what I'm proposing. If there is already a better way of doing this please let me know.
What is the workaround?
@mikemcdougall Going back to good ol' Newtonsoft.Json and its JsonWriter.WriteRawValue.
Work-around as described above:
No, you aren't missing anything. What you found here is the only way you could do this within the JsonConverter. However, one thing you could do to avoid calling GetRawText and an extra re-parse is instead of storing the JSON as string to write later, store the JsonElement from the JsonDocument and call WriteTo on it.
string rawJson = ...; // Raw value can also be `byte`s
using (JsonDocument document = JsonDocument.Parse(rawJson))
{
document.RootElement.WriteTo(writer);
}
We're in a similar situation. We heavily use PG json(b) columns for user-defined metadata; our backend reads intentionally know nothing about their contents, and we'd ideally not pay the price of parsing and serializing something we only pass through.
The API proposal seems solid; good to see it's not based on String.
For large payloads -- LOH allocs are easier to avoid by deserializing, which we're trying to prevent here -- a ReadOnlySequence overload could be of use too. We're looking at introducing ReadOnlySequence in Npgsql (the PG driver) too, meaning we could remove all LOH allocs in an end-to-end scenario involving potentially large column values.
A bit late to the party here, but this issue is what stops me from converting to system.text.json from newtonsoft.json.
My issue is related to the GeoJSON format, which is a subset of JSON, used to describe geospatial features.
Support for this format and mapping it to C# classes is provided by the NetTopologySuite package. The reading and writing of GeoJson to corresponding C# classes is complex, and not something I want to re-implement. Neither is it something I wish to be included in the standard library.
The problem is that a GeoJSON geometry is a complex structure.
In order to handle this with Newtonsoft I've done the following: https://gist.github.com/atlefren/4ec30d46af1dde48c4a461abc5b0b3c5
However, I cannot find a way to handle this in system.text.json. As far as I'm concerned, validation and such is no issue here, as the NTS reader/writer is responsible for that part.
EDIT: I see that NTS has made this possible on their side, as documented here: https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/wiki/Using-NTS.IO.GeoJSON4STJ-with-ASP.NET-Core-MVC#basic-usage
@atlefren did you take a look at the workaround in https://github.com/dotnet/runtime/issues/1784#issuecomment-608331125? Does it help?
Sorry for the noob question but @StephenBonikowsky moving this issue to .NET 6 Committed does it mean that it is planned for .Net 6? (The milestone on the card is "Future", so I doubt it.)
(If it is, that's cool! I'd very much like to see the WriteRaw method(s) available.)
@olivier-spinelli thank you for pointing out that discrepancy. @ericstj is this still planned for .NET 6?
This is part of the https://github.com/dotnet/runtime/issues/43620 epic -> https://github.com/dotnet/runtime/issues/45190 user story. We expect to get to this in 6.0. I've adjusted the milestone.
I'm having a similar problem using System.Text.Json. I want to write the double 81.0. The docs say:
Writes the Double using the default StandardFormat (that is, 'G') on .NET Core 3.0 or later versions.
As a result, 81.0 will only print as 81, and I do not appear to have any ability to make the text 81.0 appear in my JSON. This means that I can't tell the difference between an int and a float in my JSON.
I would like the ability to control the output of a double, and one way to solve that would be to allow me to write a raw value into the json.
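A minimal repro of the behavior described above (the WriteRawValue escape hatch mentioned later in this thread eventually shipped in .NET 6 and would give full control over the textual form, e.g. writer.WriteRawValue("81.0")):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// WriteNumberValue emits the shortest round-trippable form, so the double
// 81.0 is written as 81 and the int/float distinction is lost.
using var stream = new MemoryStream();
using (var writer = new Utf8JsonWriter(stream))
{
    writer.WriteStartArray();
    writer.WriteNumberValue(81.0); // emitted as 81, not 81.0
    writer.WriteEndArray();
}
Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray())); // prints [81]
```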
I've hit another use case of needing a raw value. I need to interoperate exactly with the output of OCaml's yojson. Yojson incorrectly outputs Infinity, -Infinity, and NaN. Though they violate the spec, I still need to read and write them.
I had a full implementation of my serialization formats, but once I got to these edges (and also this one), I realized that I needed to switch to JSON.net, which caused me to throw away lots of work; JSON.net is more challenging to use and, I believe, slower as well.
~Seems like it's supported already in https://docs.microsoft.com/en-us/dotnet/api/system.text.json.serialization.jsonnumberhandling?view=net-5.0~
Turns out this only supports them as strings, e.g. "Infinity", not Infinity.
I want to share a couple of blockers we faced because of missing raw write support. We are developing a high-performance DynamoDb library and there are two use-cases that require writing raw JSON: Utf8JsonWriter, then append raw bytes and advance the IBufferWriter<byte>.

I have a very simple scenario where my C# server running the latest stuff acquires some JSON from another source. It's a standard published API, the contents of which my C# server doesn't know about and cares even less.
Without this it's impossible to format numerical values in the output. There should either be a way to control how values are formatted or the ability to write unquoted numerical values to the stream.
I have a scenario where I need to make a request with JSON like this:
{
"signature": "TWkJfGOKExKoIOjQef6fv5LuP/mT/YwYAtuQOI09bv39YPU=", // hash of serialized form of body field
"body": {
"params1": 1234567890,
"params2": "xxxyyyzzz"
}
}
I need to pre-serialize the body field to calculate the signature field, and then, when constructing the real request JSON, I want to just throw the already-serialized body string into this JSON without re-serializing it again.
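A sketch of that flow, assuming the WriteRawValue API that shipped in .NET 6 and an illustrative SHA-256/Base64 signature scheme (the real service's hashing may differ):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Serialize the body once, sign those exact bytes, then splice them into the
// outer request without re-serializing.
static byte[] BuildRequest(object body)
{
    byte[] bodyJson = JsonSerializer.SerializeToUtf8Bytes(body);
    string signature = Convert.ToBase64String(SHA256.HashData(bodyJson));

    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream))
    {
        writer.WriteStartObject();
        writer.WriteString("signature", signature);
        writer.WritePropertyName("body");
        writer.WriteRawValue(bodyJson); // already-serialized bytes, written as-is
        writer.WriteEndObject();
    }
    return stream.ToArray();
}

byte[] request = BuildRequest(new { params1 = 1234567890, params2 = "xxxyyyzzz" });
Console.WriteLine(Encoding.UTF8.GetString(request));
```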
As linked by @layomia, for the source-gen code, having raw write functionality could be useful to replace the JsonEncodedText members with ReadOnlySpan<byte> and pre-encoded UTF-8 member names. At the moment, benchmarks show that JsonEncodedText is faster (10-15%) for writing known property names in JSON, as there is a fast path for it. The ROS overload needs to check the validity of the bytes and possibly escape them. With a raw API to bypass the encoding check, the performance should be similar, and it would allow the source-gen to use ROS.
@layomia I have started to run into this myself working on vNext of Elasticsearch.NET. Currently, we use Utf8Json and I'm replacing it with STJ.
Generally, we serialize a request directly and fully control that type. However, we have more complex cases, such as when indexing a document to Elasticsearch. In that case, we defer how the request type is serialized to the type itself. This is because we don't want to serialise our request object, but the user-provided TDocument property it holds. In other requests, the object being serialized may be a different property.

A second requirement is that we allow consumers to provide their own serializer which can handle serializing their types. This may/may not differ from our default. To support this, we have an abstraction over serialization. As such, the abstraction has no notion of Utf8JsonWriter. Due to this design, we need to use the implementation of this abstraction when serialising the TDocument.
I have a custom JsonConverterFactory and JsonConverter which handle this.
public class ProxyRequestConverterFactory : JsonConverterFactory
{
    private readonly IConnectionSettingsValues _settings;

    public ProxyRequestConverterFactory(IConnectionSettingsValues settings) => _settings = settings;

    public override bool CanConvert(Type typeToConvert) => typeToConvert.IsGenericType
        && typeToConvert.GetInterfaces().Any(x => x.UnderlyingSystemType == typeof(IProxyRequest));

    public override JsonConverter? CreateConverter(Type typeToConvert, JsonSerializerOptions options)
    {
        var elementType = typeToConvert.GetGenericArguments()[0];
        var att = typeToConvert.GetCustomAttribute<ConvertAsAttribute>();
        var converter = (JsonConverter)Activator.CreateInstance(
            typeof(ProxyRequestConverter<>).MakeGenericType(att?.ConvertType.MakeGenericType(elementType) ?? elementType),
            BindingFlags.Instance | BindingFlags.Public,
            args: new object[] { _settings },
            binder: null,
            culture: null)!;
        return converter;
    }
}
public class ProxyRequestConverter<TRequest> : JsonConverter<TRequest>
{
    private readonly IConnectionSettingsValues _settings;

    public ProxyRequestConverter(IConnectionSettingsValues settings) => _settings = settings;

    public override TRequest? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        throw new NotImplementedException();

    public override void Write(Utf8JsonWriter writer, TRequest value, JsonSerializerOptions options)
    {
        if (value is IProxyRequest proxyRequest) proxyRequest.WriteJson(writer, _settings.SourceSerializer);
    }
}
An (abbreviated) example of a request using this mechanism:
public partial class IndexRequest<TDocument> : IProxyRequest
{
    public TDocument Document { get; set; }

    void IProxyRequest.WriteJson(Utf8JsonWriter writer, ITransportSerializer sourceSerializer)
    {
        using var ms = new MemoryStream();
        sourceSerializer.Serialize(Document, ms);
        ms.Position = 0;
        using var document = JsonDocument.Parse(ms);
        document.RootElement.WriteTo(writer);
    }
}
This uses a variation on the workaround you described above. My concern though is efficiency when TDocument is a large object. Ideally, we'd skip the JsonDocument.Parse() call and write our raw JSON from the MemoryStream into the writer.
I am still prototyping this and a few other designs which avoid this but have their own downsides, so I may yet land on a different solution. I wanted to share the scenario in case it helps shape the future of STJ. Happy to expand on what I've shown here.
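For what it's worth, the parse round-trip can be skipped entirely once a raw-write API exists; a sketch assuming the proposed WriteRawValue(ReadOnlySpan<byte>) overload (which later shipped in .NET 6), with illustrative names:

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Serialize the document to a buffer once, then hand the bytes straight to
// the outer writer -- no JsonDocument.Parse round-trip.
static byte[] IndexRequestBody<TDocument>(TDocument document)
{
    byte[] documentJson = JsonSerializer.SerializeToUtf8Bytes(document);

    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream))
    {
        writer.WriteRawValue(documentJson); // pre-serialized bytes go out as-is
    }
    return stream.ToArray();
}

Console.WriteLine(Encoding.UTF8.GetString(IndexRequestBody(new { id = 1 })));
```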
I've hit another use case of needing a raw value. I need to interoperate exactly with the output of OCaml's yojson. Yojson incorrectly outputs Infinity, -Infinity, and NaN. Though they violate the spec, I still need to read them and write them.
@pbiggar we are likely to perform validation on the raw payload based on the configured JsonWriterOptions.SkipValidation value. When that property is false, we'd perform a pass on the raw payload to make sure it is structurally valid, using Utf8JsonReader. The reader will reject unquoted text literals like Infinity, -Infinity, undefined, NaN, etc. (null is valid), and we don't have plans to extend it. To write values like these, you would need to configure the writer to skip validation.

If you do manage to write these unquoted literals, how would you read them back given that Utf8JsonReader will throw an exception if it encounters them? Just wanted to call out a potential limitation given your statement above:

Though they violate the spec, I still need to read them and write them.
If you do manage to write these unquoted literals, how would you read them back given that Utf8JsonReader will throw an exception if it encounters them? Just wanted to call out a potential limitation given your statement above:
You might be right. I was able to make this work with JSON.net, and I think perhaps in the future the solution is to move away from these invalid formats.
I updated the description above with an API proposal for this feature. @Tornhoof I opened https://github.com/dotnet/runtime/issues/54005 to address writing raw property names.
Any chance to "open" a little bit the Utf8JsonReader/Writer API? I described my issue here: https://github.com/dotnet/runtime/issues/30194#issuecomment-858736293 (Wondering if I open an issue... If you can tell me what you think, it'll be cool :))
@olivier-spinelli per https://github.com/dotnet/runtime/issues/30194#issuecomment-858776492 please feel free to open a new issue addressing your read requirements.
Thanks! The new issue is #54016.
@layomia Apologies if I've misread this on my phone but I'm not sure this would fully support my scenario above? Could it be adapted that way? In that case I want to write complete JSON, not just a property value. Should I perhaps open a specific issue to discuss?
@stevejgordon I'll circle back with more detail, but yes, you'd be able to write complete JSON like top-level JSON objects, objects within JSON arrays, etc., just like the existing Utf8JsonWriter.WriteXValue methods.
Update: I made the following changes to the proposal in the description above https://github.com/dotnet/runtime/issues/1784#issue-550483170
A small remark on the API proposal about the nullable default parameter:
public void WriteRawValue(string json, JsonWriteRawOptions? options = null);
The JsonWriteRawOptions.Default value exists (and is the default value, 0). Defaulting the parameter to null here seems somewhat redundant and/or ambiguous (what's the final choice for the null?) and is, for sure, 5 bytes (+ padding) on the stack instead of 4...
May be the following could be more explicit?
public void WriteRawValue( string json, JsonWriteRawOptions options = JsonWriteRawOptions.Default );
@olivier-spinelli I tried to explain this in the description:
The options parameter on the new Utf8JsonWriter.WriteRawValue method is nullable for forward compatibility - that is, if we wish to have an option on JsonWriterOptions to configure what the write raw settings for any WriteRawValue method call should be (as shown in the "Potential future API additions" section). Options passed directly to the method would win over the global option. The parameter is nullable so that we can distinguish between users passing default(JsonWriteRawOptions) (and meaning it), and users passing nothing (i.e. null, meaning the global option should apply).
Re:
what's the final choice for the null?
With the proposed API, passing a null value means we should use default(JsonWriteRawOptions).
Now, say the API we ship now is:
public sealed partial class Utf8JsonWriter
{
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
}
And we want to add API like this in this future:
public struct JsonWriterOptions
{
    // Existing:
    // public JavaScriptEncoder? Encoder { readonly get; set; }
    // public bool Indented { get; set; }
    // public bool SkipValidation { get; set; }

    // Specify write-raw options to use for all WriteRawValue calls on the writer instance
    public JsonWriteRawOptions WriteRawOptions { get; set; }
}
What would the writer behavior in this scenario be?
JsonWriterOptions options = new() { WriteRawOptions = JsonWriteRawOptions.SkipValidation };
using Utf8JsonWriter writer = new(ms, options);

// Option a: writer uses JsonWriteRawOptions.SkipValidation as specified globally
// Option b: writer uses JsonWriteRawOptions.Default as specified in the method call
writer.WriteRawValue(blob); // same as writer.WriteRawValue(blob, JsonWriteRawOptions.Default)
If we don't think we'd ever want a "global" WriteRawOptions option to influence all WriteRawValue calls, so that users don't have to manually pass them each time, then my above question wouldn't apply and we can simply go with public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }.
Ouch! I missed your remark, sorry! But... this may be a sign that the "Least Surprise Principle" has somehow been violated here...
If the "global" exists, wouldn't it be clearer to use overloads rather than default (of default) values? Something like:
public sealed partial class Utf8JsonWriter
{
    /// <summary>
    /// Writes the value directly according to the specified options.
    /// </summary>
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options) { }

    /// <summary>
    /// Writes the value directly using this writer's <see cref="JsonWriterOptions.WriteRawOptions" />.
    /// </summary>
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json) { }
}
But if you don't include it now, these overloads will be really strange :). IMHO whether the global option exists should be settled right now...
Now, my very personal thought about this is that this "global" should not be here, here's why:
We're dealing here with a low-level API that is meant to be used to handle specific cases. In this kind of scenario, there's a big probability that other (very specific) options are useful (like, for instance, AllowNumberInfinity, NumberInfinityClampedToMax, UseByteArrayForGuid, etc.): the "global" Utf8JsonWriter options will not "be enough"; these specific options will "naturally" be available in the calling context.
@olivier-spinelli yeah I agree that "global" configuration is not needed for this low-level API. Your API sketch does seem like a better approach if we wanted one. I'll revert to simply the following and not propose a global setting now or in the future:
public sealed partial class Utf8JsonWriter
{
    public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default) { }
}
Mentioned in internal chat: if the scenario is that you want to write a [potentially untrusted] payload as an object enveloped by some outer wrapper, make sure you're also checking that incoming data is well-formed UTF-8 and doesn't have malformed "\uXXXX" sequences or encoded standalone surrogates. This isn't needed for "put these bytes on the wire as-is" scenarios, which should only ever be run over trusted inputs.
- Rename skipValidation to skipInputValidation to make it clear that this is about catching issues with untrusted user input, rather than structural coding issues on the side of the caller.
- When skipInputValidation is set to false, we should enforce that no comments are present (or alternatively we strip them).
- RawValue is fine, but if we ever add the overload that takes a property name, the convention would be to call it WriteRaw, which is a bit odd.

namespace System.Text.Json
{
    public sealed partial class Utf8JsonWriter
    {
        public void WriteRawValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
        public void WriteRawValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
        public void WriteRawValue(string json, bool skipInputValidation = false);
    }
}
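As a sketch of the approved shape in use (this signature shipped in .NET 6; skipping input validation is what permits non-spec literals like those discussed earlier in this thread):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Writing a non-spec literal such as Infinity requires opting out of the
// input validation pass on the raw payload.
using var stream = new MemoryStream();
using (var writer = new Utf8JsonWriter(stream))
{
    writer.WriteStartObject();
    writer.WritePropertyName("value");
    writer.WriteRawValue("Infinity", skipInputValidation: true);
    writer.WriteEndObject();
}
Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray())); // {"value":Infinity}
```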
Calling WriteRawValue is "raw writing". What about:

namespace System.Text.Json
{
    public sealed partial class Utf8JsonWriter
    {
        public void RawWriteValue(ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
        public void RawWriteValue(ReadOnlySpan<char> json, bool skipInputValidation = false);
        public void RawWriteValue(string json, bool skipInputValidation = false);
        public void RawWrite(JsonEncodedText propertyName, string json, bool skipInputValidation = false);
        public void RawWrite(string propertyName, ReadOnlySpan<byte> utf8Json, bool skipInputValidation = false);
    }
}

Using a Raw prefix rather than a suffix may be simpler.
Edited by @layomia
Original proposal by @nahk-ivanov (click to view)
According to the image in the [roadmap](https://github.com/dotnet/corefx/tree/master/src/System.Text.Json/roadmap), Utf8JsonWriter would be the lowest level of API we are going to get, but it's missing the WriteRawValue functionality from JSON.NET. Seems like it is "by design", as the roadmap states:

> Does not allow writing invalid JSON according to the JSON RFC

In this case, it would be nice to have access to the output stream directly in the custom JsonConverter, as there are still scenarios where for performance we want to write raw JSON. It's quite often that the application layer doesn't care about particular branches of the JSON document and just wants to pass them through to/from the data store (especially when using NoSQL). What we used to do with JSON.NET in such cases is maintain it as a string property and define a custom converter that would use WriteRawValue, which took a string argument.

One of the examples would be using Azure Table Storage, where we want to extract a few top-level properties to be PartitionKey/RowKey and store other nested properties in their original JSON form. I don't see any reason why the application would unnecessarily deserialize and serialize these parts of the tree - all we need is to run a forward-only reader to return the value of the given property in its raw form and later write it as such.

I looked at the current System.Text.Json APIs and can't find a good way of doing this. It seems like (for performance reasons) we would want to maintain raw data in a binary form now (compared to string previously), which is fine, but there is no way to read/write the next value as bytes. The best I could find is to use JsonDocument.ParseValue + document.RootElement.GetRawText() when reading it and parse it again into a JsonDocument when writing to just call WriteTo. Am I missing some APIs?

Introduction
There are scenarios where customers want to integrate existing, "raw" JSON payloads when writing new JSON payloads with Utf8JsonWriter. There's no first-class API for providing this functionality today, but there's a workaround using JsonDocument. This implementation builds a metadata database for navigating a JSON document, which is unnecessary given the desired functionality and has a performance overhead.
The goal of this feature is to provide performant and safe APIs to write raw JSON values.
API Proposal
Alternate Design (click to view)
namespace System.Text.Json
{
    // Specifies processing options when writing raw JSON values.
    [Flags]
    public enum JsonWriteRawOptions
    {
        // Raw JSON values will be validated for structural correctness and encoded if required.
        Default = 0,
        // Whether to skip validation. Independent of JsonWriterOptions.SkipValidation.
        SkipValidation = 1,
        // Whether to skip encoding specified via JsonWriterOptions.Encoder.
        SkipEncoding = 2,
    }

    public sealed partial class Utf8JsonWriter
    {
        // Writes the span of bytes directly as JSON content, according to the specified options.
        public void WriteRawValue(ReadOnlySpan<byte> utf8Json, JsonWriteRawOptions options = default);
    }
}

Scenarios
The major considerations for this feature are:
The answers depend on the scenario. Is the raw JSON payload trusted or arbitrary? Is the enveloping of the raw payload done on behalf of users? This proposal seeks to make all of these post-processing options fully configurable by users, allowing them to tweak the behavior in a way that satisfies their performance, correctness, and security requirements.
In this proposal, raw JSON values will not be reformatted to align with whatever whitespace and indentation settings are specified on JsonWriterOptions, but left as-is. We can provide a future API to reformat raw JSON values. We'll also not encode values by default, but can provide an API to do it in the future. If users need all three processing options today, the workaround shown above works for that scenario.

Let's go over two of the expected common scenarios, and what API would be configured to enable them.
I have a blob which I think represents JSON content and which I want to envelope, and I need to make sure the envelope & its inner contents remain well-formed
Consider the Azure SDK team which provides a service protocol where the payload content is JSON. Part of the JSON is internal protocol information, and a nested part of the JSON is user provided data. When serializing protocol information, Azure cares about the raw user JSON being structurally valid, and ultimately that the overall JSON payload is valid. They don't wish to alter the formatting of the raw JSON, or preemptively encode it. Given these considerations, they might write their serialization logic as follows:
The resulting JSON payload might look like this:
Notice that no JsonWriteRawOptions had to be specified.

I have a deliberate sequence of bytes I want to write out on the wire, and I know what I'm doing
Consider a user who needs to format double values differently from the Utf8JsonWriter.WriteNumber[X] methods. Those methods write the shortest possible JSON output required for round-tripping on deserialization. This means that a float or double value that can be expressed simply as 89 would not be written as 89.0. However, our user needs to send numeric JSON content to a service which treats values without decimal points as integers, and values with decimal points as doubles. This distinction is important to our user. The WriteRawValue APIs would allow our customer to format their numeric values as they wish, and write them directly to the destination buffer. In this scenario, their numeric values are trusted, and our user would not want the writer to perform any encoding or structural validation on their raw JSON payloads, to give the fastest possible perf. To satisfy this trusted scenario, our user might write serialization logic as follows:

The resulting JSON payload might look like this:
Notes
The WriteRawValue APIs would work not just for writing raw values as JSON property values (i.e. as POCO properties and dictionary values), but also as root-level JSON objects, arrays, and primitives, and as JSON array elements. This functions in the same way as the existing Utf8JsonWriter.WriteXValue methods.

What's next?