dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Increase serialization max size or make it user driven for SystemTextJson #61089

Open BRLN1 opened 2 years ago

BRLN1 commented 2 years ago

Background and motivation

I'm trying to serialize a large PDF of around 200MB. When I do, I get the following error:

---> System.ArgumentException: The JSON value of length 304063897 is too large and not supported.
   at System.Text.Json.ThrowHelper.ThrowArgumentException_ValueTooLarge(Int32 tokenLength)
   at System.Text.Json.Utf8JsonWriter.WriteStringValue(ReadOnlySpan`1 value)
   at System.Text.Json.Serialization.Converters.StringConverter.Write(Utf8JsonWriter writer, String value, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryWrite(Utf8JsonWriter writer, T& value, JsonSerializerOptions options, WriteStack& state)
   at System.Text.Json.Serialization.JsonConverter`1.WriteCore(Utf8JsonWriter writer, T& value, JsonSerializerOptions options, WriteStack& state)
   at System.Text.Json.Serialization.JsonConverter`1.WriteCoreAsObject(Utf8JsonWriter writer, Object value, JsonSerializerOptions options, WriteStack& state)
   at System.Text.Json.JsonSerializer.WriteCore[TValue](JsonConverter jsonConverter, Utf8JsonWriter writer, TValue& value, JsonSerializerOptions options, WriteStack& state)
   at System.Text.Json.JsonSerializer.WriteAsyncCore[TValue](Stream utf8Json, TValue value, Type inputType, JsonSerializerOptions options, CancellationToken cancellationToken)
   at System.Net.Http.Json.JsonContent.SerializeToStreamAsyncCore(Stream targetStream, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.<CopyToAsync>g__WaitAsync|56_0(ValueTask copyTask)
   at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.SendAsyncCore(HttpRequestMessage request, HttpCompletionOption completionOption, Boolean async, Boolean emitTelemetryStartStop, CancellationToken cancellationToken)

I found out that the current size limit for this is 125MB. That is definitely too small. It would make sense to increase the cap, possibly to 2GB (for example, for a string that contains the base64 of a PDF, or for a POST request), or simply make it user-defined.

If I'm missing a solution to this problem, I'd appreciate some tips. Thanks!

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-json See info in area-owners.md if you want to be subscribed.

Issue Details
Author: BRLN1
Assignees: -
Labels: `api-suggestion`, `area-System.Text.Json`, `untriaged`
Milestone: -

eiriktsarpalis commented 2 years ago

This appears to be by design: https://github.com/dotnet/runtime/blob/c690d2731db85933e8b4ab936fa71cd190471194/src/libraries/System.Text.Json/src/System/Text/Json/JsonConstants.cs#L71-L78

Basically, the restriction is in place to guarantee that the escaped token fits in a single span segment. That being said, the constant seems to be derived from the rather pessimistic assumption that every single character in the input string needs escaping, with the worst possible expansion.
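To make the worst-case reasoning concrete, here is a minimal sketch of the arithmetic. The two constant values are assumptions based on my reading of `JsonConstants.cs` and should be verified against the linked commit; `LimitSketch` and `FitsWorstCase` are hypothetical names used only for illustration.

```csharp
using System;

static class LimitSketch
{
    // Assumed values; verify against JsonConstants.cs at the linked commit.
    public const int MaxEscapedTokenSize = 1_000_000_000;  // cap on an escaped token
    public const int MaxExpansionFactorWhileEscaping = 6;  // e.g. one char written as "\uXXXX"

    // Largest unescaped input that still fits if every character
    // expands by the worst-case factor.
    public const int MaxUnescapedTokenSize =
        MaxEscapedTokenSize / MaxExpansionFactorWhileEscaping;

    // True when a value of this length fits even under the pessimistic assumption.
    public static bool FitsWorstCase(long length) => length <= MaxUnescapedTokenSize;
}
```

Under these assumed values, `LimitSketch.MaxUnescapedTokenSize` is 166666666, and `LimitSketch.FitsWorstCase(304_063_897)` is false, which is consistent with the exception in the report. The 125MB figure quoted in the issue does not fall out of this particular division and may refer to a different constant.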

I wonder if delaying the check until after we have detected that the string needs escaping might make sense here: https://github.com/dotnet/runtime/blob/4e7cf804c0c89d4316bbe3327ff4fe4441ee953d/src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.String.cs#L82-L96
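A rough sketch of what delaying the check could look like, under stated assumptions: the constants are the same assumed values as in `JsonConstants.cs`, and `NeedsEscaping` is a deliberately simplified, hypothetical stand-in for the writer's real escaping test (the actual encoder escapes more characters). This is not the code in `Utf8JsonWriter`, only an illustration of picking the limit after the scan.

```csharp
using System;

static class DeferredCheckSketch
{
    // Assumed values; verify against JsonConstants.cs.
    const int MaxEscapedTokenSize = 1_000_000_000;
    const int MaxExpansionFactorWhileEscaping = 6;
    const int MaxUnescapedTokenSize = MaxEscapedTokenSize / MaxExpansionFactorWhileEscaping;

    // Hypothetical, simplified test: treat control characters, quotes,
    // backslashes, and anything outside printable ASCII as needing escaping.
    static bool NeedsEscaping(ReadOnlySpan<char> value)
    {
        foreach (char c in value)
        {
            if (c < 0x20 || c > 0x7E || c == '"' || c == '\\')
                return true;
        }
        return false;
    }

    // Choose the limit only after scanning: input that needs no escaping
    // cannot expand, so it may use the full escaped-token budget.
    public static int LimitFor(ReadOnlySpan<char> value) =>
        NeedsEscaping(value) ? MaxUnescapedTokenSize : MaxEscapedTokenSize;

    public static void Validate(ReadOnlySpan<char> value)
    {
        if (value.Length > LimitFor(value))
            throw new ArgumentException(
                $"The JSON value of length {value.Length} is too large and not supported.");
    }
}
```

With this shape, a string that needs no escaping would be checked against the larger limit, while any string containing a character to escape keeps today's conservative bound.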

cc @ahsonkhan @bartonjs @krwq

Tornhoof commented 2 years ago

Not a solution, but maybe a workaround: if you need to serialize the PDF for Elasticsearch ingest pipelines, you can use CBOR instead of JSON to send the data.
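For reference, a minimal sketch of the CBOR route using the `System.Formats.Cbor` NuGet package. The field name `data` and the `CborSketch` class are hypothetical, and the document shape your ingest pipeline expects is an assumption you would need to adapt.

```csharp
using System;
using System.Formats.Cbor; // requires the System.Formats.Cbor NuGet package

static class CborSketch
{
    // Encodes { "data": <payload> } with the payload as a raw CBOR byte string,
    // so the bytes are neither base64-encoded nor escaped.
    public static byte[] Encode(byte[] payload)
    {
        var writer = new CborWriter();
        writer.WriteStartMap(1);           // map with exactly one key/value pair
        writer.WriteTextString("data");    // hypothetical field name
        writer.WriteByteString(payload);
        writer.WriteEndMap();
        return writer.Encode();
    }

    // Reads the payload back out of the single-entry map written by Encode.
    public static byte[] Decode(byte[] encoded)
    {
        var reader = new CborReader(encoded);
        reader.ReadStartMap();
        reader.ReadTextString();           // skip the key
        byte[] payload = reader.ReadByteString();
        reader.ReadEndMap();
        return payload;
    }
}
```

Because the payload travels as a byte string, it avoids both the base64 size overhead and the JSON string length check, though it is still bounded by the maximum size of a single array.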