Open bill-poole opened 7 months ago
Tagging subscribers to this area: @dotnet/area-system-buffers See info in area-owners.md if you want to be subscribed.
Have you tried using one of the Utf8JsonReader.CopyString
methods? They don't decode to Base64 directly but you can use an intermediate pooled that can then be decoded as appropriate.
No, I didn't think of doing that. But, wouldn't that be at the cost of an additional copy from the Utf8JsonReader
to that buffer? i.e., instead of Base64 decoding directly from the underlying UTF-8 buffer to the given Span
If the value needs to be escaped, then perhaps the .NET library will copy twice anyway, but that would only be in the 0.0001% case that a Base64 value actually needs to be unescaped. Base64 values only use ASCII characters, so should never actually be escaped when serialized. But I recognize that the JSON could in theory represent some or all of the Base64 characters with Unicode escaped characters. So we're really talking about an extreme edge case here.
The vastly more common scenario is that we have Base64 text inside the UTF-8 JSON text that does not need to be escaped, which we want to decode to a user-provided Span<byte>
buffer in a single step. The purpose of the proposed API is to decode Base64-encoded property values efficiently. It is inefficient to heap allocate the output buffer, but it is arguably even more inefficient to copy the Base64-encoded string to a buffer, and then decode that to yet another buffer.
Apologies if I've misunderstood something.
What we do internally for cases like that is only call CopyString
when the string is found to be escaped or uses ReadOnlySequence:
This ensures that you don't make an intermediate copy for the 99% case, ensuring correctness for the remaining 1% and never allocating unpooled buffers.
Okay, good point. So then it is possible to replicate what the System.Text.Json library does with same/similar performance, but there's still a quite bit of tricky code we need to duplicate that already exists inside the System.Text.Json library. An extension method that might do the trick is below. This proposal is then about obviating the need for every user library that needs this to implement the extension method below.
internal static bool TryGetBytesFromBase64(this ref Utf8JsonReader reader, Span<byte> span, out int bytesWritten)
{
byte[]? encodedArray = null;
scoped ReadOnlySpan<byte> encodedSpan;
if (reader.ValueIsEscaped || reader.HasValueSequence)
{
var encodedLength = reader.HasValueSequence ? (int)reader.ValueSequence.Length : reader.ValueSpan.Length;
encodedSpan = encodedLength <= StackallocByteThreshold
? stackalloc byte[StackallocByteThreshold]
: (encodedArray = ArrayPool<byte>.Shared.Rent(encodedLength));
encodedSpan = encodedArray.AsSpan()[..reader.CopyString(encodedArray)];
}
else
{
encodedSpan = reader.ValueSpan;
}
var status = Base64.DecodeFromUtf8(encodedSpan, span, out var bytesConsumed, out bytesWritten);
if (encodedArray is not null)
{
ArrayPool<byte>.Shared.Return(encodedArray);
}
Debug.Assert(bytesConsumed == encodedSpan.Length);
return status == OperationStatus.Done;
}
This extension method is quite tricky, so it would be useful if it the functionality were to be exposed directly from the Utf8JsonReader
struct.
We might consider it in the future, but it's unlikely we would prioritize it given that there is a workaround.
Background and motivation
Reading a Base64-encoded string value from a
Utf8JsonReader
currently always allocates. However, in high performance scenarios, we want to be able to possibly stackalloc or the output buffer or rent it from the array pool. AUtf8JsonReader.TryGetBytesFromBase64(Span<byte> span)
overload would allow the caller to create the buffer in any way.API Proposal
API Usage
Alternative Designs
It's currently possible to do this if the value is not escaped by writing code that uses
System.Buffers.Text.Base64
, but the logic to do this is already inside the System.Text.Json library and just needs to be exposed.Risks
No response