dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.12k stars 4.7k forks source link

[API Proposal]: Expose MemoryStream.InternalReadSpan() as a public method #106524

Open crozone opened 1 month ago

crozone commented 1 month ago

Background and motivation

When reading from Streams, it's possible to create a more efficient specialized code-path in the case where the underlying Stream is a MemoryStream. This is done by accessing MemoryStream's underlying buffer directly, which avoids buffer copies into a destination buffer.

MemoryStream already contains an internal method, InternalReadSpan(), to facilitate this:

public class MemoryStream : Stream
{
    internal ReadOnlySpan<byte> InternalReadSpan(int count);
}

This returns a slice of the underlying MemoryStream buffer as a ROSpan and advances the stream by the appropriate amount. It is used internally by BinaryReader to perform efficient reads from MemoryStream without the unnecessary overhead of buffer copies.

By providing a public version of this method, third party code could make use of the same optimization that BinaryReader is currently afforded internally.

It is possible to implement a workaround as an extension method using MemoryStream's public methods:

public static ReadOnlySpan<byte> ReadSpanUnsafe(this MemoryStream memoryStream, int numBytes)
{
    if (memoryStream.TryGetBuffer(out var msBuffer))
    {
        Span<byte> buffer = msBuffer.AsSpan((int)memoryStream.Position, numBytes);
        memoryStream.Seek(numBytes, SeekOrigin.Current);
        return buffer;
    }
    else
    {
        throw InvalidOperationException();
    }
}

This is impractical, since it only works if the MemoryStream was constructed with publiclyVisible = true, which sets _exposable = true. The most commonly used constructors set _exposable = false by default, causing TryGetBuffer() to return false and defeating the technique. The publiclyVisible property is intended to shield the underlying buffer from arbitrary reads and writes. This is not an issue for an official ReadSpanUnsafe() implementation that can return an ROSpan which only exposes read-only access to the actually "read" portion of the underlying buffer.

API Proposal

namespace System.IO;

public class MemoryStream : Stream
{
    public ReadOnlySpan<byte> ReadSpanUnsafe(int count);
}

API Usage

ReadSpanUnsafe() would work naturally with any methods that parse data from Span.

void ParseSomeData(MemoryStream memoryStream) {
    // Read a hex string from a MemoryStream
    string hexString = Convert.ToHexString(memoryStream.ReadSpanUnsafe(8));

    // Read a U16 BE from MemoryStream
    ushort bigEndianU16Value = BinaryPrimitives.ReadInt16BigEndian(memoryStream.ReadSpanUnsafe(2));

    // Read Double from MemoryStream
    BinaryPrimitives.ReadDoubleBigEndian(memoryStream.ReadSpanUnsafe(8));
}

It could also be used to efficiently extend BinaryReader with Big Endian parsing extension methods:

public static class BinaryReaderBigEndianExtensions
{
    public static short ReadInt16BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadInt16BigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(2) : binaryReader.ReadAsSpan(stackalloc byte[2]));
    public static ushort ReadUInt16BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadUInt16BigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(2) : binaryReader.ReadAsSpan(stackalloc byte[2]));
    public static int ReadInt32BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadInt32BigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(4) : binaryReader.ReadBytes(4));
    public static uint ReadUInt32BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadUInt32BigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(4) : binaryReader.ReadAsSpan(stackalloc byte[4]));
    public static long ReadInt64BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadInt64BigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(8) : binaryReader.ReadAsSpan(stackalloc byte[8]));
    public static ulong ReadUInt64BigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadUInt64BigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(8) : binaryReader.ReadAsSpan(stackalloc byte[8]));
    public static Half ReadHalfBigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadHalfBigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(2) : binaryReader.ReadAsSpan(stackalloc byte[2]));
    public static float ReadSingleBigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadSingleBigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(4) : binaryReader.ReadAsSpan(stackalloc byte[4]));
    public static double ReadDoubleBigEndian(this BinaryReader binaryReader) => BinaryPrimitives.ReadDoubleBigEndian(binaryReader.BaseStream is MemoryStream ms ? ms.ReadSpanUnsafe(8) : binaryReader.ReadAsSpan(stackalloc byte[8]));

    private static ReadOnlySpan<byte> ReadAsSpan(this BinaryReader binaryReader, Span<byte> buffer)
    {
        binaryReader.BaseStream.ReadExactly(buffer);
        return buffer;
    }
}

Alternative Designs

An alternative could be to allow the consumer to pass a delegate handler to the read method which provides the ROSpan buffer region as an argument, which constrains the lifetime of the ROSpan. The processed data could then be returned as an arbitrary TResult.

namespace System.IO;

public class MemoryStream : Stream
{
    public delegate TResult SpanHandler<out TResult>(ReadOnlySpan<byte> buffer);
    public static TResult ReadAndUse<TResult>(this BinaryReader binaryReader, int numBytes, SpanHandler<TResult> handler);
}

However this requires an additional delegate to be defined (since Func<ReadOnlySpan<byte>, TResult> is not legal), and has the added overhead of a delegate allocation.

Risks

The pitfall with the ReadSpanUnsafe() method is that the returned ROSpan's contents can become invalid if the memory in the underlying buffer is modified by a write to the stream. This will only happen if the MemoryStream position is moved backwards, and then a write is performed that overlaps with the portion of the buffer exposed by the ROSpan.

For this reason, the method name is appended with "Unsafe" to clearly indicate to the consumer that care should be taken when using the method. In practice, it would be unusual to hold onto the ROSpan from a previous read, while seeking and writing to the MemoryStream.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.