dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Add Span<T> Binary Reader/Writer APIs #23601

Closed KrzysztofCwalina closed 4 years ago

KrzysztofCwalina commented 7 years ago

The API allows for reading and writing the binary representation of primitive types (bit blitting) from/to spans of bytes. The API is used by SignalR.

A prototype of the API is available in corefxlab: https://github.com/dotnet/corefxlab/tree/master/src/System.Binary

Part of dotnet/corefx#24174

Usage is quite simple:

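using System;
using System.Buffers.Binary;
Span<byte> span = stackalloc byte[4]; // with 'var' this would be an unsafe byte* pointer
int number = Int32.MaxValue;
BinaryPrimitives.WriteMachineEndian(span, ref number);
int value = BinaryPrimitives.ReadMachineEndian<int>(span);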

The LittleEndian and BigEndian variants read/write using the byte order named in the method. The MachineEndian variants use the byte order of the current machine. The Try variants return false if the buffer is too small to read/write the specified data type.
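To make the byte-order distinction concrete, here is a small sketch written against System.Buffers.Binary.BinaryPrimitives as it ships in current .NET, which may differ in detail from the proposal below. It writes the same Int32 in both byte orders and shows a Try method returning false on a buffer that is one byte too short.

```csharp
using System;
using System.Buffers.Binary;

// Room for exactly one 32-bit integer.
Span<byte> buffer = stackalloc byte[4];

// Big endian: most significant byte first.
BinaryPrimitives.WriteInt32BigEndian(buffer, 0x01020304);
byte[] bigEndianBytes = buffer.ToArray();

// Little endian: least significant byte first (overwrites the same buffer).
BinaryPrimitives.WriteInt32LittleEndian(buffer, 0x01020304);
byte[] littleEndianBytes = buffer.ToArray();

// Reading with the matching byte order round-trips the value.
int roundTripped = BinaryPrimitives.ReadInt32LittleEndian(buffer);

// Try* variants report a too-small buffer by returning false instead of throwing.
bool fitted = BinaryPrimitives.TryReadInt32BigEndian(buffer.Slice(0, 3), out _);

// Prints: 01-02-03-04 | 04-03-02-01 | 01020304 | False
Console.WriteLine($"{BitConverter.ToString(bigEndianBytes)} | {BitConverter.ToString(littleEndianBytes)} | {roundTripped:X8} | {fitted}");
```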

API Design:

// System.Memory.dll
namespace System.Buffers.Binary {
    public static class BinaryPrimitives {
        public static short ReadInt16BigEndian(ReadOnlySpan<byte> buffer);
        public static short ReadInt16LittleEndian(ReadOnlySpan<byte> buffer);
        public static int ReadInt32BigEndian(ReadOnlySpan<byte> buffer);
        public static int ReadInt32LittleEndian(ReadOnlySpan<byte> buffer);
        public static long ReadInt64BigEndian(ReadOnlySpan<byte> buffer);
        public static long ReadInt64LittleEndian(ReadOnlySpan<byte> buffer);
        public static T ReadMachineEndian<T>(ReadOnlySpan<byte> buffer) where T : struct;
        public static ushort ReadUInt16BigEndian(ReadOnlySpan<byte> buffer);
        public static ushort ReadUInt16LittleEndian(ReadOnlySpan<byte> buffer);
        public static uint ReadUInt32BigEndian(ReadOnlySpan<byte> buffer);
        public static uint ReadUInt32LittleEndian(ReadOnlySpan<byte> buffer);
        public static ulong ReadUInt64BigEndian(ReadOnlySpan<byte> buffer);
        public static ulong ReadUInt64LittleEndian(ReadOnlySpan<byte> buffer);
        public static byte ReverseEndianness(byte value);
        public static short ReverseEndianness(short value);
        public static int ReverseEndianness(int value);
        public static long ReverseEndianness(long value);
        public static sbyte ReverseEndianness(sbyte value);
        public static ushort ReverseEndianness(ushort value);
        public static uint ReverseEndianness(uint value);
        public static ulong ReverseEndianness(ulong value);
        public static bool TryReadInt16BigEndian(ReadOnlySpan<byte> buffer, out short value);
        public static bool TryReadInt16LittleEndian(ReadOnlySpan<byte> buffer, out short value);
        public static bool TryReadInt32BigEndian(ReadOnlySpan<byte> buffer, out int value);
        public static bool TryReadInt32LittleEndian(ReadOnlySpan<byte> buffer, out int value);
        public static bool TryReadInt64BigEndian(ReadOnlySpan<byte> buffer, out long value);
        public static bool TryReadInt64LittleEndian(ReadOnlySpan<byte> buffer, out long value);
        public static bool TryReadMachineEndian<T>(ReadOnlySpan<byte> buffer, out T value) where T : struct;
        public static bool TryReadUInt16BigEndian(ReadOnlySpan<byte> buffer, out ushort value);
        public static bool TryReadUInt16LittleEndian(ReadOnlySpan<byte> buffer, out ushort value);
        public static bool TryReadUInt32BigEndian(ReadOnlySpan<byte> buffer, out uint value);
        public static bool TryReadUInt32LittleEndian(ReadOnlySpan<byte> buffer, out uint value);
        public static bool TryReadUInt64BigEndian(ReadOnlySpan<byte> buffer, out ulong value);
        public static bool TryReadUInt64LittleEndian(ReadOnlySpan<byte> buffer, out ulong value);
        public static bool TryWriteInt16BigEndian(Span<byte> buffer, short value);
        public static bool TryWriteInt16LittleEndian(Span<byte> buffer, short value);
        public static bool TryWriteInt32BigEndian(Span<byte> buffer, int value);
        public static bool TryWriteInt32LittleEndian(Span<byte> buffer, int value);
        public static bool TryWriteInt64BigEndian(Span<byte> buffer, long value);
        public static bool TryWriteInt64LittleEndian(Span<byte> buffer, long value);
        public static bool TryWriteMachineEndian<T>(Span<byte> buffer, ref T value) where T : struct;
        public static bool TryWriteUInt16BigEndian(Span<byte> buffer, ushort value);
        public static bool TryWriteUInt16LittleEndian(Span<byte> buffer, ushort value);
        public static bool TryWriteUInt32BigEndian(Span<byte> buffer, uint value);
        public static bool TryWriteUInt32LittleEndian(Span<byte> buffer, uint value);
        public static bool TryWriteUInt64BigEndian(Span<byte> buffer, ulong value);
        public static bool TryWriteUInt64LittleEndian(Span<byte> buffer, ulong value);
        public static void WriteInt16BigEndian(Span<byte> buffer, short value);
        public static void WriteInt16LittleEndian(Span<byte> buffer, short value);
        public static void WriteInt32BigEndian(Span<byte> buffer, int value);
        public static void WriteInt32LittleEndian(Span<byte> buffer, int value);
        public static void WriteInt64BigEndian(Span<byte> buffer, long value);
        public static void WriteInt64LittleEndian(Span<byte> buffer, long value);
        public static void WriteMachineEndian<T>(Span<byte> buffer, ref T value) where T : struct;
        public static void WriteUInt16BigEndian(Span<byte> buffer, ushort value);
        public static void WriteUInt16LittleEndian(Span<byte> buffer, ushort value);
        public static void WriteUInt32BigEndian(Span<byte> buffer, uint value);
        public static void WriteUInt32LittleEndian(Span<byte> buffer, uint value);
        public static void WriteUInt64BigEndian(Span<byte> buffer, ulong value);
        public static void WriteUInt64LittleEndian(Span<byte> buffer, ulong value);
    }
}
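The ReverseEndianness overloads are what tie the BigEndian, LittleEndian and machine-endian families together. The sketch below checks that relationship using current .NET, which has no BinaryPrimitives.ReadMachineEndian<T>; System.Runtime.InteropServices.MemoryMarshal.Read<T> provides the equivalent machine-endian read there. On a little-endian CPU a big-endian read equals a machine-endian read followed by a byte swap; on a big-endian CPU no swap is needed.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

byte[] data = { 0x12, 0x34, 0x56, 0x78 };

// Reads the four bytes in whatever order the current CPU uses natively.
uint machine = MemoryMarshal.Read<uint>(data);

// Reads them as a big-endian (network order) value: 0x12345678 on any machine.
uint big = BinaryPrimitives.ReadUInt32BigEndian(data);

// The two differ exactly by a byte swap on little-endian hardware.
uint expected = BitConverter.IsLittleEndian
    ? BinaryPrimitives.ReverseEndianness(machine)
    : machine;

Console.WriteLine($"{big:X8} {expected == big}");
```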
KrzysztofCwalina commented 7 years ago

Great, then we will use the Unsafe APIs to implement reader/writer.
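As a rough sketch of what implementing a reader on top of the Unsafe APIs can look like (not the actual corefx implementation): check the length once, do a single unaligned load with System.Runtime.CompilerServices.Unsafe.ReadUnaligned<T>, and swap bytes only when the requested order differs from the machine's. The helper name ReadInt32BigEndianSketch is hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper illustrating the technique; not the shipped implementation.
static int ReadInt32BigEndianSketch(ReadOnlySpan<byte> buffer)
{
    // Bounds check up front, so the unchecked load below stays in range.
    if (buffer.Length < sizeof(int))
        throw new ArgumentOutOfRangeException(nameof(buffer));

    // One unaligned 4-byte load in machine byte order.
    int value = Unsafe.ReadUnaligned<int>(ref MemoryMarshal.GetReference(buffer));

    // Swap only when the machine order differs from the requested (big-endian) order.
    return BitConverter.IsLittleEndian ? BinaryPrimitives.ReverseEndianness(value) : value;
}

byte[] frame = { 0x00, 0x00, 0x01, 0x00, 0xFF };
int length = ReadInt32BigEndianSketch(frame); // first four bytes: 0x00000100
Console.WriteLine(length);
```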

KrzysztofCwalina commented 7 years ago

I updated the APIs (at the top of the issue) after incorporating feedback from. We are still discussing if we rename Read to Get and Write to Put/Set.

tmds commented 7 years ago

SpanReader and SpanWriter could have an endianness constructor argument. I don't know about the performance implications, but it would shorten the method names and make it easy to change endianness.
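One way to picture this suggestion without committing to a SpanReader type: a hypothetical cursor-style helper that takes the byte order as a parameter and advances an offset, so one short method name covers both orders. The helper name, its signature and the bigEndian flag are illustrative assumptions, not proposed API; whether the extra branch costs anything would need measuring.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: endianness is an argument instead of part of the method name.
static int ReadInt32(ReadOnlySpan<byte> buffer, ref int offset, bool bigEndian)
{
    // Slice throws ArgumentOutOfRangeException if fewer than 4 bytes remain.
    ReadOnlySpan<byte> slice = buffer.Slice(offset, sizeof(int));
    offset += sizeof(int);
    return bigEndian
        ? BinaryPrimitives.ReadInt32BigEndian(slice)
        : BinaryPrimitives.ReadInt32LittleEndian(slice);
}

byte[] message = { 0x00, 0x00, 0x00, 0x2A, 0x2A, 0x00, 0x00, 0x00 };
int position = 0;
int first = ReadInt32(message, ref position, bigEndian: true);
int second = ReadInt32(message, ref position, bigEndian: false);
Console.WriteLine($"{first} {second} {position}");
```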

Drawaes commented 7 years ago

@tmds Take a look at dotnet/corefx#24180

KrzysztofCwalina commented 7 years ago

Updated the assembly where we will add the APIs (at the top of this issue). The current POR is to use System.Memory.dll.

We decided we cannot use System.Buffers.dll, as it was in-boxed in .NET Core 2.0.

karelz commented 7 years ago

FYI: The API review discussion was recorded; see https://youtu.be/m4BUM3nJZRw?t=38 (70 min duration). For detailed notes, see the System.Binary notes PR.

tmds commented 7 years ago

Floating-point numbers are not included here. Can they be included? FYI, here is what Wikipedia says about floating-point endianness:

Although the ubiquitous x86 processors of today use little-endian storage for all types of data (integer, floating point, BCD), there are a number of hardware architectures where floating-point numbers are represented in big-endian form while integers are represented in little-endian form.[17] There are ARM processors that have half little-endian, half big-endian floating-point representation for double-precision numbers: both 32-bit words are stored in little-endian like integer registers, but the most significant one first. Because there have been many floating-point formats with no "network" standard representation for them, the XDR standard uses big-endian IEEE 754 as its representation. It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness.[18] Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may in practice safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. (Small embedded systems using special floating-point formats may be another matter however.)

Perhaps the method names can be shortened by using "LE"/"BE" instead of "LittleEndian/BigEndian"?

For what I want to do with this, I'd like a higher-level SpanReader/SpanWriter API as requested by @svick and @Drawaes (https://github.com/dotnet/corefx/issues/24180). Nit: perhaps change the title of this issue from Reader/Writer to Read/Write.

I wonder: can't we get rid of the overloads via generic ReadBE<T>/WriteBE<T>/ReadLE<T>/WriteLE<T>?

Drawaes commented 7 years ago

As discussed in the API review, there is no current way of limiting T to a primitive, hence no generic.

As for floating point, the text above says why I don't think there should be a "standard" floating-point op: there really is no standard to speak of. I would say that if there are floating-point ops they should be off on their own, or, if they truly are just byte-flipped, you could just read a uint or ulong and blit it in a simple method ;)
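A minimal sketch of that "read an integer and blit it" workaround, assuming (as the quoted Wikipedia passage says is safe on modern IEEE 754 hardware) that doubles use the same byte order as integers: reinterpret the double's bits as a long with BitConverter.DoubleToInt64Bits and reuse the 64-bit big-endian integer methods. The two helper names are hypothetical. Later .NET versions added floating-point methods such as ReadDoubleBigEndian to BinaryPrimitives; this sketch sticks to the integer methods discussed in this issue.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: write a double in big-endian (XDR-style) order via its raw bits.
static void WriteDoubleBigEndianSketch(Span<byte> buffer, double value) =>
    BinaryPrimitives.WriteInt64BigEndian(buffer, BitConverter.DoubleToInt64Bits(value));

// Hypothetical helper: read it back by reversing the two steps.
static double ReadDoubleBigEndianSketch(ReadOnlySpan<byte> buffer) =>
    BitConverter.Int64BitsToDouble(BinaryPrimitives.ReadInt64BigEndian(buffer));

Span<byte> bytes = stackalloc byte[sizeof(double)];
WriteDoubleBigEndianSketch(bytes, 1.0);

// IEEE 754 1.0 is 0x3FF0000000000000, so the sign/exponent byte 0x3F comes first.
Console.WriteLine($"{bytes[0]:X2} {ReadDoubleBigEndianSketch(bytes)}");
```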

tmds commented 7 years ago

As discussed in the API review, there is no current way of limiting T to a primitive, hence no generic.

I wondered if the reason was perf. The methods could throw for non-primitive types. The generic method has the advantage of being generically callable; e.g., consider how this affects a BigEndianWriter implementation.
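To make the trade-off concrete, here is a hypothetical generic ReadBigEndian<T> that dispatches on typeof(T) and throws NotSupportedException for anything else. Nothing stops a caller from writing ReadBigEndian<Guid>, so the mistake only surfaces at run time (the concern raised in the API review); on the other hand, generic code such as a BigEndianWriter-style component can call it without per-type overloads. The JIT typically folds the typeof(T) checks away for value-type instantiations, but treat that as a performance assumption rather than a guarantee.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical generic API: the compile-time constraint is only "struct",
// so unsupported types are rejected at run time.
static T ReadBigEndian<T>(ReadOnlySpan<byte> buffer) where T : struct
{
    if (typeof(T) == typeof(short)) return (T)(object)BinaryPrimitives.ReadInt16BigEndian(buffer);
    if (typeof(T) == typeof(ushort)) return (T)(object)BinaryPrimitives.ReadUInt16BigEndian(buffer);
    if (typeof(T) == typeof(int)) return (T)(object)BinaryPrimitives.ReadInt32BigEndian(buffer);
    if (typeof(T) == typeof(uint)) return (T)(object)BinaryPrimitives.ReadUInt32BigEndian(buffer);
    if (typeof(T) == typeof(long)) return (T)(object)BinaryPrimitives.ReadInt64BigEndian(buffer);
    if (typeof(T) == typeof(ulong)) return (T)(object)BinaryPrimitives.ReadUInt64BigEndian(buffer);
    throw new NotSupportedException($"{typeof(T)} is not a supported primitive type.");
}

byte[] payload = { 0x01, 0x02, 0x03, 0x04, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
int asInt = ReadBigEndian<int>(payload);          // 0x01020304
ushort asUShort = ReadBigEndian<ushort>(payload); // 0x0102
Console.WriteLine($"{asInt:X8} {asUShort:X4}");
```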

As for floating point, the text above says why I don't think there should be a "standard" floating-point op: there really is no standard to speak of. I would say that if there are floating-point ops they should be off on their own, or, if they truly are just byte-flipped, you could just read a uint or ulong and blit it in a simple method ;)

Indeed. Lack of a standard may be a reason not to include these. Or you can go by the statement that "one may in practice safely assume that the endianness is the same for floating-point numbers as for integers."

tmds commented 7 years ago

So I read the whole thread...

I think the explicitness of the type being written/read is a good reason to go for the non-generic version (even for the 'machine' endianness). buffer.WriteBigEndian(i); vs buffer.WriteBigEndian((byte)i); vs buffer.WriteUInt8BE((byte)i)

Also, my use case is for SpanReader/SpanWriter, and that seems to be the common use case. As shown by @stephentoub, a single value can easily be read/written using such APIs too. This raises the question of whether there should be a public API for dealing with a single value.

Drawaes commented 7 years ago

@ahsonkhan

Do you have a rough ETA for this being available in CoreFX? There are some places I would like to use it inside SslStream in the next refactor around the handshake and framing.

Cheers Tim

stephentoub commented 7 years ago

@Drawaes, which functionality do you need? I'd have thought BitConverter would be sufficient. Or do you need everything read/written as big endian?

Don't count on being able to use these APIs.

Drawaes commented 7 years ago

I don't need it per se, but the frames do a big-endian read on the size field. As I prototype moving all of this to Span/Memory, it would just have been nice to use the official method. It's certainly not going to hold me up; I was just wondering.
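For context on the kind of read being described: a TLS record header is 5 bytes (1-byte content type, 2-byte protocol version, 2-byte length), and the length field is big-endian. The sketch below reads it with ReadUInt16BigEndian and, for comparison, with BitConverter, which uses machine byte order and therefore needs an explicit swap on little-endian hardware. The header bytes are made up for illustration, and this is not how SslStream is actually implemented.

```csharp
using System;
using System.Buffers.Binary;

// Made-up TLS record header: type 0x16 (handshake), version 3.3, length 0x012C.
byte[] header = { 0x16, 0x03, 0x03, 0x01, 0x2C };

// Span-based: the length field occupies bytes 3..4 in network (big-endian) order.
ushort length = BinaryPrimitives.ReadUInt16BigEndian(header.AsSpan(3, 2));

// BitConverter reads in machine order, so a little-endian CPU needs a swap.
ushort raw = BitConverter.ToUInt16(header, 3);
ushort viaBitConverter = BitConverter.IsLittleEndian
    ? BinaryPrimitives.ReverseEndianness(raw)
    : raw;

Console.WriteLine($"{length} {viaBitConverter}");
```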

karelz commented 7 years ago

@Drawaes we have active discussions on which APIs should be out-of-box vs. in the platform. I believe we are leaning towards having this one out of box, which means nothing in the platform can depend on it, at least not now. Please don't take a dependency on it unless you really have to.

Drawaes commented 7 years ago

Cool, good to know. I don't really need it; I was just going to use it if it was the new one true way ;)

KrzysztofCwalina commented 7 years ago

This is now done. dotnet/corefx#24400. Thanks for all the feedback!

karelz commented 7 years ago

@KrzysztofCwalina I think you meant "now done", right?

KrzysztofCwalina commented 7 years ago

Corrected :-)