dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.12k stars 4.7k forks source link

[API Proposal]: SequenceReader missing endian-aware read methods for C# built-in ValueTypes #95320

Open bevanweiss opened 10 months ago

bevanweiss commented 10 months ago

Background and motivation:

As the SequenceReader is a component quite commonly used in the performance optimized System.IO.Pipelines API, it would be nice for it to have a full complement of low level Read methods which are endian-aware (such that higher level code appears clean and consistent in accessing the lower level memory sequences presented by the System.IO.Pipelines APIs and others). There is support for some of the C# built-in types, but many are currently lacking.

API Proposal:

The addition of the following methods:

namespace System.Buffers;

public static partial class SequenceReaderExtensions
{
...
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out ushort value;
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out uint value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out ulong value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out ulong value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out nint value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out nuint value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out System.Int128 value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out System.UInt128 value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out System.Half value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out single value);
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out double value);
...
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out ushort value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out uint value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out ulong value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out nint value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out nuint value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out System.Int128 value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out System.UInt128 value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out System.Half value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out single value);
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out double value);
...
}

I also believe that consideration should be given to supporting the decimal C# datatype as well.

namespace System.Buffers;

public static partial class SequenceReaderExtensions
{
+        //NOTE: Throws NotImplemented on LittleEndian architecture
+        public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out decimal value);

+        //NOTE: Throws NotImplemented on BigEndian architecture
+        public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out decimal value);
}

The reason I believe this should be considered is not strictly for the Endian handling aspect, but more for the Sequence handling aspect. As the decimal is a native C# type, when using the SequenceReader API it doesn't feel appropriate that an API-user would need to concern themselves over re-assembling this native C# type themselves from potentially split sequences.

With the decimal being constructed of a 96bit integer, and a 32 bit flag, I'm unsure how it would strictly be handled in terms of endian swapping. I would however suggest that the API should be included in the SequenceReader, for completeness, but where the opposing read (BigEndian on little endian architecture and vice versa) would throw a NotImplemented exception. The current implementation could then only handle the endianness of the architecture until BinaryPrimatives supports this handling also.

API Usage:

using System.Buffers;

namespace My.Magic.Bus;
public struct MyCustomStruct
{
    public uint CustomerIndex;
    public decimal CustomersBigMoney;
}

public static class MyCustomStructSequenceReaderExtensions
{
    public static bool TryReadBigEndian(this ref System.Buffers.SequenceReader<byte> reader, out MyCustomStruct value)
    {
        MyCustomStruct temp;
        bool success = true;
        success &= reader.TryReadBigEndian(out temp.CustomerIndex);
        success &= reader.TryReadBigEndian(out temp.CustomersBigMoney);
        value = temp;
        return success;
    }

    public static bool TryReadLittleEndian(this ref System.Buffers.SequenceReader<byte> reader, out MyCustomStruct value)
    {
        MyCustomStruct temp;
        bool success = true;
        success &= reader.TryReadLittleEndian(out temp.CustomerIndex);
        success &= reader.TryReadLittleEndian(out temp.CustomersBigMoney);
        value = temp;
        return success;
    }
}
ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
It appears that BinaryPrimatives now contains Endian Aware reads / writes for Single / Double types (as well as unsigned types) https://github.com/dotnet/runtime/pull/6864/files It would be ideal if the StreamReader type supported similar. https://github.com/dotnet/runtime/blob/8557ef23f8b8d5b1c599e901d53ced04ee9a4d7c/src/libraries/System.Memory/ref/System.Memory.cs#L114-L119
Author: bevanweiss
Assignees: -
Labels: `area-System.IO`, `untriaged`
Milestone: -
ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

Issue Details
It appears that BinaryPrimatives now contains Endian Aware reads / writes for Single / Double types (as well as unsigned types) https://github.com/dotnet/runtime/pull/6864/files It would be ideal if the SequenceReader type supported similar. https://github.com/dotnet/runtime/blob/8557ef23f8b8d5b1c599e901d53ced04ee9a4d7c/src/libraries/System.Memory/ref/System.Memory.cs#L114-L119
Author: bevanweiss
Assignees: -
Labels: `area-System.IO`, `area-System.Memory`, `untriaged`
Milestone: -
ghost commented 10 months ago

This issue has been marked needs-author-action and may be missing some important information.

tannergooding commented 10 months ago

Can you please update the proposal to follow the API review template? Our general API review process is detailed here: https://github.com/dotnet/runtime/blob/main/docs/project/api-review-process.md and we need, at minimum, a clear section detailing the signatures of the APIs being requested (such as in https://github.com/dotnet/runtime/issues/93046).

It would be ideal if you could ensure the same types supported by BinaryPrimitives are being requested to be supported by SequenceReader so we can review them all at once. What looks to be missing is:

MichalPetryka commented 10 months ago

What looks to be missing is

What about decimal?

tannergooding commented 10 months ago

There is no decimal API in BinaryPrimitives and I don't expect we have any plans of adding one either. Decimal is not a simple primitive with a trivially defined layout. It is a complex struct explicitly composed of several fields.

huoyaoyuan commented 10 months ago

In the era with generic math, do we still need to declare such API exhaustively for each type?

bevanweiss commented 10 months ago

In the era with generic math, do we still need to declare such API exhaustively for each type?

I think 'yes'. If you've got an example of (performant) code that would work 'generically' for all the types shown above, let us know.

Challenges of generics here

  1. No generic support for size constraints (you can't have something compile-type select a target method from sizeof(..))
  2. Even if you could, not all types even of the same size are the same internal format (i.e. decimal vs UInt128 / Int128)
huoyaoyuan commented 10 months ago
  1. No generic support for size constraints (you can't have something compile-type select a target method from sizeof(..))

You can just use a bunch of if and else for value types. Generic specialization will take care of it.

  1. Even if you could, not all types even of the same size are the same internal format (i.e. decimal vs UInt128 / Int128)

They are the same, when following definition of BinaryPrimitives.

When all the data types are just treated as a byte sequence, the API can be implemented very efficiently:

  1. Choose the integer type with same size, and use Unsafe.BitCast to convert back and forth.
  2. Delegate to IBinaryInteger.Read{Big|Little}Endian on the integer type.

The problem is that there's no common interface to express all fixed-length scalar number types.

tannergooding commented 10 months ago

Generic specialization will take care of it.

This is not a guarantee. It doesn't work with Mono today, doesn't work with Unity, doesn't work for reference types, can be disabled for some scenarios, etc.

Choose the integer type with same size, and use Unsafe.BitCast to convert back and forth.

We do not want to encourage developers to make use of Unsafe.* to achieve "basic functionality".

Delegate to IBinaryInteger.Read{Big|Little}Endian on the integer type.

Endianness reversal isn't one of the things we added general support for. It was discussed, and ultimately rejected, because it often cannot make sense for more complex types.

The closest we have is the ability to read/write values as a two's complement representation, which only applies to IBinaryInteger<T> and isn't necessarily the same as it doesn't provide guarantees about fitting a particular format. It could be the shortest two's complement representation, it could be a fixed length, it could some arbitrary other length as long as it's two's complement.

The problem is that there's no common interface to express all fixed-length scalar number types.

This was also intentional, there is too big of a split between the different scenarios users may want. They may only want language primitives, they may want only runtime primitives, they may want only ABI primitives, they may want any fixed-length type, etc.

Such interfaces would've largely just been marker interfaces and would have had concerns around versioning over time and the implications of expanding the set in them, given how users might try to use them.


If SequenceReader hadn't already exposed these APIs, I would have just said that users should use BinaryPrimitives to do all endianness related conversions. However, since we did expose them (partially due to historical layering considerations), it is reasonable to give users the "full" experience here and get the full set of common overloads.