dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.23k stars 4.72k forks source link

API Proposal: Unsigned overloads for SequenceReader #30580

Open khellang opened 5 years ago

khellang commented 5 years ago

What's the reason unsigned types were left out of SequenceReader<T>? CLS compliance? I couldn't find any details on it.

It would be really nice if it mirrored the API surface of BinaryPrimitives (or Utf8Parser for that matter).

I propose the following:

namespace System.Buffers
{
    public static partial class SequenceReaderExtensions
    {
        public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out short value);
        public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out short value);

        public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out int value);
        public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out int value);

        public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out long value);
        public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out long value);

+       [CLSCompliant(false)]
+       public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out ushort value);
+       [CLSCompliant(false)]
+       public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out ushort value);
+
+       [CLSCompliant(false)]
+       public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out uint value);
+       [CLSCompliant(false)]
+       public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out uint value);
+
+       [CLSCompliant(false)]
+       public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out ulong value);
+       [CLSCompliant(false)]
+       public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out ulong value);
    }
}

It would also be nice if it had non-Try* alternatives, but that could be left for a separate proposal.

// @JeremyKuhne

JeremyKuhne commented 5 years ago

I didn't add them to reduce bloat. I figured you can just cast as there is no validation here other than you have the right amount of bytes to fill the type.

ahsonkhan commented 5 years ago

I didn't add them to reduce bloat. I figured you can just cast as there is no validation here other than you have the right amount of bytes to fill the type.

Except trying to read a uint wouldn't fit into an int, so the caller would have to up-cast to long. Similarly, if the SequenceReader contained a ulong, none of the existing overloads would work.

It would be really nice if it mirrored the API surface of BinaryPrimitives (or Utf8Parser for that matter).

I think its a good idea to add them, but just for context, @khellang do you have a scenario or real code snippet where you ran into this limitation being a concern that you had to workaround? That would help motivate adding the APIs (more than just the fact that we don't have parity).

khellang commented 5 years ago

do you have a scenario or real code snippet where you ran into this limitation being a concern that you had to workaround?

Nothing that I couldn't work around, but as you already mentioned:

trying to read a uint wouldn't fit into an int, so the caller would have to up-cast to long

The Try* pattern also adds to the clunkyness, where you have to declare (and name) a variable of the "wrong" type first, only to cast it later. You can't just cast "on the way out", like you could've done with a non-Try* API.

It's even more troublesome if you want to read a ulong over long.MaxValue; I don't really see how you'd do that using these methods (as the TryRead<T> method isn't exposed)?

When using these APIs, you typically code against some specification of data structures. It would be nice if the code would reflect the specification as close as possible, without having to do so much casting back and forth:

var reader = new SequenceReader<byte>(buffer);

var structure = new MyStructure
{
    Data1 = reader.ReadUInt32(),
    Data2 = reader.ReadUInt16(),
    Data3 = reader.ReadUInt16(),
    Data4 = reader.ReadUInt64()
}

I can, of course, add all of these as extension methods myself, but I think the framework could benefit from having them around in-box.

JeremyKuhne commented 5 years ago

Except trying to read a uint wouldn't fit into an int

I don't understand what you're trying to say. This is just byte blitting and hard casts. This isn't parsing.

khellang commented 5 years ago

Are you saying you could read ulong.MaxValue into a long using public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out long value); and then cast it to ulong? 🤔

JeremyKuhne commented 5 years ago

Are you saying you could read ulong.MaxValue into a long using public static bool TryReadLittleEndian(ref this SequenceReader reader, out long value); and then cast it to ulong? 🤔

Yes, casts are unchecked by default at runtime. Both of these are 0xFFFFFFFF:

            int integer = -1;
            uint unsignedInteger = (uint)integer;

I'm not saying we can't add new overloads, I just want to make sure we're clear on justifications before we consider it. Data loss shouldn't be a problem here.

ahsonkhan commented 5 years ago

Yes, casts are unchecked by default at runtime. Both of these are 0xFFFFFFFF:

Data loss shouldn't be a problem here.

That's a good point. This works as expected.

[Fact]
public void ReadUInt64OverInt64()
{
    ulong value = (ulong)long.MaxValue + long.MaxValue;
    byte[] input = BitConverter.GetBytes(value);

    ReadOnlySequence<byte> bytes = new ReadOnlySequence<byte>(input);

    var reader = new SequenceReader<byte>(bytes);

    Assert.True(reader.TryReadLittleEndian(out long actual));
    Assert.Equal(value, (ulong)actual); // 18446744073709551614
}
MithrilMan commented 4 years ago

@JeremyKuhne Let me chime in just to say that for clarity unsigned overloads would be very welcome; maybe they would bloat a bit the library code but will help having lot less of that bloating in our codebase with unsolecited casts. (probably at networking level would have had more sense to have just unsigned if a decision had to be made to what trim out Last consideration: as you said, most of the time int/long etc... are just a way to speicfy how many bytes to fetch in a single call; giving the fact that SequenceReader are generic (even if for network purpose most of the time will be byte), why then not having a way to specify how many instances of T get in a single call, and return them in an array T[]?

So basically what about having a SequenceReader<T>.TryRead(int count, out T[] values) ? maybe with an argument like bool reverse to handle endianness)

khellang commented 4 years ago

I think that proposal is covered by https://github.com/dotnet/corefx/issues/40870? 🤔