dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.99k stars 4.67k forks source link

Provide a stateful reader writer of span's #23616

Open Drawaes opened 7 years ago

Drawaes commented 7 years ago

ref dotnet/runtime#23601

When using span to read/write a block of memory you often end up with the following pattern

var myspan = [someplace a span is allocated]

span.Write[TBD](someValue);
span = span.Slice(sizeof(typeof(someValue));
span.Write[TBD](someOtherValue);
span = span.Slice(.......

Now obviously that code is not 100% representative of real code but the overall pattern is. I have used span in a couple of real world apps and the pattern ends up being a mess. Write/Read then slice, Write/Read then slice again. It seems that as it is a very common pattern (from my use anyway) the abstraction that is often come up with is a struct wrapper that can "hide" that slicing away and instead "advance" the span on writes. Something like

public struct BigEndianWriter
{
    private Span<byte> _innerSpan;

    public BigEndianWriter(Span<byte> innerSpan) => _innerSpan;
}

public struct LittleEndianWriter
{
     private Span<byte> _innerSpan;

     public LittleEndianWriter(Span<byte> innerSpan) => _innerSpan;
}

public struct BigEndianReader
{
    private ReadOnlySpan<byte> _innerSpan;

    public BigEndianReader(ReadOnlySpan<byte> innerSpan) => _innerSpan;
}

public struct LittleEndianReader
{
     private ReadOnlySpan<byte> _innerSpan;

     public LittleEndianReader(ReadOnlySpan<byte> innerSpan) => _innerSpan;
}

The kinds of methods you would want to see on such a struct would be


//Readers only
public byte ReadByte();
public sbyte ReadByteS();
public short ReadInt16();
public ushort ReadUInt16();
public uint ReadUInt32();
public int ReadInt32();
public long ReadInt64();
public ulong ReadUInt64();

//Writers only
public void WriteByte(byte value);
public void WriteSByte(sbyte value);
public void WriteUInt16(ushort value);
public void WriteInt16(short value);
public void WriteInt32(int value);
public void WriteUInt32(uint value);
public void WriteInt64(long value);
public void WriteUInt64(ulong value);

//Common
public int Length => _innerSpan.Length;
public Span Span => _innerSpan;
public void Advance(int count);

If you want to "peek/poke" then you can just get out the span via reader.Span there is no need to provide them on the type.

[EDIT] Add C# syntax highlight by @karelz [EDIT] Updated to more concrete design

jnm2 commented 7 years ago

I would use a reader as well as a writer for sure.

Drawaes commented 7 years ago

It should also have "Peek" methods that don't advance

public int PeekBigEndianInt32();
ektrah commented 7 years ago

+1.

The biggest problem with this is that mutable structs are evil. It's quite likely that users will pass around a reader or writer struct by-value and be surprised that it doesn't advance, in particular since the existing BinaryReader/BinaryWriter types are classes. The obvious solution doesn't seem popular.

Drawaes commented 7 years ago

They are already at the boundary with stack only spans. Anyone writing and reading directly to spans I think can have an assumed level of knowledge.

Drawaes commented 6 years ago

/cc @davidfowl @benaadams @stephentoub This is more like what I would like to deal with Spans for reading/writing.

tmds commented 6 years ago

Consider using a constructor argument as endianness specifier: public SpanReader(Span<byte> span, bool isBigEndian). The endianness can now change at runtime. For example, reading endianness from a marker and then passing that to the reader constructor. I don't know if this impacts performance when the endianness is a compile-type constant compared to typed endianness.

Drawaes commented 6 years ago

it will make the struct bigger for a start as you need to store it and its a branch for each op, currently the struct will only be the size of a span, and as its basically a struct wrapper you can switch if you want by doing

var bigEndian = new BigEndianWriter(littleEndian.Span);
tmds commented 6 years ago

Adding two use-cases:

These use-cases cannot be handled with the proposed api without duplicating the code for big and little endian.

Drawaes commented 6 years ago

Stick an interface on it. Make your method generic to avoid the box. A branch per op is a killer

tmds commented 6 years ago

Stick an interface on it. Make your method generic to avoid the box. A branch per op is a killer

Can you stick an interface on a ref struct?

Drawaes commented 6 years ago

I don't see why not? But others smarter than I would have to answer. I can honestly say however unless the JIT can optimise out that branch per read/write it would make this struct useless for me...

tmds commented 6 years ago

I don't see why not? But others smarter than I would have to answer. I can honestly say however unless the JIT can optimise out that branch per read/write it would make this struct useless for me...

Maybe there should be more structs/methods/... I'm just bringing use-cases :)

According to: https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md

A ref struct type may not be declared to implement any interface

tmds commented 6 years ago

Merging the Little and Big classes by using different methods. And adding an argument overload to Read/Write that accepts Endianness. Plus a generic read/write. Gives something like this:


enum Endianness
{
   Native,
   BigEndian,
   LittleEndian
}

ref struct SpanReader
{
   SpanReader(ReadOnlySpan<byte>);

   public void Advance(int count);

   ushort ReadUInt16(Endianness = Endianness.Native);
   ushort ReadUInt16LE();
   ushort ReadUInt16BE();

   T Read<T>() where T : struct;
}

ref struct SpanWriter
{
   SpanWriter(Span<byte>);

   public void Advance(int count);

   void WriteUInt16(ushort value, Endianness = Endianness.Native);
   void WriteUInt16LE(ushort value);
   void WriteUInt16BE(ushort value);

   void Write<T>(T) where T : struct;
}
tmds commented 6 years ago

Do SpanReader/Writer need to store a Span? Or would it be allowed to store a pointer? e.g.

    unsafe ref struct SpanReader
    {
        private void* _data;
        private int _length;

        public ReadOnlySpan<byte> Remaining => new ReadOnlySpan<byte>(_data, _length);

        public SpanReader(Span<byte> span)
        {
            _data = Unsafe.AsPointer(ref span.DangerousGetPinnableReference());
            _length = span.Length;
        }

        public T Read<T>() where T : struct
        {
            int size = Unsafe.SizeOf<T>();
            if (size > _length)
            {
                throw new ArgumentOutOfRangeException();
            }
            byte* data = (byte*)_data;
            _data = data + size;
            _length -= size;
            return Unsafe.ReadUnaligned<T>(data);
        }
    }

Would that be more efficient?

Drawaes commented 6 years ago

Not sure. I think if it's a lot faster then we should get whatever is slow fixed in the jit/runtime rather than optimising separately here...?

benaadams commented 6 years ago

Or would it be allowed to store a pointer?

Could create a GC hole if the span dropped out scope

The reader/writer (as specified above) should/could be close to a zero cost convenience wrapper

tmds commented 6 years ago

Could create a GC hole if the span dropped out scope

Can the Span drop out of scope before the Reader/Writer? Looking at the stack order, it should out-live the reader/writer.

benaadams commented 6 years ago

Create span, use in ctor, never reference it again?

tmds commented 6 years ago

Create span, use in ctor, never reference it again?

var span = new ReadOnlySpan<byte>(new byte[1024]);
var reader = new SpanReader(span);
var value = reader.ReadUInt16();

Ah, so the span may be out of scope on the last line, before the reader.

ektrah commented 6 years ago
_data = Unsafe.AsPointer(ref span.DangerousGetPinnableReference());

The ref returned by DangerousGetPinnableReference isn't pinned. The GC is free to move the array inside the span around in memory, which makes the pointer invalid even if the span does not go out of scope.

tmds commented 6 years ago

The ref returned by DangerousGetPinnableReference isn't pinned. The GC is free to move the array inside the span around in memory, which makes the pointer invalid even if the span does not go out of scope.

I was aiming to re-use the pinning of the Span. As long as the Span does not go out of scope, the array is pinned by the Span (right?). As pointed out by @benaadams the Span can go out of scope, then the pinning is lost. So the SpanReader/Writer are responsible for pinning themselves (by including a Span member or a Pinnable).

stephentoub commented 6 years ago

the array is pinned by the Span (right?)

No, Span does not pin.

ektrah commented 6 years ago

As long as the Span does not go out of scope, the array is pinned by the Span (right?).

The span does not pin the array. DangerousGetPinnableReference returns a pinnable reference, not a pinned reference.

tmds commented 6 years ago

@stephentoub @ektrah Aha! TIL: Span does not pin! So pinning happens when the fixed keyword is used?

stephentoub commented 6 years ago

So pinning happens when the fixed keyword is used?

Yes.

Drawaes commented 6 years ago

Hence the inner span stays ;)

roji commented 6 years ago

+1 on this, am doing work on spanification/pipelinization of Npgsql and the write/slice/write/slice is really not nice to work with.

DaZombieKiller commented 6 years ago

If anybody is interested, I just wrote up a quick implementation of a "SpanReader" based on this discussion:

https://gist.github.com/DaZombieKiller/e0a3f4b3de21d998797eed33cda0f709

Just note that it's not perfect and is subject to 3 AM programming, so feel free to berate me give me constructive criticism.

stephentoub commented 5 years ago

@JeremyKuhne, is there anything more we want to do here after your BufferReader/Writer work, or can this issue be closed?

JeremyKuhne commented 5 years ago

is there anything more we want to do here?

Well the writer isn't done yet. Additionally SequenceReader<T> only supports ReadOnlySequence<T>.

davidfowl commented 5 years ago

I think it might be worth looking at adding span support even with the additional overhead

xPaw commented 5 years ago

When working with binary data, I find BinaryReader to be very useful. It would be quite nice if BinaryReader could accept a Span, instead of a Stream.

steji113 commented 4 years ago

Is this something that will potentially be looked into for 5.0? I am looking for an efficient binary data reader that preferably operates over spans. I have a legacy binary protocol that is typically length prefixed messages that encodes various primitive types somewhat similarly to how MsgPack works. Our existing code uses BinaryReader but this does not have async support, nor is it the most efficient.

tmds commented 4 years ago

Well the writer isn't done yet. Additionally SequenceReader only supports ReadOnlySequence.

I think it might be worth looking at adding span support even with the additional overhead

So the reader API then is:

    public ref struct SequenceReader<T>
    {
        public SequenceReader(ReadOnlySpan<T> span)

Some Sequence oriented members may not be supported.

Maybe we should create a separate issue to split the reader and the writer API?

Our existing code uses BinaryReader but this does not have async support, nor is it the most efficient.

A Span reader won't have async support either because the ref struct can't live on the heap (which is necessary to capture async state).

shaggygi commented 4 years ago

Just curious. Is the plan for this issue to create a SequenceWriter<T> and if so, does it look like it will be added by .NET 5 release? Thx. (I'm guessing this is different as I came across 282).