Open Drawaes opened 7 years ago
I would use a reader as well as a writer for sure.
It should also have "Peek" methods that don't advance
public int PeekBigEndianInt32();
+1.
The biggest problem with this is that mutable structs are evil. It's quite likely that users will pass around a reader or writer struct by-value and be surprised that it doesn't advance, in particular since the existing BinaryReader/BinaryWriter types are classes. The obvious solution doesn't seem popular.
They are already at the boundary with stack only spans. Anyone writing and reading directly to spans I think can have an assumed level of knowledge.
/cc @davidfowl @benaadams @stephentoub This is more like what I would like to deal with Spans for reading/writing.
Consider using a constructor argument as endianness specifier: public SpanReader(Span<byte> span, bool isBigEndian).
The endianness can now change at runtime. For example, reading endianness from a marker and then passing that to the reader constructor.
I don't know if this impacts performance when the endianness is a compile-time constant compared to typed endianness.
It will make the struct bigger for a start, as you need to store it, and it's a branch for each op. Currently the struct will only be the size of a span, and as it's basically a struct wrapper you can switch if you want by doing
var bigEndian = new BigEndianWriter(littleEndian.Span);
Adding two use-cases:
These use-cases cannot be handled with the proposed api without duplicating the code for big and little endian.
Stick an interface on it. Make your method generic to avoid the box. A branch per op is a killer
Can you stick an interface on a ref struct?
I don't see why not? But others smarter than I would have to answer. I can honestly say however unless the JIT can optimise out that branch per read/write it would make this struct useless for me...
Maybe there should be more structs/methods/... I'm just bringing use-cases :)
According to: https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md
A ref struct type may not be declared to implement any interface
Merging the Little and Big classes by using different methods.
And adding an argument overload to Read/Write that accepts Endianness.
Plus a generic Read<T>/Write<T>:
enum Endianness
{
Native,
BigEndian,
LittleEndian
}
ref struct SpanReader
{
SpanReader(ReadOnlySpan<byte>);
public void Advance(int count);
ushort ReadUInt16(Endianness = Endianness.Native);
ushort ReadUInt16LE();
ushort ReadUInt16BE();
T Read<T>() where T : struct;
}
ref struct SpanWriter
{
SpanWriter(Span<byte>);
public void Advance(int count);
void WriteUInt16(ushort value, Endianness = Endianness.Native);
void WriteUInt16LE(ushort value);
void WriteUInt16BE(ushort value);
void Write<T>(T) where T : struct;
}
Do SpanReader/Writer need to store a Span? Or would it be allowed to store a pointer? e.g.
unsafe ref struct SpanReader
{
private void* _data;
private int _length;
public ReadOnlySpan<byte> Remaining => new ReadOnlySpan<byte>(_data, _length);
public SpanReader(Span<byte> span)
{
_data = Unsafe.AsPointer(ref span.DangerousGetPinnableReference());
_length = span.Length;
}
public T Read<T>() where T : struct
{
int size = Unsafe.SizeOf<T>();
if (size > _length)
{
throw new ArgumentOutOfRangeException();
}
byte* data = (byte*)_data;
_data = data + size;
_length -= size;
return Unsafe.ReadUnaligned<T>(data);
}
}
Would that be more efficient?
Not sure. I think if it's a lot faster then we should get whatever is slow fixed in the jit/runtime rather than optimising separately here...?
Or would it be allowed to store a pointer?
Could create a GC hole if the span dropped out of scope
The reader/writer (as specified above) should/could be close to a zero cost convenience wrapper
Could create a GC hole if the span dropped out of scope
Can the Span drop out of scope before the Reader/Writer? Looking at the stack order, it should out-live the reader/writer.
Create span, use in ctor, never reference it again?
var span = new ReadOnlySpan<byte>(new byte[1024]);
var reader = new SpanReader(span);
var value = reader.ReadUInt16();
Ah, so the span may be out of scope on the last line, before the reader.
_data = Unsafe.AsPointer(ref span.DangerousGetPinnableReference());
The ref returned by DangerousGetPinnableReference isn't pinned. The GC is free to move the array inside the span around in memory, which makes the pointer invalid even if the span does not go out of scope.
I was aiming to re-use the pinning of the Span. As long as the Span does not go out of scope, the array is pinned by the Span (right?). As pointed out by @benaadams the Span can go out of scope, then the pinning is lost. So the SpanReader/Writer are responsible for pinning themselves (by including a Span member or a Pinnable
the array is pinned by the Span (right?)
No, Span does not pin.
As long as the Span does not go out of scope, the array is pinned by the Span (right?).
The span does not pin the array. DangerousGetPinnableReference returns a pinnable reference, not a pinned reference.
@stephentoub @ektrah Aha! TIL: Span does not pin! So pinning happens when the fixed keyword is used?
So pinning happens when the fixed keyword is used?
Yes.
Hence the inner span stays ;)
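To illustrate the pinnable-versus-pinned distinction from the exchange above, here is a minimal sketch of pinning a span with the fixed statement. The class and method names (PinningDemo, SumPinned) are hypothetical, and the code assumes a compiler that accepts a span directly in a fixed statement (C# 7.3 or later) with unsafe blocks enabled.

```csharp
using System;

public static class PinningDemo
{
    // Hypothetical helper: sums the bytes of a span through a raw pointer.
    public static unsafe int SumPinned(ReadOnlySpan<byte> span)
    {
        int sum = 0;

        // 'fixed' pins the memory behind the span for the duration of the block,
        // so the GC cannot move it while 'p' is in use.
        fixed (byte* p = span)
        {
            for (int i = 0; i < span.Length; i++)
            {
                sum += p[i];
            }
        }

        // Outside the block the memory is no longer pinned; 'p' must not escape.
        return sum;
    }
}
```

The pointer is only valid inside the fixed block, which is why a reader or writer that stores a raw pointer past that scope would be unsafe, while one that stores the span itself is not.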
+1 on this, am doing work on spanification/pipelinization of Npgsql and the write/slice/write/slice is really not nice to work with.
If anybody is interested, I just wrote up a quick implementation of a "SpanReader" based on this discussion:
https://gist.github.com/DaZombieKiller/e0a3f4b3de21d998797eed33cda0f709
Just note that it's not perfect and is subject to 3 AM programming, so feel free to berate me (or rather, give me constructive criticism).
@JeremyKuhne, is there anything more we want to do here after your BufferReader/Writer work, or can this issue be closed?
is there anything more we want to do here?
Well the writer isn't done yet. Additionally SequenceReader<T> only supports ReadOnlySequence<T>.
I think it might be worth looking at adding span support even with the additional overhead
When working with binary data, I find BinaryReader to be very useful. It would be quite nice if BinaryReader could accept a Span, instead of a Stream.
Is this something that will potentially be looked into for 5.0? I am looking for an efficient binary data reader that preferably operates over spans. I have a legacy binary protocol, typically length-prefixed messages, that encodes various primitive types somewhat similarly to how MsgPack works. Our existing code uses BinaryReader but this does not have async support, nor is it the most efficient.
Well the writer isn't done yet. Additionally SequenceReader<T> only supports ReadOnlySequence<T>. I think it might be worth looking at adding span support even with the additional overhead
So the reader API then is:
public ref struct SequenceReader<T>
{
    public SequenceReader(ReadOnlySpan<T> span);
}
Some Sequence oriented members may not be supported.
Maybe we should create a separate issue to split the reader and the writer API?
Our existing code uses BinaryReader but this does not have async support, nor is it the most efficient.
A Span reader won't have async support either because the ref struct can't live on the heap (which is necessary to capture async state).
ref dotnet/runtime#23601
When using span to read/write a block of memory you often end up with the following pattern
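The original snippet is missing here. As a stand-in, the following is a minimal sketch of the write-then-slice pattern being described, assuming the BinaryPrimitives helpers from System.Buffers.Binary. The class name, method name, and the three-field header layout are made up for illustration.

```csharp
using System;
using System.Buffers.Binary;

public static class SlicePatternDemo
{
    // Hypothetical example: writes a ushort, an int and a byte (big-endian)
    // and returns the number of bytes written.
    public static int WriteHeader(Span<byte> destination)
    {
        Span<byte> span = destination;

        BinaryPrimitives.WriteUInt16BigEndian(span, 0x1234);
        span = span.Slice(sizeof(ushort)); // slice after the write

        BinaryPrimitives.WriteInt32BigEndian(span, 42);
        span = span.Slice(sizeof(int));    // slice again

        span[0] = 0xFF;
        span = span.Slice(sizeof(byte));   // and again

        return destination.Length - span.Length;
    }
}
```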
Now obviously that code is not 100% representative of real code, but the overall pattern is. I have used span in a couple of real-world apps and the pattern ends up being a mess: write/read then slice, write/read then slice again. As this is a very common pattern (from my use anyway), the abstraction people usually come up with is a struct wrapper that can "hide" the slicing and instead "advance" the span on writes. Something like
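The original snippet is missing here as well. The following is a rough sketch of what such a wrapper might look like, not the design that was proposed or shipped. The type name BigEndianSpanWriter and its members are assumptions chosen to match the names used elsewhere in this thread (Span, Advance), and the writes again rely on BinaryPrimitives from System.Buffers.Binary.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical wrapper: holds the remaining span and advances it on every write.
public ref struct BigEndianSpanWriter
{
    private Span<byte> _span;

    public BigEndianSpanWriter(Span<byte> span) => _span = span;

    // The not-yet-written remainder; callers can "peek/poke" through it.
    public Span<byte> Span => _span;

    public void Advance(int count) => _span = _span.Slice(count);

    public void WriteUInt16(ushort value)
    {
        BinaryPrimitives.WriteUInt16BigEndian(_span, value);
        Advance(sizeof(ushort));
    }

    public void WriteInt32(int value)
    {
        BinaryPrimitives.WriteInt32BigEndian(_span, value);
        Advance(sizeof(int));
    }

    public void WriteByte(byte value)
    {
        _span[0] = value;
        Advance(sizeof(byte));
    }
}

public static class WriterDemo
{
    // Same hypothetical header as the manual version, without explicit slicing.
    public static int WriteHeader(Span<byte> destination)
    {
        var writer = new BigEndianSpanWriter(destination);
        writer.WriteUInt16(0x1234);
        writer.WriteInt32(42);
        writer.WriteByte(0xFF);
        return destination.Length - writer.Span.Length;
    }
}
```

Because the wrapper is a mutable struct, passing it by value to a helper method copies it and the caller's position does not advance; passing it by ref avoids that surprise.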
The kinds of methods you would want to see on such a struct would be
If you want to "peek/poke" then you can just get the span out via reader.Span; there is no need to provide them on the type.
[EDIT] Add C# syntax highlight by @karelz
[EDIT] Updated to more concrete design