Open JeremyKuhne opened 3 years ago
Tagging subscribers to this area: @tannergooding, @pgovind, @GrabYourPitchForks See info in area-owners.md if you want to be subscribed.
Author: | JeremyKuhne |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Buffers`, `untriaged` |
Milestone: | - |
public int Length { get; }
It could be nint. It supports C# arithmetic operators.
It could be nint.
A span's length is a 32-bit integer. Consequently, a SpanReader's length will never exceed int.MaxValue.
Doesn't seem to be much value here?
@scalablecory Can you elaborate? I found having something to track "position" in parsing a span extremely useful and manually doing so cumbersome and error-prone.
Note that this is something that came up in development of the original SequenceReader (which I wrote) that I intended to follow up on. Shifting teams caused this to be back-burnered for a while. I'm using my own implementation of this in things I'm working on which motivated me to finally get this proposal created.
I found having something to track "position" in parsing a span extremely useful and manually doing so cumbersome and error-prone ... I'm using my own implementation of this in things I'm working on which motivated me to finally get this proposal created.
Can you give some usage examples showing how this reduces code complexity?
SequenceReader is useful because it crosses segment boundaries, but this looks like it'll just be a trivial wrapper around existing APIs.
Can you give some usage examples showing how this reduces code complexity?
The key thing is that to do the same thing I'm showing in the proposal, you would have to keep slicing yourself (or try not to mess up your current index while you're slicing). Beyond that, the API patterns are very convenient for parsing and aren't just single method calls. Additionally, you can't easily back up a step when slicing over and over.
int index = data.IndexOf(Identifiers.StartOfText);
if (index == -1)
{
throw new InvalidDataException("No start text (STX) control character found.");
}
data = data.Slice(index);
ushort checksum = CalculateFileChecksum(data);
data = data.Slice(1);
index = data.IndexOf(Identifiers.FieldTerminator);
if (index == -1)
{
throw new InvalidDataException("Did not find design specification.");
}
info.DesignSpecification = Encoding.ASCII.GetString(data.Slice(0, index));
data = data.Slice(index + 1);
// etc... ugh
Updated the sample because I messed it up the first time.
The main advantage of an API like this is that it makes the "read and move" code a lot simpler. For example:
if (!reader.TryReadByte(out var protocolVersion))
throw new InvalidOperationException("Could not read protocol version");
int size = 0;
if (protocolVersion == 1)
{
if (!reader.TryReadInt16LittleEndian(out var sizeShort))
throw new InvalidOperationException("Failed to read packet size");
size = sizeShort;
}
else if (protocolVersion == 2)
{
if (!reader.TryReadInt32LittleEndian(out size))
throw new InvalidOperationException("Failed to read packet size");
}
if (!reader.TryGetSpan(size, out var span))
throw new InvalidOperationException("Failed to get packet data");
instead of:
if (span.Length < 1)
    throw new InvalidOperationException("Could not read protocol version");
var protocolVersion = span[0];
span = span.Slice(1); // advance past the version byte
int size = 0;
if (protocolVersion == 1)
{
    if (!BinaryPrimitives.TryReadInt16LittleEndian(span, out var sizeShort))
        throw new InvalidOperationException("Failed to read packet size");
    size = sizeShort;
    span = span.Slice(sizeof(short));
}
else if (protocolVersion == 2)
{
    if (!BinaryPrimitives.TryReadInt32LittleEndian(span, out size))
        throw new InvalidOperationException("Failed to read packet size");
    span = span.Slice(sizeof(short)); // bug!
}
if (span.Length < size)
    throw new InvalidOperationException("Failed to get packet data");
In the second snippet, it's very easy to accidentally introduce bugs as you have to explicitly advance the span. I've done this many times in my own code, and have ended up writing at least one SpanReader type to minimise these bugs.
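For illustration only, here is a minimal sketch of the kind of hand-rolled reader described above. `MiniSpanReader` is a hypothetical name and its members are assumptions chosen to match the first snippet; it is not the API proposed in this issue. The point is that every successful read advances the position in exactly one place, so the caller never slices by hand.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical minimal reader over a ReadOnlySpan<byte>.
// Not the proposed SpanReader API; just a sketch of the "read and move" pattern.
public ref struct MiniSpanReader
{
    private readonly ReadOnlySpan<byte> _span;
    private int _position;

    public MiniSpanReader(ReadOnlySpan<byte> span)
    {
        _span = span;
        _position = 0;
    }

    public int Position => _position;
    public int Remaining => _span.Length - _position;

    public bool TryReadByte(out byte value)
    {
        if (Remaining < 1)
        {
            value = default;
            return false;
        }
        value = _span[_position];
        _position += 1;
        return true;
    }

    public bool TryReadInt16LittleEndian(out short value)
    {
        // BinaryPrimitives returns false when fewer than 2 bytes remain.
        if (!BinaryPrimitives.TryReadInt16LittleEndian(_span.Slice(_position), out value))
            return false;
        _position += sizeof(short);
        return true;
    }

    public bool TryReadInt32LittleEndian(out int value)
    {
        // BinaryPrimitives returns false when fewer than 4 bytes remain.
        if (!BinaryPrimitives.TryReadInt32LittleEndian(_span.Slice(_position), out value))
            return false;
        _position += sizeof(int);
        return true;
    }

    public bool TryGetSpan(int length, out ReadOnlySpan<byte> span)
    {
        if (length < 0 || Remaining < length)
        {
            span = default;
            return false;
        }
        span = _span.Slice(_position, length);
        _position += length;
        return true;
    }
}
```

Because the advance amount lives next to the read it belongs to, the `sizeof(short)` versus `sizeof(int)` mix-up from the second snippet has no place to occur in calling code.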
I've myself had to use such a tool multiple times, but it was extremely easy to add the read methods I needed as extension methods.
using System;
ReadOnlySpan<byte> buffer = new byte[1000];
Console.WriteLine(buffer.ReadInt32());
static class ReadOnlySpanExtensions
{
public static int ReadInt32(ref this ReadOnlySpan<byte> buffer)
{
int number = BitConverter.ToInt32(buffer);
buffer = buffer[4..];
return number;
}
}
A span's length is a 32-bit integer. Consequently, a SpanReader's length will never exceed int.MaxValue.
Should Span start supporting more than int.MaxValue elements, nothing would need to change in the API.
Span itself would never support more elements, because that would be a breaking change for all the developers who use 32-bit integers to store the index while looping over one.
Supporting more elements would need a LargeSpan<T> type with a nint or long Length, and consequently a LargeSpanReader<T> type to match it.
SpanReader could have its length in a nint, and if and when LargeSpan got added, it would transparently support both. But that depends on whether the .NET team is willing to add a LargeSpan type in the future.
What the shape/solution could be for arrays/spans larger than Int32.MaxValue elements is already being discussed at length (pun semi-intended) at #12221 and currently that still seems to be a mostly speculative thread.
On topic: Seems like the shape is good enough for most cases. It probably won't replace the one usecase where I need custom look-ahead logic, but other cases can clean up nicely.
It probably won't replace the one usecase where I need custom look-ahead logic
@joe4evr Can you elaborate on this? The current surface area was very much scenario driven: we came up with the original surface area (with SequenceReader) by trying to develop the System.Text.Json reader on it. Would be nice to continue to evolve these structs based on real scenarios where possible. :)
As to the large span scenario, I'd be hesitant to respond to a design that isn't finished. Wouldn't want to use nint if things eventually settled on long, for example. I'm also presuming getting said feature is a non-trivial amount of time in the future. Having a LargeSpanReader isn't a terrible thing if it means we can get something we can use in the near term.
Here is the implementation: https://github.com/JeremyKuhne/runtime/commit/86ff403820b887d0216678c70b5f8a96c0fc458e
@Joe4evr Can you elaborate on this?
Yeah, so my case is in parsing a chunk of text when encountering an open-brace char, I have a function that looks ahead to find the matching close-brace, keeping track of any nested open/close-brace pairs that may exist within, and returns the slice inside of the braces.
From a glance at the proposed API, I'm not sure if that's doable straight away, or even common enough to have it baked into the implementation (since I presume this is intended to be high-perf/"low-fat").
@Joe4evr thanks for the scenario. I'll give it some thought as this is a not uncommon parsing scenario. Things get a bit awkward if you also support escaping so it isn't immediately clear what the surface area would look like.
I do have escaping included, yes:
for (int i = startIdx; i < span.Length; i++)
{
char current = span[i];
if (current == '\\')
{
i += 1;
continue;
}
// ....
}
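To make the scenario concrete, here is a hedged sketch of the look-ahead described above: find the close brace matching an open brace, track nested pairs, skip escaped characters, and return the slice inside. `BraceScanner` and `TryGetBraceContents` are hypothetical names, and the choice of `{`, `}` and `\` as brace and escape characters is an assumption; the original function was only shown in part.

```csharp
using System;

// Hypothetical helper, not part of the proposed API.
public static class BraceScanner
{
    // Returns true and the slice strictly inside the brace pair that opens
    // at openIndex; returns false if openIndex is not a '{' or the pair is
    // never closed.
    public static bool TryGetBraceContents(
        ReadOnlySpan<char> span, int openIndex, out ReadOnlySpan<char> contents)
    {
        contents = default;
        if ((uint)openIndex >= (uint)span.Length || span[openIndex] != '{')
            return false;

        int depth = 0;
        for (int i = openIndex; i < span.Length; i++)
        {
            char current = span[i];
            if (current == '\\')
            {
                // Skip the escaped character so an escaped brace is not counted.
                i += 1;
                continue;
            }

            if (current == '{')
            {
                depth++;
            }
            else if (current == '}')
            {
                depth--;
                if (depth == 0)
                {
                    contents = span.Slice(openIndex + 1, i - openIndex - 1);
                    return true;
                }
            }
        }

        return false; // ran out of input before the braces balanced
    }
}
```

A reader type could expose something similar by running this scan from its current position and advancing past the closing brace on success, though whether that belongs in the core surface area is exactly the open question here.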
Is there already a spanreader available? Would be very useful instead of manually keeping track of the current slice position. Passing a spanreader would be a lot easier 👍
Bump Is there already a SpanReader available?
Bump Is there already a SpanReader available?
https://sakno.github.io/dotNext/versions/2.x/api/DotNext.Buffers.SpanReader-1.html :-)
@JeremyKuhne was there a particular reason for SequenceReaderExtensions to only contain methods for signed integer types, and could we get unsigned variants here (and ideally also there)?
Background and Motivation
SequenceReader<T> was fine tuned to give the best performance for ReadOnlySequence<T> and explicitly didn't support a second state constructed around a ReadOnlySpan<T> directly.
There is no practical way to turn a ReadOnlySpan<T> into a ReadOnlySequence<T>. Providing a reader that follows a similar pattern when reading ReadOnlySpan<T> would help people reading spans directly and would allow further sub-parsing of spans returned from SequenceReader<T> methods.
Proposed API
The proposed API follows along with SequenceReader<T> with the following differences:
- ReadOnlySequence<T> methods.
- int rather than long as spans are constrained to int.
Usage Examples
Alternative Designs
Adding direct ReadOnlySpan<T> support to SequenceReader<T> is technically possible but would have a negative performance impact that is measurable in key web scenarios. It also would be hard for consumers to understand the implications of calling overloads that have ReadOnlySequence<T> outputs (do we copy to an array?).