Closed SupinePandora43 closed 2 years ago
Tagging subscribers to this area: @dotnet/area-system-buffers See info in area-owners.md if you want to be subscribed.
Author: | SupinePandora43 |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Buffers`, `untriaged` |
Milestone: | - |
@GrabYourPitchforks no breaking changes here
Generally, pointer is risky for buffer overflow comparing to span. Using a single pointer to represent destination is a well known security threat.
Generally, pointer is risky for buffer overflow comparing to span. Using a single pointer to represent destination is a well known security threat.
Forgot to add nuint length
arguments 😅 would be interesting to know how it would work without it.
Well, Span
should really be using nint
instead of int
in the first place. This is another pain with using int
.
Well,
Span
should really be usingnint
instead ofint
in the first place. This is another pain with usingint
.
nuint
but yeah. it's true
nuint
but yeah. it's true
I disagree. A signed integer is often better because it makes the error case which is always hiding there obvious, even if you do have to include more parameter checks.
nuint
but yeah. it's trueI disagree. A signed integer is often better because it makes the error case which is always hiding there obvious, even if you do have to include more parameter checks.
What do you mean?
nuint
but yeah. it's trueI disagree. A signed integer is often better because it makes the error case which is always hiding there obvious, even if you do have to include more parameter checks.
What do you mean?
As an example, let's look at this (pseudo-)code which reads N bytes from a stream:
byte[] ReadNBytes(nint count)
{
byte[] result = new byte[count];
byte[] buffer = new byte[1024];
var bytesToRead = count;
while (bytesToRead > 0)
{
var bytesRead = stream.Read(buffer);
buffer.Slice(0, bytesRead).CopyTo(result.Slice(count - bytesToRead));
bytesToRead -= bytesRead;
}
return result;
}
If you're using unsigned integers to store the number of bytes to read, you're hiding overflows in bytesToRead -= bytesRead
and count - bytesToRead
which may mean this loop may end up running forever. With signed integers, bytesToRead will simply go negative and will always end. This is the case even with large integers, for example N - int.MaxValue
is always a representable integer without any overflows, for any positive N.
nuint
but yeah. it's trueI disagree. A signed integer is often better because it makes the error case which is always hiding there obvious, even if you do have to include more parameter checks.
What do you mean?
As an example, let's look at this (pseudo-)code which reads N bytes from a stream:
byte[] ReadNBytes(nint count) { byte[] result = new byte[count]; byte[] buffer = new byte[1024]; var bytesToRead = count; while (bytesToRead > 0) { var bytesRead = stream.Read(buffer); buffer.Slice(0, bytesRead).CopyTo(result.Slice(count - bytesToRead)); bytesToRead -= bytesRead; } return result; }
If you're using unsigned integers to store the number of bytes to read, you're hiding overflows in
bytesToRead -= bytesRead
andcount - bytesToRead
which may mean this loop may end up running forever. With signed integers, bytesToRead will simply go negative and will always end. This is the case even with large integers, for exampleN - int.MaxValue
is always a representable integer without any overflows, for any positive N.
It just meets use of this codestyle with a cost of 1 bit being used for sign instead.
byte[] ReadNBytes(nuint count)
{
byte[] result = new byte[(int)count];
byte[] buffer = new byte[1024];
nuint bytesRead = 0;
var bytesToRead = count;
while (bytesRead < count)
{
int readThisTime = stream.Read(buffer);
bytesRead += (nuint) readThisTime
buffer.Slice(0, readThisTime).CopyTo(result.Slice((int)bytesRead));
}
return result;
}
byte[]
doesn't accept nint/nuint making it literally useless
byte[]
doesn't accept nint/nuint making it literally useless
nint
/nuint
can be used as indexer for T[]
, or what did you mean?
PS: you don't need all the full-quotes of previous comments, it's just bloat in reading.
As for the proposed API. An API dealing with a two allocations more than 2GB in size isn't going to work or behave as well as you might think.
It's definitely convenient to have a single call, but the actual perf here is going to likely be much worse than you're expecting and it wouldn't be how you'd want to handle this in practice.
It would be helpful if you could provide more context as to what kind of inputs you're dealing with that are this big and why you need to convert them in one go, rather than chunking them or parallelizing the processing for better perf.
Let's also not turn this into a discussion around whether signed
vs unsigned
is better.
There are pro's and con's to each. Signed
tends to be more intuitive and less error prone when handling overflow and other failure cases. Unsigned
tends to have better optimization support.
The JIT and internal implementation can reasonably optimize never-negative signed
when that information is present and so signed
ends up being a "better default" from a UX perspective.
The main downside is also not really a downside. You lose a bit that you will, realistically speaking, never need. On a 32-bit machine, you have 4GB of memory and you will practically never be able to do a single allocation that is near or greater than 2GB in size. It's impractical because that's more than half the address space available to the entire machine. You'll in the best case cause some pretty major contention and slow down the entire machine.
On a 64-bit machine, no modern computer supports that much memory anyways and when you do get to more than 2^63
bits of memory, you hit the same contention and other issues that you'd have hit on 32-bit.
Agree with Tanner - it would be strange to expose pointer-based overloads here without more context as to why they're necessary for extremely large data. I'd much rather see us focus on making Span<T>
(or equivalent) able to support larger buffers, which will also involve the creation of an OperationStatus-based pattern for dealing with these inputs. Once this pattern is created, we can consider which APIs (such as this) are candidates for enlightening.
As was mentioned earlier, it's possible to implement this today at the call site via a loop.
Unless there's a strong motivation to keep this around I'm going to let the bot auto-close it.
This issue has been marked needs-author-action
and may be missing some important information.
This issue has been automatically marked no-recent-activity
because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity
.
@GrabYourPitchforks
public ref struct Span<T>
{
private ref reference;
private nuint length;
[Obsolete (BinaryCompatibilityOnly)]
public int Length => checked((int)length);
public nuint Length => length;
}
Then?
This isn't the right issue to discuss changing Span. I'd suggest taking that discussion to https://github.com/dotnet/runtime/issues/12221.
Background and motivation
As said in #67649,
Encoding
can't be edited for sake of compatibility, but Utf8 is a static class! So it can.API Proposal
API Usage
(string conversion)
Alternative Designs
67649 (edit
Encoding
)Risks
No such