Open JeremyKuhne opened 2 months ago
Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.
The lack of this API was a PITA for me when I was implementing the Payload Reader for BF. FWIW this is what I ended up doing: https://github.com/adamsitnik/SafePayloadReader/blob/787b24d4e535ddee181b8e7da2546fe258735ef7/System.Runtime.Serialization.BinaryFormat/Records/RectangularOrCustomOffsetArrayRecord.cs#L50-L123
This is notably a "dangerous" operation. While you can't exactly AV (provided you still check the index is within the flattened length of the array), you are still bypassing the bounds checks indicated by the array's shape and this can easily cause corruption or other issues if the developer isn't careful. -- Accessing linear index 16
could be, row 4, column 4
; row 2, column 8
; row 8, column 2
, etc; and the actual visual location changed is different for each of these. A common bug in image manipulation software is caused by this kind of issue, where you accidentally increment the column past the end of a row and start drawing into the next row instead, when it actually should have terminated.
I think it would be beneficial to have, but I do very much think it should be very explicit in its name that the operation is Unsafe
and there should likely be some version that is generic in order to avoid any boxing cost. -- As an alternative, perhaps these should be APIs in MemoryMarshal
or extension methods typed over the most common dimension sizes (T[,]
, T[,,]
, etc).
This is notably a "dangerous" operation.
@tannergooding The dimension layout is deterministic. I presume you mean dangerous in that you don't translate the indices to linear index correctly? This is precisely the issue I want to avoid for de/serialization by never messing with the indices array.
As an alternative, perhaps these should be APIs in MemoryMarshal or extension methods typed over the most common dimension sizes (T[,], T[,,], etc).
Note I need to support any rank (2 - 32) and any (not previously known) Type
.
The issue being that a uint[,] array = uint[1080, 1920]
has a Length
of 2_073_600
. Thus any linear index [0, 2_073_599]
is valid, but not all linear indices computed as (y * 1920) + x
are valid. For example, array[1, 1920]
is itself invalid and would throw, but used in the standard computation it accesses linear index 3840
, which is actually array[2, 0]
.
So in general, you're losing safety, making this is an "unsafe" operation. So users should accordingly have it made very clear that this is the case.
This type of miscomputation is a super common bug, especially in more complex loops, coordinate calculation logic, etc. A simple example is when you're taking something like a user command DrawLine(start, end)
which operates in "world space" and then you're translating that to "screen space". The actual line can extend outside the screen space, and if you've not doing clipping correctly, you'll end up drawing into the left side of the buffer instead. This is then missed because its still "in bounds" of the overall image space and it may only appear on the screen for a fraction of a second in some cases.
The same general issue extends outside image processing/rasterization as well, and applies to any type of multi-dimensional access. In general the best thing would be for the JIT to recognize the general pattern of for (int y = 0; ...) { for (int x = 0; ...) { array[y, x] = ...; } }
and elide bounds checks itself. An API as proposed here is the next best thing and an unsafe workaround since it bypasses those bounds checks.
So in general, you're losing safety, making this is an "unsafe" operation. So users should accordingly have it made very clear that this is the case.
I'm not opposed to using "Unsafe" in the API name. I suspect that it would be unlikely that people would use this one over the one that takes indices if they have them. That said, we should probably add an overload that takes ReadOnlySpan<int>
for the mentioned SetValue
.
Here is how I'm currently trying to decompose a flattened index:
int ranks = array.Rank;
Span<int> rankProducts = stackalloc int[ranks];
rankProducts[ranks - 1] = 1;
for (int i = ranks - 2; i >= 0; i--)
{
rankProducts[i] = array.GetLength(i + 1) * rankProducts[i + 1];
}
Span<int> indices = stackalloc int[ranks];
for (int i = 0; i < ranks; i++)
{
indices[i] = flattenedIndex / rankProducts[i];
flattenedIndex -= indices[i] * rankProducts[i];
}
array.SetValue(value, indices.ToArray());
Array.Copy
, Array.Clear
, Buffer.BlockCopy
and number of other APIs accept flattened index for multidimensional arrays. None of these APIs have unsafe in the name.
BTW: The flattened index has odd behavior for arrays with lower bounds in Array.Copy
and Array.Clear
. The implementation subtracts the first lower bound from it, but no other lower bounds.
Are jagged arrays a relevant case? Or Array class does not represent those in the same way (not at desk to try)
Flattened index does not make sense for jagged arrays.
Background and motivation
When serializing/deserializing array data it is costly and error prone to iterate through multidimensional arrays. Ultimately all
Array
apis flatten the index when getting/setting. Iterating through the data without having to compose the indices array would be much easier.(Cost and complexity is around creating and maintaining indices.
BinaryFormatter
code to do this is fairly complicated and they just get flattened byArray.SetValue
with additional cost.)API Proposal
API Usage
Alternative Designs
One can grab spans of the raw data if one knows the
T
by pinning or doing something like this (I think?):I presume that you can use
object?
as theT
if you have reference types?If you have a value type that has object references in it and all you have is
Type
, I'm not sure there is a way to safely set a given raw offset other than callingArray.SetValue
. (Or possiblyArrayCopy
with a smaller array with the same number of dimensions, but I haven't validated.)Risks
No response