Open jakobbotsch opened 3 months ago
There is a question of whether we can optimize `Span<T>` as well as `T[]` without introducing (more) special status for `Span<T>`/`ReadOnlySpan<T>`. That's because the transformation shown above is actually illegal for the JIT to do unless we make it undefined behavior for a `Span<T>` to exist with an "invalid" range of managed byrefs.
Consider the following example:
    static void Main()
    {
        int[] values = [1, 2, 3, 4, 0];
        Span<int> exampleSpan = MemoryMarshal.CreateSpan(ref values[0], int.MaxValue);
        Sum(exampleSpan); // No problem today
        Sum2(exampleSpan); // Forms illegal byref
    }

    private static int Sum(Span<int> s)
    {
        int sum = 0;
        foreach (int x in s)
        {
            if (x == 0)
                break;
            sum += x;
        }
        return sum;
    }

    private static int Sum2(Span<int> s)
    {
        int sum = 0;
        ref int p = ref MemoryMarshal.GetReference(s);
        ref int end = ref Unsafe.Add(ref p, s.Length);
        while (Unsafe.IsAddressLessThan(ref p, ref end))
        {
            int x = p;
            if (x == 0)
                break;
            sum += x;
            p = ref Unsafe.Add(ref p, 1);
        }
        return sum;
    }
`exampleSpan` is created with a valid byref but a length that makes `_reference + length` an invalid byref. Today, there is no problem in `Sum` because we do not eagerly form the `_reference + length` byref, but `Sum2` ends up eagerly forming this illegal byref.
The strength reduction optimization would have the JIT transform `Sum` to `Sum2`.
@jkotas @davidwrighton any thoughts on this? Can we document somewhere that `Span<T>`/`ReadOnlySpan<T>` have "special status" to make them amenable to optimizations to a similar level to `T[]`? I think we would document two things:
`Span<T>`, i.e. `_reference + length` must point inside (or at the end of) the same object as `_reference` when it is a managed byref.
The existing Span uses do not always follow this restriction. For example:
I guess we can document it retroactively as a breaking change and try to fix all instances of the bad patterns that we can find.
Hmm, I'll have to see if that seems to be worth it once I get further. I can start out with arrays for now to do the measurements.
I think instead of forming `end = span._reference + span.length * size`, we can just utilize a reverse counted loop and come out equal on x64/arm64. For example, `Sum2` will usually end up as
    private static int Sum2(Span<int> s)
    {
        int sum = 0;
        ref int p = ref MemoryMarshal.GetReference(s);
        if (s.Length > 0)
        {
            int length = s.Length;
            do
            {
                int x = p;
                if (x == 0)
                    break;
                sum += x;
                p = ref Unsafe.Add(ref p, 1);
            } while (--length > 0);
        }
        return sum;
    }
when loop inversion kicks in. The `--length > 0` check can be done in 2 instructions + 1 live variable on arm64/x64, exactly the same as if we had formed `end`.
We sadly still have the problem described above for `Span<T>`. Without the assumption that a `Span<T>` points within the same managed object it is illegal to transform
    public static int Sum(Span<int> span, Func<int, bool> sumIndex)
    {
        int sum = 0;
        for (int i = 0; i < span.Length; i++)
            sum += sumIndex(i) ? span[i] : 0;
        return sum;
    }
into
    public static int Sum(Span<int> span, Func<int, bool> sumIndex)
    {
        int sum = 0;
        ref int val = ref span[0];
        for (int i = 0; i < span.Length; i++)
        {
            sum += sumIndex(i) ? val : 0;
            val = ref Unsafe.Add(ref val, 1);
        }
        return sum;
    }
The same transformation seems ok for arrays.
(Of course whether or not this transformation is profitable is another question entirely.)
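To make the hazard concrete, here is a minimal sketch under stated assumptions. It deliberately builds a span whose length exceeds its backing array (the questionable pattern under discussion, not a recommendation) and passes a callback that only selects in-bounds indices. In the index-based form, `span[i]` is evaluated only for selected indices, so no byref outside the array is ever formed. A byref that is advanced unconditionally on every iteration would walk past the end of the array even though it is never dereferenced there. The class name `InvalidRangeSketch`, the method `Run`, and the length `1000` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public static class InvalidRangeSketch
{
    // Index-based form from the discussion: span[i] is only evaluated
    // when the callback selects index i.
    public static int Sum(Span<int> span, Func<int, bool> sumIndex)
    {
        int sum = 0;
        for (int i = 0; i < span.Length; i++)
            sum += sumIndex(i) ? span[i] : 0;
        return sum;
    }

    public static int Run()
    {
        int[] values = { 1, 2, 3, 4 };

        // Assumption for illustration: a valid starting byref paired with a
        // length (1000) far larger than the 4-element backing array.
        Span<int> oversized = MemoryMarshal.CreateSpan(ref values[0], 1000);

        // The callback only selects indices inside the real array, so the
        // index-based loop never touches memory outside 'values'.
        return Sum(oversized, i => i < values.Length);
    }
}
```

Calling `InvalidRangeSketch.Run()` exercises only the index-based form. The byref-advancing form is intentionally not run here, since with this input it would form byrefs outside the array.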
Now that we have an SSA-based IV analysis (added in #97865), we should implement strength reduction based on it. Example loop:
Codegen x64:
Codegen arm64:
The point of strength reduction is to optimize the loop codegen as if it had been written as follows:
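As a hedged illustration of this kind of source-level rewrite (a sketch, not the issue's original listing), the code below contrasts an index-based array loop with a hand-reduced form that walks a byref up to an `end` byref. The class and method names are hypothetical. An array is used on purpose: forming `end` is unproblematic there because the array's length always describes a valid range within one object.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class StrengthReductionSketch
{
    // Index-based form: the element address is derived from the
    // induction variable i on every iteration.
    public static int SumIndexed(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }

    // Hand-reduced form: the index is replaced by a byref that is bumped by
    // one element per iteration and compared against an end byref.
    public static int SumReduced(int[] values)
    {
        int sum = 0;
        ref int p = ref MemoryMarshal.GetArrayDataReference(values);
        // For an array, 'end' points at most one element past the last
        // element of the same object.
        ref int end = ref Unsafe.Add(ref p, values.Length);
        while (Unsafe.IsAddressLessThan(ref p, ref end))
        {
            sum += p;
            p = ref Unsafe.Add(ref p, 1);
        }
        return sum;
    }
}
```

Both methods should return the same result for any non-null array, including an empty one, where `p` and `end` coincide and the loop body never runs.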
The codegen would look like:
x64:
arm64:
For arm64 there is the additional possibility of using post-increment addressing mode by optimizing the placement of the IV increment once the strength reduction has happened. The loop body is then reducible to: