a74nh opened 1 month ago
```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
public static unsafe int foo(ref int* src, int length)
{
    Vector<int> pred = (Vector<int>)Sve.CreateWhileLessThanMask32Bit(0, length);
    Vector<int> vec = Sve.LoadVector(pred, src);
    return (int)Sve.AddAcross(vec).ToScalar();
}
```
Reducing this down any further and the conversion vanishes.
This is because `Vector<int> pred` produces a `TYP_SIMD` local and so that's what gets CSE'd. We still need to add a step to the JIT that identifies cases of `STORELCL(TYP_SIMD, ConvertMaskToVector(x))` and transforms it instead to `STORELCL(TYP_MASK, x)`, replacing usages with `ConvertMaskToVector(lcl)`; or something to that effect.
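As a rough sketch only (not the actual change), the identification step could look something like the following; the intrinsic id `NI_Sve_ConvertMaskToVector`, the use of `gtOp1` for the stored value, and the overall helper shape are assumptions here:

```cpp
// Sketch: detect STORE_LCL_VAR(TYP_SIMD, ConvertMaskToVector(x)) and retype the
// store and its local to TYP_MASK so the mask is stored directly. Every remaining
// use of the local would then need to be wrapped in ConvertMaskToVector(lcl).
bool TryRetypeMaskLocalStore(Compiler* comp, GenTree* node)
{
    if (!node->OperIs(GT_STORE_LCL_VAR) || !varTypeIsSIMD(node->TypeGet()))
    {
        return false;
    }

    GenTree* value = node->AsLclVarCommon()->Data();
    if (!value->OperIsHWIntrinsic() ||
        (value->AsHWIntrinsic()->GetHWIntrinsicId() != NI_Sve_ConvertMaskToVector))
    {
        return false;
    }

    // Drop the conversion: store the mask operand directly and retype to TYP_MASK.
    // (Assumes the stored value lives in gtOp1; the real node layout may differ.)
    GenTree* mask   = value->AsHWIntrinsic()->Op(1);
    unsigned lclNum = node->AsLclVarCommon()->GetLclNum();

    comp->lvaGetDesc(lclNum)->lvType = TYP_MASK;
    node->gtType                     = TYP_MASK;
    node->AsLclVarCommon()->gtOp1    = mask;
    return true;
}
```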
If you were to change the code to instead be:
```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
public static unsafe int foo(ref int* src, int length)
{
    Vector<int> total = new Vector<int>(0);
    Vector<int> vec = Sve.LoadVector(Sve.CreateWhileLessThanMask32Bit(0, length).AsInt32(), src);
    total = Sve.ConditionalSelect(Sve.CreateWhileLessThanMask32Bit(0, length).AsInt32(), Sve.Add(total, vec), total);
    return (int)Sve.AddAcross(total).ToScalar();
}
```
Then I expect it would also be fixed, as the `CreateWhileLessThanMask32Bit` would be CSE'd instead and produce a `TYP_MASK` local. We should still add the right step to the JIT to ensure the right CSE happens, however.
Also:
```csharp
i += Sve.GetActiveElementCount(pz, pz);
```
becomes:
```asm
ptrue p2.b
cmpne p1.b, p2/z, z16.b, #0
ptrue p0.b
cmpne p0.b, p0/z, z16.b, #0
cntp x0, p1, p0.b
add x0, x0, x1
```
Both args to the `cntp` should be using the same predicate instead of doing two conversions. That's likely because SVE isn't using mask constants currently and so LSRA doesn't do the checks for "is constant already in a register".
> This is because `Vector<int> pred` produces a `TYP_SIMD` local and so that's what gets CSE'd. We still need to add a step to the JIT that identifies cases of `STORELCL(TYP_SIMD, ConvertMaskToVector(x))` and transforms it instead to `STORELCL(TYP_MASK, x)`, replacing usages with `ConvertMaskToVector(lcl)`; or something to that effect.
Switching the `STORELCL` is fairly easy. Finding the uses to remove the `ConvertMaskToVector` is a little more tricky, depending on when this is done: `LocalsTreeList()` can be used to find all local uses, which means doing this between morph and the SSA build; otherwise it would need `BlockRange().TryGetUse()` once the IR is in LIR form.

I believe the general idea is that we want to do this as its own pass after local morph, as per https://github.com/dotnet/runtime/pull/99608#discussion_r1523730454
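A rough sketch of what such a pass could look like with the locals-only threading; `maskLclNum` and the pass shape are purely illustrative assumptions:

```cpp
// Sketch: with fgNodeThreading == NodeThreading::AllLocals, each statement keeps a
// linked list of the locals appearing in it, so uses of the retyped local can be
// found without a full tree walk.
void FindMaskLocalUses(Compiler* comp, unsigned maskLclNum)
{
    for (BasicBlock* block : comp->Blocks())
    {
        for (Statement* stmt : block->Statements())
        {
            for (GenTreeLclVarCommon* lcl : stmt->LocalsTreeList())
            {
                if ((lcl->GetLclNum() == maskLclNum) && lcl->OperIs(GT_LCL_VAR))
                {
                    // A use of the mask local: retype it to TYP_MASK and rewrite
                    // the parent to consume ConvertMaskToVector(lcl), or to take
                    // the mask directly where the parent already expects one.
                }
            }
        }
    }
}
```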
See `Statement::LocalsTreeList`. It allows you to quickly check whether a statement contains a local you are interested in.
@jakobbotsch, picking up on your suggestion in #99608.

I'm using `Statement::LocalsTreeList`. This requires `fgNodeThreading` to be `AllLocals`.
When I find the `ConvertVectorToMask`/`ConvertMaskToVector` nodes, I remove them by bashing the node to `GT_NOP` and rewiring the parent to the child.

Then I need to fix up the next/prev pointers. In previous PRs I did this using `fgSetStmtSeq()`. However, that requires `fgNodeThreading` to be `AllTrees`.

Any suggestions for how I should be fixing that up?
`fgSequenceLocals` does the same as `fgSetStmtSeq`, but for the locals linking.
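Putting those pieces together, a minimal sketch of the removal step under those constraints; the parent rewiring is elided and the exact signatures (in particular `fgSequenceLocals` taking a `Statement*`) are assumptions:

```cpp
// Sketch: after bashing a ConvertMaskToVector/ConvertVectorToMask node to GT_NOP
// and rewiring its parent to consume the child directly, re-sequence the
// statement's locals list so the next/prev links stay valid while
// fgNodeThreading == NodeThreading::AllLocals.
void RemoveConversionAndResequence(Compiler* comp, Statement* stmt, GenTree* convert)
{
    // The conversion's operand becomes the value its parent consumes; the actual
    // rewiring of the parent's operand is omitted here.
    convert->gtBashToNOP();

    // fgSetStmtSeq() would need fgNodeThreading == AllTrees; fgSequenceLocals()
    // rebuilds just the locals-only threading for this statement.
    comp->fgSequenceLocals(stmt);
}
```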
With `DOTNET_TieredCompilation=0`, these three lines are not required; they are converting mask -> vector -> mask.
I suspect this is because there are two uses of `pred`: in the conditional select and in the load vector.