Closed jakobbotsch closed 1 month ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
What's happening here is that for Sve.Max(vr3, vr4)
both vr3
and vr4
are the same.
/--* t26 simd16
N005 ( 4, 3) [000052] DA-XG------ * STORE_LCL_VAR simd16 V07 cse0 d:1 $VN.Void
N006 ( 1, 1) [000053] ----------- t53 = LCL_VAR simd16 V07 cse0 u:1 <l:$2c1, c:$2c2>
N008 ( 1, 1) [000055] ----------- t55 = LCL_VAR simd16 V07 cse0 u:1 (last use) <l:$2c0, c:$300>
[000073] ----------- t73 = HWINTRINSIC mask short CreateTrueMaskAll
[000074] -c--------- t74 = CNS_VEC simd16<0x00000000, 0x00000000, 0x00000000, 0x00000000>
/--* t53 simd16
+--* t55 simd16
N009 ( 7, 6) [000034] -A-XG+----- t34 = * HWINTRINSIC simd16 short Max <l:$2c4, c:$2c5>
/--* t73 mask
+--* t34 simd16
+--* t74 simd16
[000075] -A-XG------ t75 = * HWINTRINSIC simd16 short ConditionalSelect
N010 ( 1, 1) [000023] -----+----- t23 = LCL_VAR ref V03 tmp2 u:2 (last use) <l:$181, c:$141>
At codegen we combine the cndsel and max to try to create:
IN000a: 000034 ldr q16, [x0, #0x08]
IN000b: 000038 ptrue p0.h
IN000c: 00003C movprfx z17.h, p0/z, z17.h
IN000d: 000040 smax z17.h, p0/m, z17.h, z17.h
This is generated by simply removing the assert.
However, there is still an issue .... the instruction after a movprfx cannot use the same destination register as an input. (see here - The prefixed instruction must not use the destination register in any other operand position)
What we need to do is delay free all the inputs to SMAX
, such that it now creates:
IN000a: 000034 ldr q16, [x0, #0x08]
IN000b: 000038 ptrue p0.h
IN000c: 00003C movprfx z17.h, p0/z, z16.h
IN000d: 000040 smax z17.h, p0/m, z17.h, z16.h
Which is good.
Problem I'm having right now is how, during lsra, do I detect that these two nodes refer to the same value?
N006 ( 1, 1) [000053] ----------- t53 = LCL_VAR simd16 V07 cse0 u:1 <l:$2c1, c:$2c2>
N008 ( 1, 1) [000055] ----------- t55 = LCL_VAR simd16 V07 cse0 u:1 (last use) <l:$2c0, c:$300>
Can I check the cse somehow?? Looking at the node, gtCSEnum
is 0.
Problem I'm having right now is how, during lsra, do I detect that these two nodes refer to the same value?
N006 ( 1, 1) [000053] ----------- t53 = LCL_VAR simd16 V07 cse0 u:1 <l:$2c1, c:$2c2> N008 ( 1, 1) [000055] ----------- t55 = LCL_VAR simd16 V07 cse0 u:1 (last use) <l:$2c0, c:$300>
Can I check the cse somehow?? Looking at the node,
gtCSEnum
is 0.
Why do we need to detect that? Can't we just delay free the inputs unconditionally?
Why do we need to detect that? Can't we just delay free the inputs unconditionally?
That would work. It's not going to hit performance. It will increase the register usage a little, but we can fix that at a later stage if we find we're spilling too much.
Hmm, not sure that I understand -- why does it increase register pressure? Since those registers cannot be used as the destination, is there any choice but to unconditionally mark them as delay freed?
I probably don't fully understand how the conditional select comes into play here.
Hmm, not sure that I understand -- why does it increase register pressure? Since those registers cannot be used as the destination, is there any choice but to unconditionally mark them as delay freed?
I was thinking it would delayfreeing when the inputs are different would do that. But, no, they will have different registers anyway.
cc @dotnet/arm64-contrib @dotnet/jit-contrib