Closed jakobbotsch closed 2 months ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Preview 7 is ok
G_M000_IG01: ;; offset=0x0000
sub esp, 32
G_M000_IG02: ;; offset=0x0003
vxorps ymm0, ymm0, ymm0
vrangeps ymm0, ymm0, ymmword ptr [@RWD00], 0
vmovups ymmword ptr [esp], ymm0
mov ecx, 0x8B3C8AC
call CORINFO_HELP_NEWSFAST
vmovups ymm0, ymmword ptr [esp]
vmovups ymmword ptr [eax+0x04], ymm0
mov ecx, eax
call [System.Console:WriteLine(System.Object)]
G_M000_IG03: ;; offset=0x0033
vzeroupper
add esp, 32
ret
RWD00 dq 00000000FFA00000h, 0000000000000000h, 0000000000000000h, 0000000000000000h
but recent builds are not
G_M27646_IG01: ;; offset=0x0000
sub esp, 32
;; size=3 bbWeight=1 PerfScore 0.25
G_M27646_IG02: ;; offset=0x0003
vxorps ymm0, ymm0, ymm0
vrangeps ymm0, ymm0, ymmword ptr [@RWD00], 0
vmovups ymmword ptr [esp], ymm0
mov ecx, 0xB6632E0 ; System.Runtime.Intrinsics.Vector256`1[uint]
call CORINFO_HELP_NEWSFAST
vmovups ymm0, ymmword ptr [esp]
vmovups ymmword ptr [eax+0x04], ymm0
mov ecx, eax
call [System.Console:WriteLine(System.Object)]
;; size=48 bbWeight=1 PerfScore 16.83
G_M27646_IG03: ;; offset=0x0033
vzeroupper
add esp, 32
ret
;; size=7 bbWeight=1 PerfScore 2.25
RWD00 dq 00000000FFE00000h, 0000000000000000h, 0000000000000000h, 0000000000000000h
Only diff is the value in RWD00
.
***** BB01 [0000]
STMT00001 ( ??? ... 0x025 )
[000009] DA--------- * STORE_LCL_VAR simd32<System.Runtime.Intrinsics.Vector256`1> V01 loc1
[000008] ----------- \--* HWINTRINSIC simd32 float Or
[000003] ----------- +--* CNS_VEC simd32<0x000000003f800000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
[000005] ----------- \--* CNS_VEC simd32<0x00000000c1200000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
becomes
[000009] DA--------- * STORE_LCL_VAR simd32<System.Runtime.Intrinsics.Vector256`1> V01 loc1
[000005] -----+----- \--* CNS_VEC simd32<0x00000000ffe00000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
so somehow an extra bit gets set in the vector constant (FFE instead of FFA).
Looks like this is nan normalization? 0xFFA00000 is a nan float, and we normalize it to 0xFFE00000.
Not quite NaN normalization, but rather 0xFFA0_0000
is an sNaN
(signalling NaN) because the most significant bit of the significand
is zero and sNaN
when encountered by the CPU raise the "signal" (throw an exception if IEEE 754 floating-point exception handling is enabled, which it never is for .NET), and then suppress the signal from being raised again in the future by setting the bit that converts it to a qNaN
(quiet NaN).
[0xFFC0_0000, 0xFFFF_FFFF]
is -qNaN
[0xFF80_0001, 0xFFBF_FFFF]
is -sNaN
0xFF80_0000
is -inf
0x7F80_0000
is +inf
[0x7F80_0001, 0x7FBF_FFFF]
is +sNaN
[0x7FC0_0000, 0x7FFF_FFFF]
is +qNaN
In general we (RyuJIT) try and avoid any explicit normalization and instead try to explicitly propagate wherever applicable (that is, if a NaN
is passed in, we return precisely that NaN
back). The only places we explicitly produce canonical NaN
is when neither input is a NaN (so -inf + inf
produces NaN, this requires a net new NaN to be produced so we choose the canonical NaN of 0xFFC0_0000
on xarch and 0x7FC0_0000
on arm/arm64/loongarch64/riscv64).
So this is somewhat expected behavior, but it would still be interesting to find out what is causing the CPU to see it as an sNaN
for this particular case. I expect there's some place its being seen as a scalar float
and we could instead pass it through as a scalar uint32_t
to ensure that operations like and
, not
, or
, xor
, left shift
, and right shift
don't exhibit such behavior.
The path we take is:
gtFoldExprHWIntrinsic
[000008] ----------- >>> * HWINTRINSIC simd32 float Or
[000005] -----+----- +--* CNS_VEC simd32<0x00000000c1200000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
[000003] -----+----- \--* CNS_VEC simd32<0x000000003f800000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
GenTreeVecCon::EvaluateBinaryInPlace
EvaluateBinarySimd<simd32_t>
EvaluateBinarySimd<simd32_t,float>
EvaluateBinaryScalar<float>(genTreeOps oper, float arg0, float arg1)
EvaluateBinaryScalarSpecialized<float>(genTreeOps oper, float arg0, float arg1)
uint32_t resultBits is 0xFFA0_0000
UInt32BitsToSingle produces float 0xFFE0_0000
👍, so we'd need to intercept this prir to extracting the scalar value out of the SIMD input (that is prior to EvaluateBinaryScalar
I think the simplest place to do that is EvaluateBinarySimd<simd32_t>
, basically check for oper
in case TYP_FLOAT/TYP_DOUBLE
and dispatch instead to EvaluateBinarySimd<TSimd, int32_t/int64_t>
: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simd.h#L1189-L1199
I can get a fix up for this.
Ok, I will reassign this one to you.
cc @dotnet/jit-contrib