Open DrewScoggins opened 3 years ago
Suspect that this change, https://github.com/dotnet/runtime/pull/44419, is the culprit. cc @briansull
I suspect CSE led to spills here. Example:
static double MaxTest()
{
    double x = 0;
    for (int i = 0; i < 5000; i++)
        x += 0.0004;
    return x;
}
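For anyone who wants to check the effect locally, here is a rough timing sketch around the same loop. The class name `MaxTestTiming`, the helper `TimeMs`, and the warm-up count are assumptions made for illustration; this is not the harness the perf lab uses, and a `Stopwatch` loop is far noisier than a proper benchmark run.

```csharp
using System;
using System.Diagnostics;

public static class MaxTestTiming
{
    // Same loop as in the example above.
    public static double MaxTest()
    {
        double x = 0;
        for (int i = 0; i < 5000; i++)
            x += 0.0004;
        return x;
    }

    // Hypothetical helper: calls MaxTest `reps` times and returns the
    // elapsed wall-clock time in milliseconds.
    public static double TimeMs(int reps)
    {
        double sink = 0;

        // Warm-up calls so the method has been jitted before timing starts
        // (the count of 1000 is an arbitrary assumption).
        for (int i = 0; i < 1000; i++)
            sink += MaxTest();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < reps; i++)
            sink += MaxTest();
        sw.Stop();

        // Keep the accumulated result alive so the calls are not dead code.
        GC.KeepAlive(sink);
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

Comparing `MaxTestTiming.TimeMs(100000)` on a runtime build from before and after the change should give a first impression, but the numbers are only indicative.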
Before we enabled CSE for floats:
G_M17817_IG01:
       vzeroupper
       ;; bbWeight=1 PerfScore 1.00
G_M17817_IG02:
       vxorps xmm0, xmm0
       xor eax, eax
       align [0 bytes]
       ;; bbWeight=1 PerfScore 0.83
G_M17817_IG03:
       vaddsd xmm0, xmm0, qword ptr [reloc @RWD00]
       inc eax
       cmp eax, 0x1388
       jl SHORT G_M17817_IG03
       ;; bbWeight=4 PerfScore 26.00
G_M17817_IG04:
       ret
       ;; bbWeight=1 PerfScore 1.00
RWD00 dq 3F3A36E2EB1C432Dh ; 0.0004
After:
G_M17817_IG01:
       vzeroupper
       ;; bbWeight=1 PerfScore 1.00
G_M17817_IG02:
       vxorps xmm0, xmm0
       xor eax, eax
+      vmovsd xmm1, qword ptr [reloc @RWD00]
       align [13 bytes]
       ;; bbWeight=1 PerfScore 2.83
G_M17817_IG03:
-      vaddsd xmm0, xmm0, qword ptr [reloc @RWD00]
+      vaddsd xmm0, xmm0, xmm1
       inc eax
       cmp eax, 0x1388
       jl SHORT G_M17817_IG03
       ;; bbWeight=4 PerfScore 18.00
G_M17817_IG04:
       ret
       ;; bbWeight=1 PerfScore 1.00
RWD00 dq 3F3A36E2EB1C432Dh ; 0.0004
More registers are involved: the constant now lives in xmm1 for the whole loop. In this case that's OK, but when a method has a lot of live locals it can lead to spills.
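To make the register-pressure concern concrete, here is a hypothetical variant of the loop with nine accumulators and nine distinct constants. The class and method names are invented for illustration and the code is not taken from the regressed benchmarks. If every constant is kept in its own register, the loop needs eighteen live floating-point values, while x64 without AVX-512 has only sixteen xmm registers, so some values would have to be spilled. Whether a given JIT build actually spills here is an assumption to verify by inspecting the generated code.

```csharp
using System;

public static class CsePressureDemo
{
    // Hypothetical example: nine accumulators plus nine constants.
    // Keeping all of them in registers exceeds the sixteen xmm registers
    // available on x64 without AVX-512.
    public static double ManyConstants(int iterations)
    {
        double a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = 0, k = 0;
        for (int i = 0; i < iterations; i++)
        {
            a += 0.0001; b += 0.0002; c += 0.0003;
            d += 0.0004; e += 0.0005; f += 0.0006;
            g += 0.0007; h += 0.0008; k += 0.0009;
        }
        // Combine the accumulators so none of them is dead.
        return a + b + c + d + e + f + g + h + k;
    }
}
```

With memory operands, as in the "before" codegen, only the nine accumulators need registers; that is the kind of case where the CSE can turn into a loss.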
Yes, with CSE there is always a trade-off involved, so we will get some wins and a few losses.
Run Information
Regressions in System.MathBenchmarks.Double
Regressions in SciMark2.kernel
Regressions in System.MathBenchmarks.Single