dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.08k stars 4.7k forks source link

JIT should suppress zero-extending same-register moves in more scenarios #12402

Open GrabYourPitchforks opened 5 years ago

GrabYourPitchforks commented 5 years ago

Related to https://github.com/dotnet/coreclr/pull/22454. An optimization was previously introduced in coreclr which eliminates unnecessary mov instructions when zero-extending registers. However, that optimization only looks back one instruction to determine if the elimination is worthwhile.

Per https://github.com/dotnet/coreclr/pull/23665, we have evidence that there's benefit to be realized from looking back more than one instruction when performing this optimization. We should be more aggressive about eliminating these mov instructions.

category:cq theme:basic-cq skill-level:intermediate cost:medium

AndyAyersMS commented 5 years ago

A general fix here would do the proper up-front analysis instead of generating unnecessary instructions and then removing them.

benaadams commented 4 years ago

If looking back more than 1 instruction for movs would memory loads count too? 2 and 3 instructions are given by example seen in https://github.com/dotnet/runtime/pull/32442#issuecomment-587392995 for Kestrel's Http1Connection.TakeMessageHeaders

Though this is from the non-dependency version for analysis gist which starts:

       rep stosd 
       mov      rcx, rsi
       mov      qword ptr [rbp-E8H], rsp
       mov      gword ptr [rbp+10H], rcx
       mov      bword ptr [rbp+18H], rdx
       mov      bword ptr [rbp+28H], r9
       mov      esi, r8d

G_M25757_IG02:
       mov      rcx, bword ptr [rbp+18H]  ; load
       mov      rdx, gword ptr [rcx]
       mov      rcx, bword ptr [rbp+18H]  ; load same
       mov      rdi, gword ptr [rcx+8]
       mov      rcx, bword ptr [rbp+18H]  ; load same
       mov      ebx, dword ptr [rcx+16]
       and      ebx, 0xD1FFAB1E
       mov      rcx, bword ptr [rbp+18H]  ; load same
       mov      r14d, dword ptr [rcx+20]
       and      r14d, 0xD1FFAB1E
       cmp      rdx, rdi
       je       SHORT G_M25757_IG10

So could be?

G_M25757_IG02:
       mov      rcx, bword ptr [rbp+18H]  ; load
       mov      rdx, gword ptr [rcx]
       mov      rdi, gword ptr [rcx+8]
       mov      ebx, dword ptr [rcx+16]
       and      ebx, 0xD1FFAB1E
       mov      r14d, dword ptr [rcx+20]
       and      r14d, 0xD1FFAB1E
       cmp      rdx, rdi
       je       SHORT G_M25757_IG10
AndyAyersMS commented 4 years ago

Would be interesting to see what the above looks like with EH Write-Thru (#543 has the changes, but they are off by default; COMPlus_EnableEHWriteThru=1 will enable).

cc @CarolEidt