Open GrabYourPitchforks opened 5 years ago
A general fix here would do the proper up-front analysis instead of generating unnecessary instructions and then removing them.
If looking back more than 1 instruction for mov
s would memory loads count too? 2 and 3 instructions are given by example seen in https://github.com/dotnet/runtime/pull/32442#issuecomment-587392995 for Kestrel's Http1Connection.TakeMessageHeaders
Though this is from the non-dependency version for analysis gist which starts:
rep stosd
mov rcx, rsi
mov qword ptr [rbp-E8H], rsp
mov gword ptr [rbp+10H], rcx
mov bword ptr [rbp+18H], rdx
mov bword ptr [rbp+28H], r9
mov esi, r8d
G_M25757_IG02:
mov rcx, bword ptr [rbp+18H] ; load
mov rdx, gword ptr [rcx]
mov rcx, bword ptr [rbp+18H] ; load same
mov rdi, gword ptr [rcx+8]
mov rcx, bword ptr [rbp+18H] ; load same
mov ebx, dword ptr [rcx+16]
and ebx, 0xD1FFAB1E
mov rcx, bword ptr [rbp+18H] ; load same
mov r14d, dword ptr [rcx+20]
and r14d, 0xD1FFAB1E
cmp rdx, rdi
je SHORT G_M25757_IG10
So could be?
G_M25757_IG02:
mov rcx, bword ptr [rbp+18H] ; load
mov rdx, gword ptr [rcx]
mov rdi, gword ptr [rcx+8]
mov ebx, dword ptr [rcx+16]
and ebx, 0xD1FFAB1E
mov r14d, dword ptr [rcx+20]
and r14d, 0xD1FFAB1E
cmp rdx, rdi
je SHORT G_M25757_IG10
Would be interesting to see what the above looks like with EH Write-Thru (#543 has the changes, but they are off by default; COMPlus_EnableEHWriteThru=1 will enable).
cc @CarolEidt
Related to https://github.com/dotnet/coreclr/pull/22454. An optimization was previously introduced in coreclr which eliminates unnecessary
mov
instructions when zero-extending registers. However, that optimization only looks back one instruction to determine if the elimination is worthwhile.Per https://github.com/dotnet/coreclr/pull/23665, we have evidence that there's benefit to be realized from looking back more than one instruction when performing this optimization. We should be more aggressive about eliminating these
mov
instructions.category:cq theme:basic-cq skill-level:intermediate cost:medium