Open kunalspathak opened 3 years ago
@dotnet/jit-contrib
In the assembly code below, there are 54 places where we do the following:
; ...
mov eax, gword ptr [ebp-40CH]
jmp G_M36558_IG174
; ...
In all, there are 55 jumps to G_M36558_IG174, and the only one that does not follow this pattern is in the prolog:
; ...
mov gword ptr [ebp-40CH], ecx
mov eax, ecx
;; bbWeight=1 PerfScore 12.58
G_M36558_IG02:
jmp G_M36558_IG174
; ...
To summarize, we can do better by adding the resolution code once inside G_M36558_IG174 instead of before each jump. For this particular example, that reduces the code size of these resolution moves from 324 bytes (54 copies of a 6-byte mov) to 6 bytes.
; ...
G_M36558_IG174:
mov eax, gword ptr [ebp-40CH]
; ...
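The idea above can be sketched on a toy model. This is only an illustration, not how LSRA resolution is implemented: blocks are plain lists of instruction strings, `sink_common_move` and `safe_on_other_paths` are hypothetical names, the 6-byte size is the assumed encoding of the mov shown above, and every entry into the target is assumed to be an explicit jump (no fall-through).

```python
from collections import Counter

# Assumed encoding size of `mov eax, gword ptr [ebp-40CH]` (opcode + ModRM + disp32).
MOV_SIZE_BYTES = 6


def sink_common_move(blocks, target, safe_on_other_paths):
    """Sink a resolution mov shared by the jumps to `target` into `target`.

    blocks: dict label -> list of instruction strings (toy IR, hypothetical).
    safe_on_other_paths: predicate saying whether executing the mov is harmless
        on paths that did not originally contain it (for example because the
        register and the stack slot already hold the same value).
    Returns the number of bytes saved; mutates `blocks` in place.
    """
    jump = f"jmp {target}"
    preds = [body for label, body in blocks.items()
             if label != target and body and body[-1] == jump]

    def tail_move(body):
        # The instruction right before the jump, if it is a mov.
        if len(body) >= 2 and body[-2].startswith("mov "):
            return body[-2]
        return None

    counts = Counter(m for m in map(tail_move, preds) if m is not None)
    if not counts:
        return 0
    move, count = counts.most_common(1)[0]
    if count < 2:
        return 0  # nothing is duplicated, so nothing to gain
    if count < len(preds) and not safe_on_other_paths(move):
        return 0  # some path lacks the mov and we cannot prove it is harmless
    for body in preds:
        if tail_move(body) == move:
            del body[-2]
    blocks[target].insert(0, move)
    return (count - 1) * MOV_SIZE_BYTES
```

With 54 predecessors carrying the mov plus one prolog-style block that only jumps, the sketch reports 318 bytes saved, which matches going from 324 bytes to 6 bytes.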
The change here was enabling EH write-thru for variables that have a single def. Below, V00 (the variable for which we are seeing the resolution movs) is enregistered, followed by a diff screenshot that shows the resolution moves that were added.
Here is another case where resolution introduces unneeded movs:
public void Case5(int x, int y)
{
    var a = array;
    for (int i = 0; i < 1000; i++)
    {
        try
        {
            a[i] = x + y;
        }
        catch { }
    }
}
The assembly code for this method is based on https://github.com/dotnet/runtime/pull/47307, where we start enregistering EH vars that have a single def. Here, we add resolution moves to restore rsi, r8d and rdx at the end of G_M51048_IG05, which is part of the loop. Perhaps we should check whether the variable's most recent RefPosition has ever changed its assignment and, if not, simply not add such resolutions.
mov edx, dword ptr [rbp+18H]
mov rsi, gword ptr [rbp-18H]
mov r8d, dword ptr [rbp+20H]
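A rough sketch of the "has the assignment ever changed" check, under loud assumptions: `filter_redundant_reloads`, the `assignment_history` bookkeeping and the `clobbered` set are all hypothetical and do not correspond to actual LSRA data structures, and the variable names V01..V03 are placeholders.

```python
def filter_redundant_reloads(reloads, assignment_history, clobbered):
    """Drop reloads for variables whose register never changed.

    reloads: list of (variable, register) resolution moves proposed for an edge.
    assignment_history: dict variable -> list of registers assigned at each of
        its RefPositions so far (hypothetical bookkeeping).
    clobbered: set of registers that may have been overwritten on this path.
    Returns the reloads that are still required.
    """
    kept = []
    for var, reg in reloads:
        history = assignment_history.get(var, [])
        stable = bool(history) and all(r == reg for r in history)
        if stable and reg not in clobbered:
            continue  # the register still holds the value, no reload needed
        kept.append((var, reg))
    return kept
```

For the loop above, if the three variables were always assigned rdx, rsi and r8 and nothing on the path clobbers those registers, all three reloads would be filtered out. Proving the "not clobbered" part is the hard bit in a real allocator and is simply taken as an input here.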
Today, resolution doesn't take block weights into account when deciding where to add resolution moves. One improvement would be to take that factor into account. Also, a post-resolution walk-through that eliminates or squeezes the added moves (a kind of peephole optimization, but for resolution moves) would possibly be beneficial.
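Both ideas can be sketched in a few lines. This is a toy model only: `weighted_move_cost` and `squeeze_moves` are hypothetical helpers, moves are `(dst, src)` pairs of location strings, the block contains nothing but moves, and register aliasing (for example eax vs. rax) is not modeled.

```python
import itertools


def weighted_move_cost(placements, block_weight):
    """Estimated cost of resolution moves: sum of block weight * move count.

    placements: dict block label -> list of moves placed in that block.
    block_weight: dict block label -> execution weight (like bbWeight).
    """
    return sum(block_weight[label] * len(moves)
               for label, moves in placements.items())


def squeeze_moves(moves):
    """Peephole pass over a run of moves: drop a move whose destination
    already holds the same value as its source."""
    fresh = itertools.count()
    value_of = {}  # location -> value number currently stored there
    kept = []
    for dst, src in moves:
        if src not in value_of:
            value_of[src] = next(fresh)  # first time we see this location
        if value_of.get(dst) == value_of[src]:
            continue  # dst already equals src, the move is redundant
        value_of[dst] = value_of[src]
        kept.append((dst, src))
    return kept
```

For example, a store of ecx to a stack slot, a copy of ecx to eax, and then a reload of eax from the same slot squeezes down to the first two moves. The cost function makes it possible to compare candidate placements, such as three moves in a loop block with weight 4 (cost 12) against the same moves in a block with weight 1 (cost 3).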
I have fixed some of the redundant resolution movs as part of https://github.com/dotnet/runtime/pull/54345. More work will be done in a future release.
Sometimes resolution blocks are added that break the contiguous flow of a loop, which can be bad for performance: https://github.com/dotnet/runtime/issues/58443#issuecomment-912114002
Problem statement
This issue captures various problems with the existing resolution phase of the register allocator.
While doing some other investigation, I noticed a scenario where we create new basic blocks during the resolution phase, but the compensation code inside them is identical.
We should investigate whether, in such cases, we could come up with a single basic block that has the required code and have all the other paths jump to that block. As an effect of the duplicated blocks, the PerfScore increases from 3.00 to 24.00, which shows up as a regression when doing asmdiffs.
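One possible shape for that investigation, as a sketch on a toy CFG. Everything here is an assumption for illustration: `merge_identical_blocks` is a hypothetical helper, each resolution block is modeled as a `(moves, successor)` pair, and each predecessor is assumed to reach at most one resolution block.

```python
def merge_identical_blocks(resolution_blocks, edges):
    """Keep one copy of each distinct resolution block and redirect jumps.

    resolution_blocks: dict label -> (tuple of moves, successor label).
    edges: dict predecessor label -> label of the block it jumps to.
    Returns (merged_blocks, redirected_edges).
    """
    canonical = {}    # (moves, successor) -> first label seen with that body
    replacement = {}  # every resolution block label -> its canonical label
    for label, body in resolution_blocks.items():
        replacement[label] = canonical.setdefault(body, label)
    merged = {label: resolution_blocks[label] for label in canonical.values()}
    # Targets that are not resolution blocks are left untouched.
    redirected = {pred: replacement.get(dest, dest)
                  for pred, dest in edges.items()}
    return merged, redirected
```

Two blocks are merged only when both their moves and their successor match, so the redirected predecessors execute exactly the same compensation code and land in the same place. A real implementation would also need to check that sharing the block does not hurt layout, for example by breaking the contiguous flow of a loop.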
category:design theme:register-allocator skill-level:expert cost:large impact:medium