Open Quuxplusone opened 6 years ago
Attached asidiff_trim
(811102 bytes, text/plain): side-by-side diff of debug dumps (unpatched <> patched)
It seems the interesting difference comes in RS_Split cascade 28, where the
original code makes this decision:
Split for compact region in 2 bundles, intv 1.
splitAroundRegion with 2 globals.
while the patched code makes this decision:
Best split candidate of vreg %129 may not cause bad eviction chain
Split for $r3l in 1 bundles, intv 1.
Split for compact region in 1 bundles, intv 2.
splitAroundRegion with 3 globals.
Note the debug output says "may not cause bad eviction chain", but the
resulting set of COPYs does in fact look suspiciously like a bad eviction chain
as described in the comments in RegAllocGreedy.cpp ...
@Marina: this test case does indeed seem related to splitCanCauseEvictionChain,
and since you are the expert here, I would like to ask if you could share some
advice?
I have not been able to figure out yet exactly which evictions are bad here,
but one observation on this function is that it isn't really sensible to try to
evict any register for the sake of another, since they all overlap and are
practically identically structured.
The spill weights themselves are extremely small and very similar. I tried
to add a heuristic along the lines of "don't evict if intervals are very similar":
diff --git a/lib/CodeGen/RegAllocGreedy.cpp b/lib/CodeGen/RegAllocGreedy.cpp
index 182d966..daf8ed7 100644
--- a/lib/CodeGen/RegAllocGreedy.cpp
+++ b/lib/CodeGen/RegAllocGreedy.cpp
@@ -842,7 +842,7 @@ bool RAGreedy::shouldEvict(LiveInterval &A, bool IsHint,
   if (CanSplit && IsHint && !BreaksHint)
     return true;
-  if (A.weight > B.weight) {
+  if (A.weight > B.weight && ((A.weight - B.weight) > 0.001)) {
     LLVM_DEBUG(dbgs() << "should evict: " << B << " w= " << B.weight << '\n');
     return true;
   }
This made all the evictions for this test case never happen in the first
place, and the result was a general improvement, as expected, with more folded
operands.
Since checking specialized cases for eviction is quite a task, I wonder if my
experiment could be useful as a simplistic improvement. Is there a limit (or
level of similarity) beyond which it doesn't make sense to go through the
trouble of evicting one live range for the other?
Either way, this further indicates, as Uli suggested, that these evictions are
bad and should not happen.
Currently splitCanCauseEvictionChain identifies two very specific scenarios
I've encountered; the scenario you're seeing here is probably not one of them.
From what I see in the log both the "before" and "after" code seem to have a
bad eviction chain (the chain of copies) that moved from one place to another
in %bb.2 due to the changes in the memory folding.
I hope to look into this a bit deeper in the next few days and identify the
exact scenario that caused this chain.
I fear the "shouldEvict" heuristic you suggested may be too intrusive and
affect many other decisions. We must first pinpoint the scenario and then
decide how to avoid it.
Regarding the difference for vreg %129: the code "before" could not find a way
to split for $r3l because of previous assignments, so it chose to do a split
that tries to create compact regions rather than "fit" it as best it can into
the "empty" parts of $r3l (more accurately, splitting for $r3l wasn't
profitable).
In the "after" code the assignments to $r3l have changed, and now there is a
way to split %129 to fit $r3l (in a profitable way). When Greedy tries to
estimate the split cost it checks whether the split for $r3l is one of the
known "bad" scenarios, but your root cause is probably different; this is why
it says "Best split candidate of vreg %129 may not cause bad eviction chain".
The best way to approach it is to identify the first "bad" COPY created, which
seems to be the decision to split vreg %99.
It creates %136 and %137 split artifacts for the "before" code and %135 and
%136 split artifacts for the "after" code.
Then we need to investigate the chain effect that happened here and understand
how we can better influence this decision.
I'll keep you updated in the next few days with what I'm able to find.
Any news on this, Marina?
Unfortunately I did not have a chance to look at the bug in depth yet.
I hope to find the time next week.
Hi Marina, I don't want to rush you or anything, just a friendly reminder that we are looking forward to resolving this issue... /Jonas
Hi Jonas,
Sorry for not being able to address this issue yet. Usually these RA issues
are quite complex and take some time to figure out in depth. I will resume
looking at this over the weekend, but unfortunately my current main focus is a
non-LLVM-related project, so it may take a while until I can upload a patch
that identifies this bad eviction scenario.
There's another way to partially address such issues - you can add a cleanup
pass.
At one time Quentin Colombet suggested doing a cleanup of such copies in the
MachineCopyPropagation pass, and this may indeed be a good idea regardless of
improving RA. It does not fix RA's bad decisions, but if these copies are
contained in a single basic block, they can be cleaned up there.
Thanks,
Marina
Hi Marina,
I certainly understand the complexity of this issue!
I wouldn't mind trying to improve the MachineCopyPropagation pass, except I
worry that even if I have a working patch it will get rejected with the
reasoning that "RegAlloc should handle this". So before trying anything there,
I think we should have a general agreement that we don't expect RegAlloc to
always succeed with this test case...
In this case it actually looks a bit like RegAlloc should handle this better, no?
Jonas, can you try https://reviews.llvm.org/D50064 to see whether it helps
with your case? The patch is not reviewed yet, but it can mitigate some of the
bad eviction decisions that were fixed before in RA, e.g.:
renamable $rbx = COPY renamable killed $rax
renamable $rcx = COPY renamable killed $rbx
renamable $rax = COPY renamable killed $rcx
NOOP implicit $rax
=>
NOOP implicit $rax
Ugh, those COPYs are in different basic blocks, so the current
MachineCopyPropagation pass cannot do anything here, because it is BB-local
only.
(In reply to Alexander Ivchenko from comment #11)
> Jonas, can you try https://reviews.llvm.org/D50064 to see whether it helps
> with your case? The patch is not reviewed yet, but it can mitigate some of
> the bad eviction decisions that were fixed before in RA e.g:
>
> renamable $rbx = COPY renamable killed $rax
> renamable $rcx = COPY renamable killed $rbx
> renamable $rax = COPY renamable killed $rcx
> NOOP implicit $rax
> =>
> NOOP implicit $rax
Hi Alexander, thanks for suggesting this, but this unfortunately did not seem
to help my test case (no improvement). /Jonas
(In reply to Alexander Ivchenko from comment #12)
> Ugh, those COPYs are in different basic blocks; So current
> MachineCopyPropagation pass cannot do anything here, because it is BB-local
> only
Hi Alexander,
I was just about to say thanks for suggesting this, but this unfortunately did
not help my test case, as you predicted.
/Jonas
ping. Does anyone have time for this?
ping! The same test case still shows the same problem...
I might be able to look at it next week, unless Marina beats me to it.
It'll have to wait one or two more weeks :(
Hi all,
I finally had a look, but I am puzzled as I am not sure what to look at!
The compiler with and without the attached patch produces the exact same
assembly for test/CodeGen/SystemZ/fold-memory-op-impl-02.ll, and neither of
them ends up producing asi.
Failing Tests (1):
LLVM :: CodeGen/SystemZ/fold-memory-op-impl-02.ll
Unexpected Failures: 1
Does the patch need to be rebased?
Attached latest.patch
(7122 bytes, text/plain): patch: ASI enabled for AHIMux + the test case
Attached AHIMuxK_foldmem.patch
(7658 bytes, text/plain): Updated patch for AHIMuxK instruction
ping
It would still be nice if someone with regalloc greedy know-how could address
this strange regression. I see on SystemZ/SPEC'17 that this patch does have
value if applied:
asi : 13226 13686 +460
ahi : 97161 96712 -449
l : 229532 229086 -446
st : 177344 176900 -444
I'll try to take a look at this one this week or next week.
Attached AHIMuxK_foldmem.patch
(7693 bytes, text/plain): rebased patch with test case
Attached foldMemImp.patch
(7684 bytes, text/plain): rebased patch with test case
TL;DR Didn't find anything useful yet!
Quick update here.
The fact that we get different allocation is expected, given that the folding
creates different live-intervals and thus changes the order in which things
are processed.
Looking at the resulting code, we indeed seem to hit a case of eviction
chain, but I haven't dug enough to see what we should do to make the mechanism
that Marina added kick in.
At first glance, it seems the live-intervals don't have the shape that is
considered to create an eviction chain. In particular, it seems the candidates
for split are never both live-in *and* live-out of a bundle.
(In reply to Marina Yatsina from comment #5)
> Currently splitCanCauseEvictionChain identifies 2 very specific scenarios
> I've encountered, the scenario you're seeing here is probably not one of
> them.
I think this is exactly what is happening. I think Marina's heuristic detects
the pattern where copies go back and forth:
dst1 = src1
dst2 = src2
dst3 = src3
...
src3 = dst3
src2 = dst2
src1 = dst1
Whereas in this case, the copies are set up to join live ranges:
then:
dst1 = src1
dst2 = src2
dst3 = src3
br end
else:
dst1 = src1
dst2 = src2
dst3 = src3
br end
end:
use of dst1, dst2, dst3
At this point, I would need to spend some time getting into Marina's heuristic
to see how we can fix this. I unfortunately don't have the time for that now.
Marina, if you can take a look that would be great.