Open Quuxplusone opened 7 years ago
Bugzilla Link | PR31613 |
Status | REOPENED |
Importance | P normal |
Reported by | Davide Italiano (ditaliano@apple.com) |
Reported on | 2017-01-11 16:15:23 -0800 |
Last modified on | 2021-10-01 09:47:38 -0700 |
Version | trunk |
Hardware | PC Windows NT |
CC | aeubanks@google.com, alina.sbirlea@gmail.com, dberlin@dberlin.org, florian_hahn@apple.com, llvm-bugs@lists.llvm.org, nikita.ppv@gmail.com, ruiling.song83@gmail.com |
Fixed by commit(s) | |
Attachments | |
Blocks | PR30995 |
Blocked by | |
See also |
This is actually one of the reasons I originally disabled phi equivalence to
other phis in different blocks on the branch.
However, now I think I can fix this.
The real issue here is that we expect to never come across a class where the
leader is dominated by us.
This is because we iterate in dominator/RPO order, and the thinking goes we
should have created classes in order such that leaders should be the most
dominating.
This means the member sets should be sorted in order of RPO, such that when you
change leaders, you move to the next dominating thing.
However, our member sets are not sorted.
This means that when we switch the leader, we need to pick the earliest (most
dominating) leader.
Otherwise, as in this testcase, because we never choose a canonical leader,
we cycle through all the congruences repeatedly.
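The leader-selection rule described above can be sketched as follows (a toy illustration in Python, not NewGVN's actual data structures; `pick_new_leader` and `rpo_number` are hypothetical names):

```python
# When a congruence class loses its leader, pick the member with the
# smallest reverse-post-order (RPO) number, so the leader is always the
# most dominating member and leader changes move monotonically forward.
def pick_new_leader(members, rpo_number):
    # members: unsorted collection of values in the class
    # rpo_number: maps each value to its RPO index
    return min(members, key=lambda v: rpo_number[v])

rpo = {"a": 0, "b": 3, "c": 1}
assert pick_new_leader({"a", "b", "c"}, rpo) == "a"
```

Because the member sets are unsorted, scanning for the minimum RPO index is what makes the choice canonical.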
Fix coming
r291968 fixed this one.
Reopening, as I'm hitting this assertion again in a different testcase.
To reproduce:
opt -passes=mldst-motion,newgvn test.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"
define hidden void @barrier() align 2 {
entry:
%0 = load i64, i64* null, align 8
%call9 = tail call i64 @g()
%rem = select i1 undef, i64 0, i64 %call9
%add = add i64 %0, 1
%shr17 = lshr i64 %add, 1
%sub = add nsw i64 %shr17, -1
br label %for.cond
for.cond.loopexit.loopexit: ; preds = %cleanup.cont
br label %for.cond.loopexit
for.cond.loopexit: ; preds = %for.cond, %for.cond.loopexit.loopexit
br label %for.cond
for.cond: ; preds = %for.cond.loopexit, %entry
%__current.0 = phi i64 [ %rem, %entry ], [ 0, %for.cond.loopexit ]
%cmp222 = icmp eq i64 %__current.0, %sub
br i1 %cmp222, label %land.lhs.true.preheader, label %for.cond.loopexit
land.lhs.true.preheader: ; preds = %for.cond
br label %land.lhs.true
land.lhs.true: ; preds = %cleanup.cont, %land.lhs.true.preheader
%__current.13 = phi i64 [ %inc, %cleanup.cont ], [ %__current.0, %land.lhs.true.preheader ]
br label %cleanup.cont
cleanup.cont: ; preds = %land.lhs.true
%inc = add i64 %__current.13, 1
%cmp22 = icmp eq i64 %inc, %sub
br i1 %cmp22, label %land.lhs.true, label %for.cond.loopexit.loopexit
}
declare hidden i64 @g() local_unnamed_addr align 2
Just an informative update.
Due to running MergedLoadStoreMotion first, when initializing NewGVN, PredicateInfo
changes the for.cond block to:
for.cond: ; preds = %for.cond.loopexit, %entry
%__current.0 = phi i64 [ %rem, %entry ], [ 0, %for.cond.loopexit ]
%cmp222 = icmp eq i64 %__current.0, %sub
%__current.0.0 = call i64 @llvm.ssa.copy.i64(i64 %__current.0)
%sub.0 = call i64 @llvm.ssa.copy.i64(i64 %sub)
br i1 %cmp222, label %land.lhs.true.preheader, label %for.cond.loopexit
Using the information in PredInfo, NewGVN is assigning the two copies to the
same congruence class, even though %__current.0 and %sub are in different
congruence classes.
Without MergedLoadStoreMotion, or running NewGVN directly on the updated IR
with the ssa.copy added, the bug is not triggered; the two ssa.copy calls are
placed in different congruence classes, correctly corresponding to %__current.0
and %sub.
I cannot tell yet which cached information leads to this behavior.
@asbirlea: For your test case, I'm seeing the assertion failure with just
-newgvn, even without -mldst-motion. Also for this slightly reduced variant:
define void @barrier(i64 %arg) {
entry:
br label %loop1
loop1:
%phi1 = phi i64 [ %arg, %entry ], [ 0, %loop1 ]
%cmp1 = icmp eq i64 %phi1, -1
br i1 %cmp1, label %loop2, label %loop1
loop2:
%phi2 = phi i64 [ %inc, %loop2 ], [ %phi1, %loop1 ]
%inc = add i64 %phi2, 1
%cmp2 = icmp eq i64 %inc, -1
call void @llvm.assume(i1 %cmp2)
br label %loop2
}
declare void @llvm.assume(i1)
I wanted to add a description of what I'm seeing in the latest testcases.
I posted https://reviews.llvm.org/D110907 with a proposed solution; I'm not
confident this is the right approach, so I'd appreciate some input.
In the testcase from Nikita, the optimization goes into an infinite loop due to
the contradiction between the value inferred from symbolic execution in the
loop and the assume instruction.
For each ssa.copy intrinsic, there are two sources for building an expression,
both as Values: first, the NewGVN inference up to that point, and second,
PredicateInfo. These two values may be swapped depending on their ranks,
so the second is chosen if it has a lower rank. The value entering the loop is
0, a constant, so there is no swapping on the first iteration. On the second
iteration, the Phi is given two different values, so it's updated to a
variable, but the info from PredicateInfo is constant: -1, so the values are
swapped and the -1 considered the best value inferred. Using this value (-1),
on the next iteration, the phi is inferred to be 0 again, and since it's a
constant, there is no swapping done, leading back to the same state as in the
first iteration. The optimization ping-pongs between these states until hitting
the assertion.
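The ping-pong described above can be modeled with a tiny Python sketch (illustrative names only, not NewGVN's real code): constants rank below variables, the PredicateInfo value wins only when it has strictly lower rank, and the chosen value feeds the next iteration's inference.

```python
def rank(v):
    # toy rank: constants (ints here) beat variables (strings here)
    return 0 if isinstance(v, int) else 1

def choose(inferred, predinfo):
    # swap to the PredicateInfo value only when it ranks strictly lower
    return predinfo if rank(predinfo) < rank(inferred) else inferred

def next_inferred(best):
    # toy transition mirroring the report: with 0, the phi merges two
    # different incoming values and becomes a variable; with -1, the
    # add/icmp fold it back to the constant 0
    return "phi" if best == 0 else 0

seq, inferred = [], 0
for _ in range(4):
    best = choose(inferred, -1)  # PredicateInfo says -1 via the assume
    seq.append(best)
    inferred = next_inferred(best)
# the chosen value never stabilizes: 0, -1, 0, -1, ...
assert seq == [0, -1, 0, -1]
```

The oscillation comes from the transition function feeding the rank-based choice, which then undoes its own decision on the next round.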
In the testcase I provided, there are no assume instructions, but the issue
also arises when evaluating the ssa.copy intrinsic based on information from
PredicateInfo. On the first iteration, the last ssa.copy instruction has Value1
a phi and Value2 the sub instruction from outside the loop, due to
PredicateInfo (I'm not clear how this information is inferred inside
PredicateInfo itself). So the values are swapped and the second Value is used
to build the expression for the ssa.copy. On the next iteration, the symbolic
execution finds the lshr on both call paths, which has a lower rank, so the
first value is chosen. Third iteration gets the same behavior as the first,
hence the infinite loop.
In both cases, the problem appears to be an inconsistency between values from
symbolic execution and values from PredicateInfo.
The solution I have opted for is to consider the PredicateInfo the primary
source of truth if it was chosen once before. If the info from PredicateInfo
changes, so does the latest Value used for swapping.
I'm not confident however that there cannot be a case where the inferred value
(value1) is a better option than the PredicateInfo (value2), after value2 was
better than a previously inferred value1; this optimization would be missed
with this patch. The current unit tests are not affected by the change.
As I mentioned before, feedback welcome here.
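A hedged sketch of the idea described above (names like `chosen_predinfo` are hypothetical, not the actual patch): once the PredicateInfo value has been chosen for an instruction, keep preferring it on later iterations unless PredicateInfo itself changes, so the choice stays consistent.

```python
def rank(v):
    # toy rank as before: constants (ints) beat variables (strings)
    return 0 if isinstance(v, int) else 1

chosen_predinfo = {}  # per-instruction memo of the PredicateInfo value picked

def choose(inst, inferred, predinfo):
    if chosen_predinfo.get(inst) == predinfo:
        return predinfo  # stick with the earlier choice
    if rank(predinfo) < rank(inferred):
        chosen_predinfo[inst] = predinfo
        return predinfo
    return inferred

# replaying the oscillating inferences from the assume testcase:
seq = [choose("copy", inf, -1) for inf in [0, "phi", 0, "phi"]]
assert seq == [0, -1, -1, -1]  # stabilizes instead of ping-ponging
```

The memo is what breaks the cycle: the second time around, the earlier decision is honored regardless of the current rank comparison.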
The base level thing that should happen when there are two possible values for
something is that it should just mark it varying.
What's really happening here is that it knows there is only one possible value,
but it changes its mind as to what it is :-)
Most of the time, that is just a bug in the symbolic execution engine - at
least when I was working on it, the main cause was the non-local
assumptions/etc that got made in some of the helpers when trying to do
simplification/etc. So you'd get different answers for the same value
depending on some random context, and because of where it happened in the
algorithm, it didn't discover there are multiple values and give up.
That "give up" approach loses optimization, of course, in the case that there
really are two valid, different, constant values for something, and either is
*really* okay.
I have to imagine that's pretty rare.
But in that case, your approach of "choose one consistently" is correct.
But let me just emphasize what you already know - you have to be careful to
*always* choose consistently - i.e., guarantee it will always come to the same
result.
Otherwise, as you've seen, you risk non-termination.
This is hard overall - it often requires that we guarantee the things we query
discover "maximal" answers.
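The "mark it varying" behavior mentioned at the top of this comment can be sketched as a tiny value lattice (an illustration only, not NewGVN's actual lattice): values only move downward, from unknown to some constant to varying, so re-evaluation is monotone and cannot oscillate.

```python
UNKNOWN = object()  # not yet evaluated
VARYING = object()  # bottom: multiple possible values, give up

def meet(old, new):
    if old is VARYING:
        return VARYING          # once varying, always varying
    if old is UNKNOWN or old == new:
        return new              # first value seen, or agreement
    return VARYING              # two different values: mark it varying

state = UNKNOWN
state = meet(state, 0)    # first inference says constant 0
state = meet(state, -1)   # contradictory inference: drop to VARYING
state = meet(state, 0)    # stays VARYING; termination is guaranteed
assert state is VARYING
```

Because the state can only descend a finite lattice, the iteration must terminate, at the cost of losing optimization in the rare case where either constant would really have been okay.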