Open Quuxplusone opened 7 years ago
Bugzilla Link | PR34472 |
Status | NEW |
Importance | P enhancement |
Reported by | Qirun Zhang (helloqirun@gmail.com) |
Reported on | 2017-09-05 00:36:51 -0700 |
Last modified on | 2019-07-10 16:10:31 -0700 |
Version | trunk |
Hardware | PC All |
CC | dberlin@dberlin.org, ditaliano@apple.com, florian_hahn@apple.com, llvm-bugs@lists.llvm.org, nick@wasmer.io, su@cs.ucdavis.edu |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also | PR34935, PR37121, PR36500 |
Thanks. This doesn't have undef so it should just resolve.
I must have added a bug somewhere.
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-49ea412.bc"
target triple = "x86_64-apple-darwin16.7.0"
define void @hoge() {
bb:
br label %bb1
bb1: ; preds = %bb12, %bb
%tmp = phi i8 [ %tmp3, %bb12 ], [ 0, %bb ]
br i1 undef, label %bb2, label %bb13
bb2: ; preds = %bb4, %bb1
%tmp3 = phi i8 [ %tmp, %bb1 ], [ %tmp5, %bb4 ]
br i1 undef, label %bb12, label %bb4
bb4: ; preds = %bb11, %bb2
%tmp5 = phi i8 [ %tmp3, %bb2 ], [ %tmp7, %bb11 ]
br i1 false, label %bb6, label %bb2
bb6: ; preds = %bb9, %bb4
%tmp7 = phi i8 [ %tmp5, %bb4 ], [ %tmp10, %bb9 ]
br i1 undef, label %bb11, label %bb8
bb8: ; preds = %bb8, %bb6
br i1 true, label %bb8, label %bb9
bb9: ; preds = %bb15, %bb8
%tmp10 = phi i8 [ %tmp16, %bb15 ], [ %tmp7, %bb8 ]
br label %bb6
bb11: ; preds = %bb6
br label %bb4
bb12: ; preds = %bb2
br label %bb1
bb13: ; preds = %bb1
br i1 undef, label %bb14, label %bb15
bb14: ; preds = %bb13
br label %bb15
bb15: ; preds = %bb14, %bb13
%tmp16 = phi i8 [ %tmp, %bb13 ], [ 9, %bb14 ]
br label %bb9
}
It looks like the same issue, but it provides another reproducer.
$ clangpolly -v
clang version 6.0.0 (http://llvm.org/git/clang.git
5a2b037a53684751a712b11898a0cc69815f2ba7) (http://llvm.org/git/llvm.git
76db91a4f033defd405dfb4d14ba6c0c7e205ce0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/su/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/4.9
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/4.9.4
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/5.3.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.3.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.2.0
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
$
$ clangpolly -O3 -mllvm -disable-llvm-optzns -c -emit-llvm -o small.bc small.c
$ optpolly -gvn -newgvn -o small-opt.bc small.bc
optpolly:
/home/su/software/tmp/polly/llvm/lib/Transforms/Scalar/NewGVN.cpp:2853: void
{anonymous}::NewGVN::updateProcessedCount(const llvm::Value*): Assertion
`ProcessedCount[V] < 100 && "Seem to have processed the same Value a lot"'
failed.
#0 0x00000000038ad6f4 (optpolly+0x38ad6f4)
#1 0x00000000038ad785 (optpolly+0x38ad785)
#2 0x00000000038abb9a (optpolly+0x38abb9a)
#3 0x00000000038ad077 (optpolly+0x38ad077)
#4 0x00007f0245cb0330 __restore_rt (/lib/x86_64-linux-
gnu/libpthread.so.0+0x10330)
#5 0x00007f0244a98c37 gsignal /build/eglibc-SvCtMH/eglibc-
2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#6 0x00007f0244a9c028 abort /build/eglibc-SvCtMH/eglibc-2.19/stdlib/abort.c:91:0
#7 0x00007f0244a91bf6 __assert_fail_base /build/eglibc-SvCtMH/eglibc-
2.19/assert/assert.c:92:0
#8 0x00007f0244a91ca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#9 0x00000000036ee752 (optpolly+0x36ee752)
#10 0x00000000036f1227 (optpolly+0x36f1227)
#11 0x00000000036f1b75 (optpolly+0x36f1b75)
#12 0x00000000036f5310 (optpolly+0x36f5310)
#13 0x0000000003190d6a (optpolly+0x3190d6a)
#14 0x0000000003190efd (optpolly+0x3190efd)
#15 0x0000000003191298 (optpolly+0x3191298)
#16 0x00000000031919ef (optpolly+0x31919ef)
#17 0x0000000003191c2f (optpolly+0x3191c2f)
#18 0x00000000017fcbe8 (optpolly+0x17fcbe8)
#19 0x00007f0244a83f45 __libc_start_main /build/eglibc-SvCtMH/eglibc-
2.19/csu/libc-start.c:321:0
#20 0x00000000017d64f9 (optpolly+0x17d64f9)
Stack dump:
0. Program arguments: optpolly -gvn -newgvn -o small-opt.bc small.bc
1. Running pass 'Function Pass Manager' on module 'small.bc'.
2. Running pass 'Global Value Numbering' on function '@fn2'
Aborted (core dumped)
$
----------------------------------------
int b, c, d;
int fn1 () { return 0; }
int fn2 ()
{
int f;
L:
{
int h;
if (!b)
{
{
int i = 0;
for (; 1; )
return i;
}
while (c)
h = f;
while (d)
h = f;
}
else
goto L;
}
f = fn1 ();
goto L;
}
int main ()
{
fn2 ();
return 0;
}
Okay, so, trying to understand this to see what the end result really should be
:)
First, bb8->bb9 is unreachable, and bb4->bb6 is unreachable.
That gives tmp7 = tmp10 = tmp16
That immediately gives:
bb1: ; preds = %bb12, %bb
%tmp = phi i8 [ %tmp3, %bb12 ], [ 0, %bb ]
br i1 undef, label %bb2, label %bb13
bb2: ; preds = %bb4, %bb1
%tmp3 = phi i8 [ 0, %bb1 ], [ %tmp5, %bb4 ]
br i1 undef, label %bb12, label %bb4
bb4: ; preds = %bb11, %bb2
%tmp5 = phi i8 [ %tmp3, %bb2 ], [ %tmp16, %bb11 ]
br i1 false, label %bb6, label %bb2
bb6: ; preds = %bb9, %bb4
%tmp7 = phi i8 [ %tmp16, %bb9 ]
br i1 undef, label %bb11, label %bb8
bb8: ; preds = %bb8, %bb6
br i1 true, label %bb8, label %bb9
bb9: ; preds = %bb15, %bb8
%tmp10 = phi i8 [ %tmp16, %bb15 ]
br label %bb6
bb11: ; preds = %bb6
br label %bb4
bb12: ; preds = %bb2
br label %bb1
bb13: ; preds = %bb1
br i1 undef, label %bb14, label %bb15
bb14: ; preds = %bb13
br label %bb15
bb15: ; preds = %bb14, %bb13
%tmp16 = phi i8 [ %tmp, %bb13 ], [ 9, %bb14 ]
I'm thinking about what i believe the other values should be. They are either
0, 9, or 0|9
The RPO order is :
Block bb is 0 in RPO order.
Block bb1 is 1 in RPO order.
Block bb13 is 2 in RPO order.
Block bb14 is 3 in RPO order.
Block bb15 is 4 in RPO order.
Block bb2 is 5 in RPO order.
Block bb4 is 6 in RPO order.
Block bb6 is 7 in RPO order.
Block bb8 is 8 in RPO order.
Block bb9 is 9 in RPO order.
Block bb11 is 10 in RPO order.
Block bb12 is 11 in RPO order.
bb8->bb9 is unreachable, and bb4->bb6 is unreachable. Just removing the
unreachables, we have:
bb1: ; preds = %bb12, %bb
%tmp = phi i8 [ %tmp3, %bb12 ], [ 0, %bb ]
br i1 undef, label %bb2, label %bb13
bb2: ; preds = %bb4, %bb1
%tmp3 = phi i8 [ 0, %bb1 ], [ %tmp5, %bb4 ]
br i1 undef, label %bb12, label %bb4
bb4: ; preds = %bb11, %bb2
%tmp5 = phi i8 [ %tmp3, %bb2 ], [ %tmp7, %bb11 ]
br i1 false, label %bb6, label %bb2
bb6: ; preds = %bb9, %bb4
%tmp7 = phi i8 [ %tmp10, %bb9 ]
br i1 undef, label %bb11, label %bb8
bb8: ; preds = %bb8, %bb6
br i1 true, label %bb8, label %bb9
bb9: ; preds = %bb15, %bb8
%tmp10 = phi i8 [ %tmp16, %bb15 ]
br label %bb6
bb11: ; preds = %bb6
br label %bb4
bb12: ; preds = %bb2
br label %bb1
bb13: ; preds = %bb1
br i1 undef, label %bb14, label %bb15
bb14: ; preds = %bb13
br label %bb15
bb15: ; preds = %bb14, %bb13
%tmp16 = phi i8 [ %tmp, %bb13 ], [ 9, %bb14 ]
The RPO algorithm should go like this:
(there are no expression lookups that will ever find anything, so this should
be the same between the two value numbering algorithms)
Iteration 1
VN(tmp) = phi (VN(tmp3) == TOP, 0) == 0
VN(tmp16) = phi(VN(tmp) == 0, 9) == tmp16
VN(tmp3) = phi(0, VN(tmp5) == TOP) == 0
VN(tmp5) = phi(VN(tmp3) == 0, VN(tmp7) == TOP) == 0
VN(tmp7) = phi(VN(tmp10) = TOP = TOP
VN(tmp10) = phi(VN(tmp16) = tmp16 = tmp16
end of iteration 1 state
VN(tmp) = 0
VN(tmp16) = tmp16
VN(tmp3) = 0
VN(tmp5) = 0
VN(tmp7) = TOP
VN(tmp10 = tmp16
Iteration 2
VN(tmp) = phi(VN(tmp3) == 0, 0) = 0
VN(tmp16) = phi(VN(tmp) == 0), 9) = tmp16
VN(tmp3) = phi(0, VN(tmp5) == 0) = 0
VN(tmp5) = phi(VN(tmp3) == 0, VN(tmp7) == TOP) = 0
VN(tmp7) = phi(VN(tmp10) == tmp16) = tmp16
VN(tmp10) = phi(VN(tmp16) == tmp16) = tmp16
End of iteration 2 state
VN(tmp) = 0
VN(tmp16) = tmp16
VN(tmp3) = 0
VN(tmp5) = 0
VN(tmp7) = tmp16
VN(tmp10) = tmp16
Iteration 3:
VN(tmp) = phi(VN(tmp3) == 0, 0) = 0
VN(tmp16) = phi(VN(tmp) == 0), 9) = tmp16
VN(tmp3) = phi(0, VN(tmp5) == 0) = 0
VN(tmp5) = phi(VN(tmp3) == 0, VN(tmp7) == tmp16) = tmp5
VN(tmp7) = phi(VN(tmp10) == tmp16) = tmp16
VN(tmp10) = phi(VN(tmp16) == tmp16) = tmp16
End state of iteration 3:
VN(tmp) = 0
VN(tmp16) = tmp16
VN(tmp3) = 0
VN(tmp5) = tmp5
VN(tmp7) = tmp16
VN(tmp10) = tmp16
Iteration 4:
VN(tmp) = phi(VN(tmp3) == 0, 0) = 0
VN(tmp16) = phi(VN(tmp) == 0), 9) = tmp16
VN(tmp3) = phi(0, VN(tmp5) == tmp5) = tmp3
VN(tmp5) = phi(VN(tmp3) == tmp3, VN(tmp7) == tmp16) = tmp5
VN(tmp7) = phi(VN(tmp10) == tmp16) = tmp16
VN(tmp10) = phi(VN(tmp16) == tmp16) = tmp16
end state of iteration 4:
VN(tmp) = 0
VN(tmp16) = tmp16
VN(tmp3) = tmp3
VN(tmp5) = tmp5
VN(tmp7) = tmp16
VN(tmp10) = tmp16
Iteration 5:
VN(tmp) = phi(VN(tmp3) == tmp3, 0) = tmp
VN(tmp16) = phi(VN(tmp) == tmp), 9) = tmp16
VN(tmp3) = phi(0, VN(tmp5) == tmp5) = tmp3
VN(tmp5) = phi(VN(tmp3) == tmp3, VN(tmp7) == tmp16) = tmp5
VN(tmp7) = phi(VN(tmp10) == tmp16) = tmp16
VN(tmp10) = phi(VN(tmp16) == tmp16) = tmp16
end state of iteration 5:
VN(tmp) = tmp
VN(tmp16) = tmp16
VN(tmp3) = tmp3
VN(tmp5) = tmp5
VN(tmp7) = tmp16
VN(tmp10) = tmp16
Iteration 6:
VN(tmp) = phi(VN(tmp3) == tmp3, 0) = tmp
VN(tmp16) = phi(VN(tmp) == tmp), 9) = tmp16
VN(tmp3) = phi(0, VN(tmp5) == tmp5) = tmp3
VN(tmp5) = phi(VN(tmp3) == tmp3, VN(tmp7) == tmp16) = tmp5
VN(tmp7) = phi(VN(tmp10) == tmp16) = tmp16
VN(tmp10) = phi(VN(tmp16) == tmp16) = tmp16
end state of iteration 5:
VN(tmp) = tmp
VN(tmp16) = tmp16
VN(tmp3) = tmp3
VN(tmp5) = tmp5
VN(tmp7) = tmp16
VN(tmp10) = tmp16
Nothing has changed, this state should be the stable end state.
We should reach the same result.
I actually believe we should come to the same result in the same order, i will
see where we are going wrong.
Processing instruction %tmp7 = phi i8 [ %tmp5, %bb4 ], [ %tmp10, %bb9 ]
bb4 -> bb6 is unreachable
No arguments of PHI node %tmp7 = phi i8 [ %tmp5, %bb4 ], [ %tmp10, %bb9 ] are
live
This is wrong overall, it should end up as tmp16.
So, the bad news is i can prove that with one of the optimizations we do, the algorithm may not terminate.
I am going to disable that for now, and then i'll bring it back another way i can prove terminates, and then see how it could be integrated into the algorithm.
In particular, to write down some thoughts:
The RPO algorithm only has things fall down the lattice, essentially.
Partitions only split, things only really become less equivalent. Our
algorithm essentially enables partition re-merges. In the RPO world, it's easy
to guarantee termination by proving things only go in one direction on the
lattice. In our world, termination is much harder to guarantee.
This means the RPO algorithm will not always produce optimal answers,
particularly if you try to merge it with unreachable code elimination.
Example:
define void @hoge() {
bb:
br label %bb1
bb1: ; preds = %bb12, %bb
%tmp = phi i32 [ 0, %bb ], [ 1, %bb12 ]
br label %bb2
bb2: ; preds = %bb1
br label %bb3
bb3: ; preds = %bb8, %bb2
%tmp4 = phi i32 [ %tmp, %bb2 ], [ %tmp9, %bb8 ]
br i1 undef, label %bb12, label %bb5
bb5: ; preds = %bb10, %bb3
%tmp6 = phi i32 [ %tmp11, %bb10 ], [ %tmp4, %bb3 ]
br label %bb7
bb7: ; preds = %bb5
br i1 undef, label %bb10, label %bb8
bb8: ; preds = %bb7
%tmp9 = phi i32 [ %tmp6, %bb7 ]
br label %bb3
bb10: ; preds = %bb7
%tmp11 = phi i32 [ %tmp6, %bb7 ]
br label %bb5
bb12: ; preds = %bb3
br label %bb1
}
RPO order is file order (for our purposes)
RPO with all edges live:
Iteration 1:
VN(tmp) = tmp
VN(tmp4) = VN(tmp)=tmp, VN(tmp9)=TOP == tmp
VN(tmp6) = VN(tmp11)=TOP, VN(tmp4)=tmp == tmp
VN(tmp9) = VN(tmp6) == tmp
VN(tmp11) = VN(tmp6) = tmp
End state:
VN(everything) = tmp :)
This will remain the stable state next iteration.
If we ignore currently unreachable edges (bb12->bb1) to start, we get this:
Iteration 1:
VN(tmp) = 0
VN(tmp4) = VN(0)=0, VN(tmp9)=TOP == 0
VN(tmp6) = VN(tmp11)=TOP, VN(tmp4)=0 == 0
VN(tmp9) = VN(tmp6) == 0
VN(tmp11) = VN(tmp6) = 0
SO far so good
Now we've proved bb12->bb1 is reachable.
Iteration 2:
VN(tmp) = VN(0, 1) == tmp
VN(tmp4) = VN(tmp)=tmp, VN(tmp9)=0 == tmp4
VN(tmp6) = VN(tmp11)=0, VN(tmp4)=tmp4 == tmp6
VN(tmp9) = VN(tmp6) == tmp6
VN(tmp11) = VN(tmp6) = tmp6
Iteration 3:
VN(tmp) = VN(0, 1) == tmp
VN(tmp4) = VN(tmp)=tmp, VN(tmp9)=tmp6 == tmp4
VN(tmp6) = VN(tmp11)=tmp6, VN(tmp4)=tmp4 == tmp6
VN(tmp9) = VN(tmp6) == tmp6
VN(tmp11) = VN(tmp6) = tmp6
This is the stable end state.
Whoops!
Currently, we are performing an optimization that enables us to eliminate these
cycles and get the same answer as the non-unreachable code version.
Unfortunately, i can prove it is not terminating.
This leaves us with a few options:
1. Get the same "bad" answer as the RPO algorithm does after you start out by
ignoring some edges.
2. Build a separate phi optimization algorithm we run on each iteration to
destroy cycles.
3. Try some other approaches.
For example, we could reset all phi nodes in the same scc to TOP and keep going.
Because our edges only get *more* reachable, this should suffice to terminate.
I am going to experiment, but likely to start with option #1 if it starts
taking any real time to experiment.
probably the same (or similar)
https://bugs.llvm.org/show_bug.cgi?id=34935
FWIW: I have a half-done patch to do the proper scc finding and invalidation to
keep this optimization if anyone wants it.
Otherwise, i'd suggest just disabling the self-lookup optimization for now. It
can be proven not to converge without more.
Replace return lookupOperandLeader(P.first) != I with return true to make this
happen
Thanks Daniel!
FWIW: I have a half-done patch to do the proper scc finding and invalidation to keep this optimization if anyone wants it.
I would be happy to take a look! Otherwise I'll prepare a patch to disable the optimization for now!