Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[SimplifyCFG] Sinking and BB folding fight each other and the pass doesn't reach a fixpoint. #34219

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR35246
Status NEW
Importance P enhancement
Reported by Jonas Paulsson (paulsson@linux.vnet.ibm.com)
Reported on 2017-11-08 06:01:07 -0800
Last modified on 2017-11-08 16:48:18 -0800
Version trunk
Hardware PC Linux
CC dberlin@dberlin.org, ditaliano@apple.com, efriedma@quicinc.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, paulsson@linux.vnet.ibm.com, spatel+llvm@rotateright.com, uweigand@de.ibm.com
Fixed by commit(s)
Attachments tc_hang.ll (3700 bytes, text/plain)
crash1.bc (412120 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 19385
reduced testcase

Csmith discovered a program where clang just hangs at -O3, but runs at -O1.

The problem seems to be in "Simplify the CFG".

bin/clang -cc1 -triple s390x-ibm-linux -S -target-cpu z13 -v -O3 -w -emit-llvm -
x ir tc_hang.ll
Quuxplusone commented 6 years ago

Attached tc_hang.ll (3700 bytes, text/plain): reduced testcase

Quuxplusone commented 6 years ago

Attached crash1.bc (412120 bytes, application/octet-stream): csmith original .bc input

Quuxplusone commented 6 years ago
@global = external local_unnamed_addr global [2 x i32], align 4
@global.1 = external local_unnamed_addr global i32*, align 8

define void @patatino(i32 %tinky) {
bb:
  %tmp3 = icmp ne i32 %tinky, 0
  %tmp6 = zext i1 %tmp3 to i32
  %tmp7 = load i32, i32* getelementptr inbounds ([2 x i32], [2 x i32]* @global, i64 0, i64 1), align 4
  %tmp8 = icmp sgt i32 %tmp7, %tmp6
  br i1 %tmp8, label %bb9, label %bb11

bb9:
  %tmp = load i32*, i32** @global.1, align 8
  %tmp10 = load i32, i32* %tmp, align 4
  br label %bb11

bb11:
  %tmp12 = phi i32 [ %tmp10, %bb9 ], [ %tinky, %bb ]
  %tmp13 = phi i32 [ -1, %bb9 ], [ %tmp7, %bb ]
  br label %bb14

bb14:
  %tmp15 = phi i32 [ 0, %bb14 ], [ %tmp12, %bb11 ]
  %tmp16 = phi i32 [ %tmp17, %bb14 ], [ %tmp13, %bb11 ]
  %tmp17 = load i32, i32* getelementptr inbounds ([2 x i32], [2 x i32]* @global, i64 0, i64 1), align 4
  br label %bb14
}
Quuxplusone commented 6 years ago
$ ./opt red2.ll -simplifycfg -keep-loops=false -o /dev/null
^C
$
Quuxplusone commented 6 years ago
The debug shows this loops indefinitely in this cycle:

bb14.sink.split:                                  ; preds = %bb14, %bb9
  %tmp15.ph = phi i32 [ %tmp10, %bb9 ], [ 0, %bb14 ]
  %tmp16.ph = phi i32 [ -1, %bb9 ], [ %tmp17, %bb14 ]
  br label %bb14
SINK: instruction can be sunk:   %tmp10 = load i32, i32* %tmp, align 4
SINK: #phid values: 2
SINK: Splitting edge
SINK: Sink:   %tmp10 = load i32, i32* %tmp, align 4
SINK: #phid values: 2
Looking to fold bb14.sink.split into bb14
Killing Trivial BB:
Quuxplusone commented 6 years ago
This is kind of a tragedy of commons.
The sinking logic in SimplifyCFG is sinking the load %tmp10 (and splits one
edge), and TryToSimplifyUncondBranchFromEmptyBlock() decides to do the exact
opposite, so you end up with the original IR, and sinking kicks in -> infinite
loop.

I'll think about how to fix this without losing computational power.
(Also cc:ing some people who reviewed/touched SimplifyCFG recently as they
might have opinions).
Quuxplusone commented 6 years ago
(In reply to Davide Italiano from comment #5)
> This is kind of a tragedy of commons.
> The sinking logic in SimplifyCFG is sinking the load %tmp10 (and splits one
> edge), and TryToSimplifyUncondBranchFromEmptyBlock() decides to do the exact
> opposite, so you end up with the original IR, and sinking kicks in ->
> infinite loop.
>
> I'll think about how to fix this without losing computational power.
> (Also cc:ing some people who reviewed/touched SimplifyCFG recently as they
> might have opinions).

s/computational/optimization/ power