Quuxplusone / LLVMBugzillaTest

wrong code after "opt -mem2reg -jump-threading -loop-unswitch -ipsccp -loop-extract-single -functionattrs" #40481

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link: PR41511
Status: NEW
Importance: P enhancement
Reported by: Zhide Zhou (cszide@163.com)
Reported on: 2019-04-15 23:21:07 -0700
Last modified on: 2020-04-26 12:05:25 -0700
Version: trunk
Hardware: PC Linux
CC: ehudkatz@gmail.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, mikael.holmen@ericsson.com
Fixed by commit(s):
Attachments: small.bc (3200 bytes, application/octet-stream)
             small-opt1.bc (3872 bytes, application/octet-stream)
Blocks:
Blocked by:
See also:
Created attachment 21784
.bc file of the source code

$clang -v
clang version 9.0.0 (trunk 355281)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jack-zhou/Documents/llvm/llvm_truck/llvm/build4/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.3.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.3.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

For the following program, small.c, the bitcode produced by
"opt -mem2reg -jump-threading -loop-unswitch -ipsccp -loop-extract-single -functionattrs"
behaves differently when run with lli than when compiled with clang.
-----------------------------------------------------------
#include <stdio.h>
int a, b;
int main() {

c:
  for (; a <= 1; a++) {
    int d;
    if (d)
      goto c;
  }
  printf("%d\n",a);
}
-----------------------------------------------------------
$clang small.c -o small1.out && ./small1.out
2
$gcc small.c -o small2.out && ./small2.out
2

$clang -O3 -c -emit-llvm  -mllvm -disable-llvm-optzns small.c -o small.bc

$opt -mem2reg -jump-threading -loop-unswitch -ipsccp -loop-extract-single -functionattrs small.bc -o small-opt1.bc

When run with lli, the program never terminates.
$timeout -s 9 120 lli ./small-opt1.bc
Killed

However, when clang is used to build an executable from the same bitcode, the program prints nothing.
$clang small-opt1.bc -o small3.out && timeout -s 9 120 ./small3.out
Nothing here.

IR before optimization
--------------------------------------------------------
@a = common dso_local global i32 0, align 4
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
@b = common dso_local global i32 0, align 4

; Function Attrs: nounwind uwtable
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  br label %4

; <label>:4:                                      ; preds = %14, %0
  br label %5

; <label>:5:                                      ; preds = %18, %4
  %6 = load i32, i32* @a, align 4, !tbaa !2
  %7 = icmp sle i32 %6, 1
  br i1 %7, label %8, label %21

; <label>:8:                                      ; preds = %5
  %9 = bitcast i32* %2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %9) #3
  %10 = load i32, i32* %2, align 4, !tbaa !2
  %11 = icmp ne i32 %10, 0
  br i1 %11, label %12, label %13

; <label>:12:                                     ; preds = %8
  store i32 2, i32* %3, align 4
  br label %14

; <label>:13:                                     ; preds = %8
  store i32 0, i32* %3, align 4
  br label %14

; <label>:14:                                     ; preds = %13, %12
  %15 = bitcast i32* %2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %15) #3
  %16 = load i32, i32* %3, align 4
  switch i32 %16, label %25 [
    i32 0, label %17
    i32 2, label %4
  ]

; <label>:17:                                     ; preds = %14
  br label %18

; <label>:18:                                     ; preds = %17
  %19 = load i32, i32* @a, align 4, !tbaa !2
  %20 = add nsw i32 %19, 1
  store i32 %20, i32* @a, align 4, !tbaa !2
  br label %5

; <label>:21:                                     ; preds = %5
  %22 = load i32, i32* @a, align 4, !tbaa !2
  %23 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %22)
  %24 = load i32, i32* %1, align 4
  ret i32 %24

; <label>:25:                                     ; preds = %14
  unreachable
}
---------------------------------------------------------------

IR after optimization
---------------------------------------------------------------
@a = common dso_local global i32 0, align 4
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
@b = common dso_local global i32 0, align 4

; Function Attrs: nounwind uwtable
define dso_local i32 @main() #0 {
  %.pr.pr = load i32, i32* @a, align 4, !tbaa !2
  %1 = icmp sle i32 %.pr.pr, 1
  br i1 %1, label %..split_crit_edge, label %..split1_crit_edge

..split1_crit_edge:                               ; preds = %0
  br label %.split1

..split_crit_edge:                                ; preds = %0
  br label %.split

.split:                                           ; preds = %..split_crit_edge
  br label %codeRepl

codeRepl:                                         ; preds = %.split
  call void @main.extracted()
  ret i32 0

.split1:                                          ; preds = %..split1_crit_edge
  %2 = load i32, i32* @a, align 4, !tbaa !2
  %3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %2)
  ret i32 0
}

; Function Attrs: norecurse noreturn nounwind readnone uwtable
define internal void @main.extracted() #3 {
newFuncRoot:
  br label %0

; <label>:0:                                      ; preds = %newFuncRoot, %1
  br label %1

; <label>:1:                                      ; preds = %0
  br label %0
}

Clearly, there is an infinite loop after optimization. lli takes the
execution path that contains this infinite loop, but the executable built by
clang does not.
Quuxplusone commented 5 years ago

Attached small.bc (3200 bytes, application/octet-stream): .bc file of the source code

Quuxplusone commented 5 years ago

Attached small-opt1.bc (3872 bytes, application/octet-stream): small-opt1.bc

Quuxplusone commented 5 years ago
int d;
    if (d)
      goto c;

Isn't this UB due to reading uninitialized local variable? I.e., the testcase
is bad?
Quuxplusone commented 5 years ago
(In reply to Mikael Holmén from comment #2)
>     int d;
>     if (d)
>       goto c;
>
> Isn't this UB due to reading uninitialized local variable? I.e., the
> testcase is bad?

Thank you for pointing this out! I tested it again with "int d=0", and the
problem is gone. I had not realized that this C program has undefined
behavior. Thank you!
Quuxplusone commented 5 years ago
(In reply to Mikael Holmén from comment #2)
>     int d;
>     if (d)
>       goto c;
>
> Isn't this UB due to reading uninitialized local variable? I.e., the
> testcase is bad?

Strangely, this bug still reproduces with "int d=1".
In addition, with "clang -O3" this program also outputs the right result.
So, if this bug were caused by UB, it should be fixed for both "int d=0" and
"int d=1".
Quuxplusone commented 4 years ago

In case d=0, a is incremented until a<=1 is false, which first happens at a=2, so 2 is printed.

In case d=1, the for loop never reaches the evaluation of a++, which would only come after the goto. So a never changes, and we have an infinite loop.

In case d is not initialized, it may be treated as zero, for which we get the first case, and 2 is printed. It may also have a value other than zero, for which we get the second case: an infinite loop.

This test-case seems to work correctly according to spec.

I think this bug can be closed as a non-issue.
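
The two defined cases described above (d=0 and d=1) can be checked with a small sketch. This is not part of the original report: `replay_small_c`, `struct loop_result`, and the `max_steps` budget are hypothetical names introduced here for illustration. The function replays the control flow of small.c with an explicitly initialized d and gives up after a fixed number of loop entries, so that the non-terminating case can be observed without hanging.

```c
#include <stdbool.h>

/* Hypothetical helper, not from the report: replays the loop of small.c
   with an initialized d and a step budget. */
struct loop_result {
    bool terminated; /* true if the condition a <= 1 became false */
    int a;           /* value of a when the replay stopped */
};

struct loop_result replay_small_c(int d, int max_steps)
{
    struct loop_result r = { false, 0 };
    int a = 0;      /* the global a in small.c starts at 0 */
    int steps = 0;  /* number of times the loop body has been entered */

restart:            /* corresponds to the label c: in small.c */
    for (; a <= 1; a++) {
        if (steps++ >= max_steps) {
            r.a = a;
            return r;      /* budget exhausted: report as not terminated */
        }
        if (d)
            goto restart;  /* leaves the body before a++ is evaluated */
    }

    r.terminated = true;
    r.a = a;
    return r;
}
```

Under these assumptions, `replay_small_c(0, 100)` finishes with a equal to 2, while `replay_small_c(1, 100)` uses up its budget with a still 0. With d left uninitialized, as in the original small.c, neither outcome is guaranteed, which is why the differing results from lli and clang do not indicate a miscompile.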