Wrong debug info generated at -O3 (-O0 is correct)

Quuxplusone commented 4 years ago


Bugzilla Link	PR45153
Status	CONFIRMED
Importance	P enhancement
Reported by	Shuo Ding (shuo.d@outlook.com)
Reported on	2020-03-09 08:36:32 -0700
Last modified on	2020-03-17 18:49:22 -0700
Version	trunk
Hardware	PC Linux
CC	aprantl@apple.com, dblaikie@gmail.com, ditaliano@apple.com, jdevlieghere@apple.com, jeremy.morse.llvm@gmail.com, josh@joshmatthews.net, keith.walker@arm.com, llvm-bugs@lists.llvm.org, paul_robinson@playstation.sony.com, rnk@google.com
Fixed by commit(s)
Attachments
Blocks	PR38768
Blocked by
See also

The expected output from lldb should be 20, but lldb outputs 0 when the program
is compiled with "-O3". It behaves correctly when the program is compiled with
"-O0".

In both cases, the debugger stops at line 7 and the tiny underscore is between
c and 0. I think this indicates that the debugger stops at the same breakpoint,
and in this case it should at least prints the same value. My understanding is
that the assignment operation of l_16 hasn't been executed, thus the correct
output should be 20.

$ cat abc.c
int a, b;
unsigned c;
int main() {
  int l_16 = 0;
  for (; l_16 < 20; l_16 = l_16 + 5)
    ;
  l_16 = c = 0 // optimize_me_not0
             ^ a;
  b = b & 1048575 ^ a;
}

$ cat cmds
b 7
r
p l_16
kill
q

$ clang-trunk -v
clang version 11.0.0 (https://github.com/llvm/llvm-project.git
bdcbdb484829c518511eece0809cc8ce1baa73c8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

$ lldb-trunk -v
lldb version 11.0.0
  clang revision bdcbdb484829c518511eece0809cc8ce1baa73c8
  llvm revision bdcbdb484829c518511eece0809cc8ce1baa73c8

$ clang-trunk -g abc.c -O3
$ lldb-trunk -s cmds -b a.out
(lldb) target create "a.out"
Current executable set to '/home/sding/LLDB-testing/reduce/24200/report/a.out'
(x86_64).
(lldb) command source -s 0 'cmds'
Executing commands in '/home/sding/LLDB-testing/reduce/24200/report/cmds'.
(lldb) b 7
Breakpoint 1: where = a.out`main + 17 at abc.c:7:12, address =
0x0000000000400491
(lldb) r
Process 5369 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400491 a.out`main at abc.c:7:12
   4      int l_16 = 0;
   5      for (; l_16 < 20; l_16 = l_16 + 5)
   6        ;
-> 7      l_16 = c = 0 // optimize_me_not0
   8                 ^ a;
   9      b = b & 1048575 ^ a;
   10   }

Process 5369 launched: '/home/sding/LLDB-testing/reduce/24200/report/a.out'
(x86_64)
(lldb) p l_16
(int) $0 = 0
(lldb) kill
Process 5369 exited with status = 9 (0x00000009)
(lldb) q

$ clang-trunk -g abc.c -O0
$ lldb-trunk -s cmds -b a.out
(lldb) target create "a.out"
Current executable set to '/home/sding/LLDB-testing/reduce/24200/report/a.out'
(x86_64).
(lldb) command source -s 0 'cmds'
Executing commands in '/home/sding/LLDB-testing/reduce/24200/report/cmds'.
(lldb) b 7
Breakpoint 1: where = a.out`main + 57 at abc.c:7:12, address =
0x00000000004004b9
(lldb) r
Process 6210 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x00000000004004b9 a.out`main at abc.c:7:12
   4      int l_16 = 0;
   5      for (; l_16 < 20; l_16 = l_16 + 5)
   6        ;
-> 7      l_16 = c = 0 // optimize_me_not0
   8                 ^ a;
   9      b = b & 1048575 ^ a;
   10   }

Process 6210 launched: '/home/sding/LLDB-testing/reduce/24200/report/a.out'
(x86_64)
(lldb) p l_16
(int) $0 = 20
(lldb) kill
Process 6210 exited with status = 9 (0x00000009)
(lldb) q

Quuxplusone commented 4 years ago

At -O3 the do-nothing for loop has been eliminated; you will be able to determine this because in the optimized program, lldb will be unable to stop on line 5. Because the loop is gone, the value of l_16 is simply 0 throughout the function.

LLDB's behavior is correct here. Resolving as invalid.

Quuxplusone commented 4 years ago

We can have a philosophical discussion on whether we should print 0 or the incremented value, but the reality is that showing a value at all here is confusing. I'd rather have this marked as optimized out. Jeremy or Adrian might have ideas/opinion.

Quuxplusone commented 4 years ago

(In reply to Davide Italiano from comment #2)

We can have a philosophical discussion on whether we should print 0 or the incremented value, but the reality is that showing a value at all here is confusing. I'd rather have this marked as optimized out. Jeremy or Adrian might have ideas/opinion.

Yeah - seems like an LLVM debug info quality bug.

One could imagine similar code:

x = 5; x = f1(); // pure call that can be discarded because the result is unused x = 10;

If you break on line 3 before the assignment/store is executed, it'd be especially misleading to report x as having the value 5 (might lead one to make all sorts of conclusions about what f1 did, if it returned 5). It should be reported as unknown. That would mean probably never observing the value 5 (because as soon as that store is executed it's "out of range" as we have notionally stepped past the f1() call immediately), which I think is OK/better than the alternative.

Quuxplusone commented 4 years ago

Hey, what'd you know, we get my example wrong too - in a maybe slightly
different way:

__attribute__((optnone)) __attribute__((pure)) int f1() {
  return 3;
}
__attribute__((optnone)) void f2(int* x) {
}
int main() {
  int i = 3;
  i = f1();
  i = 7;
  f2(&i);
}

Ltmp3:
        #DEBUG_VALUE: main:i <- 7
        .loc    1 8 5 prologue_end      # loc.c:8:5
        movl    $7, 4(%rsp)
.Ltmp4:
        #DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp
        .loc    1 0 5 is_stmt 0         # loc.c:0:5
        leaq    4(%rsp), %rdi
        .loc    1 9 3 is_stmt 1         # loc.c:9:3
        callq   .Lf2$local

For some reason we emit a constant 7 /before/ the store, which is
unfortunate/incorrect. (though maybe useful in other cases where the store is
sunk down to closer to its use? I guess that's what it's for & it's degenerate
in this case where there are no instructions to sink it past)

Quuxplusone commented 4 years ago

(In reply to David Blaikie from comment #4)
> Hey, what'd you know, we get my example wrong too - in a maybe slightly
> different way:
>
> __attribute__((optnone)) __attribute__((pure)) int f1() {
>   return 3;
> }
> __attribute__((optnone)) void f2(int* x) {
> }
> int main() {
>   int i = 3;
>   i = f1();
>   i = 7;
>   f2(&i);
> }
>
> Ltmp3:
>         #DEBUG_VALUE: main:i <- 7
>         .loc    1 8 5 prologue_end      # loc.c:8:5
>         movl    $7, 4(%rsp)
> .Ltmp4:
>         #DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp
>         .loc    1 0 5 is_stmt 0         # loc.c:0:5
>         leaq    4(%rsp), %rdi
>         .loc    1 9 3 is_stmt 1         # loc.c:9:3
>         callq   .Lf2$local
>
> For some reason we emit a constant 7 /before/ the store, which is
> unfortunate/incorrect. (though maybe useful in other cases where the store
> is sunk down to closer to its use? I guess that's what it's for & it's
> degenerate in this case where there are no instructions to sink it past)

Sort of correct (apologies if I'm derailing this bug - let me know, can write
this stuff somewhere else):

Modify the code to:

i = 7;
f2(0);
f2(&i);

And it turns out the store is not sunk past the first call to f2, but the
constant location is used right up until the setup for the second call to f2:

.Ltmp3:
        #DEBUG_VALUE: main:i <- 7
        .loc    1 8 5 prologue_end      # loc.c:8:5
        movl    $7, 4(%rsp)
        .loc    1 9 3                   # loc.c:9:3
        xorl    %edi, %edi
        callq   .Lf2$local
.Ltmp4:
        #DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp
        .loc    1 0 3 is_stmt 0         # loc.c:0:3
        leaq    4(%rsp), %rdi
        .loc    1 10 3 is_stmt 1        # loc.c:10:3
        callq   .Lf2$local

So, yeah, not sure why it's prematurely using a constant location, instead of
actually using the register location for the whole duration there.

Quuxplusone commented 4 years ago

There are a few awkward behaviours with this test -- firstly, my understanding
of C is that for the statement "l_16 = c = 0 ^ a;" there's no principle or rule
about which assignment happens first, only that they both complete before the
sequence point ';'. Ideally, LLVM would connect the assignments with the line
numbers so that we could make an effort to present both assignments as
completing in one statement; however there just isn't the infrastructure for
that today. See also bug 43970, where these two things don't connect.

On the topic of what range l_16 should have, I agree with Davide / David that
it's better to mark it optimised out. IMHO, we should try and avoid presenting
states of the program that are not present at -O0, and mark things as optimised
out if we would end up presenting a stale value.

Excitingly, that _is_ what's happening with variable locations, l_16 has no
location for the first few instructions. However, machine-scheduler then moves
an instruction from before the assignment to after it, introducing a backwards
step into the program. I get the following assembly:

    <+0>:     mov    0x200bae(%rip),%eax        # 0x601034 <a>   L3
    <+6>:     mov    $0xfffff,%ecx                               L8
    <+11>:    and    0x200b9b(%rip),%ecx        # 0x60102c <b>   L9
    <+17>:    mov    %eax,0x200b99(%rip)        # 0x601030 <c>   L7
    <+23>:    xor    %eax,%ecx                                   L9
    <+25>:    mov    %ecx,0x200b8d(%rip)        # 0x60102c <b>   L9
    <+31>:    xor    %eax,%eax                                   L10
    <+33>:    retq                                               L10

Where I've put 'L\d+' indicating the line number on the end. The store to 'c'
gets moved below the 'and' instruction by machine-scheduler, placing a pre-
assignment instruction below the position that the variable location
information thinks the assignment occurs. As I pointed out in the recent llvm-
dev RFC, there's no real attempt in the instruction scheduling passes to
prevent this kind of thing happening, and it's really difficult to get right
currently. bug 43955 and bug 43949 present similar situations, although they
move DBG_VALUEs around, instead of line numbers.

Exactly what we _could_ do in this scenario is arguable: we could extend the
optimised-out-ness of l_16 even further; we could also delete the line numbers
of the 'and' instruction and mov-to-ecx instruction, making line 7 appear early
in the function and not have completed its assignment yet. I suspect there's no
always-right way of doing this.

Quuxplusone commented 4 years ago

For your last comment about deleting lines entries; I would have thought the better thing to do would be to use the DWARF is_stmt flag on the line table to indicate where (and where not) to place source level breakpoints, so that the instructions remain associated with the appropriate source lines, but source level stepping only stops at the places where you want.

Quuxplusone commented 4 years ago

David wrote:
> Hey, what'd you know, we get my example wrong too - in a maybe slightly
> different way:

I think this example suffers from bug 34136, I get the following immediately
after SROA and before early-cse:

define dso_local i32 @main() #3 !dbg !24 {
entry:
  %i = alloca i32, align 4
  %0 = bitcast i32* %i to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #5, !dbg !27
  call void @llvm.dbg.declare(metadata i32* %i, metadata !26,
                              metadata !DIExpression()), !dbg !28
  store i32 3, i32* %i, align 4, !dbg !28, !tbaa !29
  %call = call i32 @f1() #6, !dbg !31
  store i32 %call, i32* %i, align 4, !dbg !32, !tbaa !29
  store i32 7, i32* %i, align 4, !dbg !33, !tbaa !29
  call void @f2(i32* %i), !dbg !34
  %1 = bitcast i32* %i to i8*, !dbg !35
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1) #5, !dbg !35
  ret i32 0, !dbg !35
}

The store of %call to %i gets eliminated as a dead store; and as described in
bug 34136, there's no way of describing an assignment that no longer exists in
our current model, at this stage of compilation. This causes the earlier
variable locations to "leak" further down.

> For some reason we emit a constant 7 /before/ the store, which is
> unfortunate/incorrect. (though maybe useful in other cases where the store
> is sunk down to closer to its use? I guess that's what it's for & it's
> degenerate in this case where there are no instructions to sink it past)

For the record, that variable location is above the store from the moment it's
created. Immediately after instcombine converts the dbg.declare above into
dbg.values, we get:

define dso_local i32 @main() local_unnamed_addr #3 !dbg !24 {
entry:
  %i = alloca i32, align 4
  %0 = bitcast i32* %i to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #5, !dbg !27
  call void @llvm.dbg.value(metadata i32 3, metadata !26, metadata !DIExpression()), !dbg !28
  call void @llvm.dbg.value(metadata i32 7, metadata !26, metadata !DIExpression()), !dbg !28
  store i32 7, i32* %i, align 4, !dbg !29, !tbaa !30
  call void @llvm.dbg.value(metadata i32* %i, metadata !26, metadata !DIExpression(DW_OP_deref)), !dbg !28
  call void @f2(i32* nonnull %i), !dbg !32
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5, !dbg !33
  ret i32 0, !dbg !33
}

Perhaps this is an edge case where instcombine expects a load / store to later
be eliminated? It appears to be a deliberate choice of
ConvertDebugDeclareToDebugValue for stores, to place the dbg.value ahead of the
store. I imagine this does have the potential to mislead, if someone steps onto
the store and expects the variable to contain the value from before the store.

> So, yeah, not sure why it's prematurely using a constant location,
> instead of actually using the register location for the whole duration
> there.

I think the way LowerDbgDeclare works is to describe the constant/register
locations as much as possible, on the assumption that stores are likely to be
eliminated. It deliberately describes the stack location ahead of the function
call because the call is likely to write to the stack location. I don't think
this is guaranteed to always be 100% correct, probably due to bug 34136.

Quuxplusone commented 4 years ago

Keith wrote:
> For your last comment about deleting lines entries;   I would have thought
> the better thing to do would be to use the DWARF is_stmt flag on the line
> table to indicate where (and where not) to place source level breakpoints,
> so that the instructions remain associated with the appropriate source
> lines, but source level stepping only stops at the places where you want.

True, that'd be better -- as far as I'm aware, in LLVM is_stmt isn't used to
communicate anything interesting to the user, which we can improve. Reid
expressed an interest in fixing this at the conference.

This reproducer would be a great test case for is_stmt, because the source
level statement becomes two instructions that then get rescheduled, which I
imagine is challenging to deal with. Right now the step onto line 7 is marked
"is_stmt" already, I guess a better implementation would be able to determine
that it _shouldn't_ be an is_stmt line?

Quuxplusone commented 4 years ago

> True, that'd be better -- as far as I'm aware, in LLVM is_stmt isn't used to
> communicate anything interesting to the user, which we can improve.

LLVM doesn't think about is_stmt until it's time to emit .loc directives to
the assembler.  The heuristic was designed for -O0 and doesn't really work
well in the presence of optimizations.  Some sort of IR-level tracking would
be distinctly better, but it would also be a bit disruptive.  Maybe keeping
an is_stmt flag in DebugLoc would be the least so, but I haven't thought
about it a whole lot.

Quuxplusone / LLVMBugzillaTest

Wrong debug info generated at -O3 (-O0 is correct) #44123