Open Quuxplusone opened 4 years ago
Bugzilla Link | PR45153 |
Status | CONFIRMED |
Importance | P enhancement |
Reported by | Shuo Ding (shuo.d@outlook.com) |
Reported on | 2020-03-09 08:36:32 -0700 |
Last modified on | 2020-03-17 18:49:22 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | aprantl@apple.com, dblaikie@gmail.com, ditaliano@apple.com, jdevlieghere@apple.com, jeremy.morse.llvm@gmail.com, josh@joshmatthews.net, keith.walker@arm.com, llvm-bugs@lists.llvm.org, paul_robinson@playstation.sony.com, rnk@google.com |
Fixed by commit(s) | |
Attachments | |
Blocks | PR38768 |
Blocked by | |
See also |
At -O3 the do-nothing for loop has been eliminated; you will be able to
determine this because in the optimized program, lldb will be unable to stop
on line 5. Because the loop is gone, the value of l_16 is simply 0
throughout the function.
LLDB's behavior is correct here. Resolving as invalid.
We can have a philosophical discussion on whether we should print 0 or the incremented value, but the reality is that showing a value at all here is confusing. I'd rather have this marked as optimized out. Jeremy or Adrian might have ideas/opinion.
(In reply to Davide Italiano from comment #2)
> We can have a philosophical discussion on whether we should print 0 or the
> incremented value, but the reality is that showing a value at all here is
> confusing. I'd rather have this marked as optimized out. Jeremy or Adrian
> might have ideas/opinion.
Yeah - seems like an LLVM debug info quality bug.
One could imagine similar code:
x = 5;
x = f1(); // pure call that can be discarded because the result is unused
x = 10;
If you break on line 3 before the assignment/store is executed, it'd be especially misleading to report x as having the value 5 (might lead one to make all sorts of conclusions about what f1 did, if it returned 5). It should be reported as unknown. That would mean probably never observing the value 5 (because as soon as that store is executed it's "out of range" as we have notionally stepped past the f1() call immediately), which I think is OK/better than the alternative.
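One way to picture "report it as unknown" is as a list of address ranges, each carrying a known value, where any address outside every range reads as optimized out. The C sketch below is only a model of that idea: the `loc_range` type, the `lookup` function and all of the addresses are made up for illustration, and it is not how LLDB or DWARF location lists are actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one location-list entry: the variable is known to
 * hold `value` while the program counter is in the half-open range [lo, hi). */
struct loc_range {
    unsigned lo;
    unsigned hi;
    int value;
};

/* Returns 1 and stores the value in *out if some entry covers pc.
 * Returns 0 if no entry covers pc, which a debugger would typically present
 * as "optimized out" or "unavailable". */
int lookup(const struct loc_range *list, size_t n, unsigned pc, int *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (pc >= list[i].lo && pc < list[i].hi) {
            *out = list[i].value;
            return 1;
        }
    }
    return 0;
}

/* Made-up addresses: the store for `x = 5` is at 0, the call to f1() has been
 * discarded, and the store for `x = 10` (line 3) occupies [4, 8). */

/* Misleading: 5 stays visible while stopped on line 3, before the store of 10. */
const struct loc_range misleading[] = { {4, 8, 5}, {8, 12, 10} };

/* Preferred: nothing describes x at pc 4, so it reads as unknown there. */
const struct loc_range preferred[] = { {8, 12, 10} };
```

With these tables, `lookup(misleading, 2, 4, &v)` yields 5, while `lookup(preferred, 1, 4, &v)` reports no value. The value 5 is then never observable, which is the trade-off described above.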
Hey, what'd you know, we get my example wrong too - in a maybe slightly
different way:
__attribute__((optnone)) __attribute__((pure)) int f1() {
  return 3;
}
__attribute__((optnone)) void f2(int* x) {
}
int main() {
  int i = 3;
  i = f1();
  i = 7;
  f2(&i);
}

.Ltmp3:
        #DEBUG_VALUE: main:i <- 7
        .loc 1 8 5 prologue_end # loc.c:8:5
        movl $7, 4(%rsp)
.Ltmp4:
        #DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp
        .loc 1 0 5 is_stmt 0 # loc.c:0:5
        leaq 4(%rsp), %rdi
        .loc 1 9 3 is_stmt 1 # loc.c:9:3
        callq .Lf2$local
For some reason we emit a constant 7 /before/ the store, which is
unfortunate/incorrect. (though maybe useful in other cases where the store is
sunk down to closer to its use? I guess that's what it's for & it's degenerate
in this case where there are no instructions to sink it past)
(In reply to David Blaikie from comment #4)
> Hey, what'd you know, we get my example wrong too - in a maybe slightly
> different way:
>
> __attribute__((optnone)) __attribute__((pure)) int f1() {
> return 3;
> }
> __attribute__((optnone)) void f2(int* x) {
> }
> int main() {
> int i = 3;
> i = f1();
> i = 7;
> f2(&i);
> }
>
> .Ltmp3:
> #DEBUG_VALUE: main:i <- 7
> .loc 1 8 5 prologue_end # loc.c:8:5
> movl $7, 4(%rsp)
> .Ltmp4:
> #DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp
> .loc 1 0 5 is_stmt 0 # loc.c:0:5
> leaq 4(%rsp), %rdi
> .loc 1 9 3 is_stmt 1 # loc.c:9:3
> callq .Lf2$local
>
> For some reason we emit a constant 7 /before/ the store, which is
> unfortunate/incorrect. (though maybe useful in other cases where the store
> is sunk down to closer to its use? I guess that's what it's for & it's
> degenerate in this case where there are no instructions to sink it past)
Sort of correct (apologies if I'm derailing this bug - let me know, can write
this stuff somewhere else):
Modify the code to:
i = 7;
f2(0);
f2(&i);
And it turns out the store is not sunk past the first call to f2, but the
constant location is used right up until the setup for the second call to f2:
.Ltmp3:
        #DEBUG_VALUE: main:i <- 7
        .loc 1 8 5 prologue_end # loc.c:8:5
        movl $7, 4(%rsp)
        .loc 1 9 3 # loc.c:9:3
        xorl %edi, %edi
        callq .Lf2$local
.Ltmp4:
        #DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp
        .loc 1 0 3 is_stmt 0 # loc.c:0:3
        leaq 4(%rsp), %rdi
        .loc 1 10 3 is_stmt 1 # loc.c:10:3
        callq .Lf2$local
So, yeah, not sure why it's prematurely using a constant location, instead of
actually using the register location for the whole duration there.
There are a few awkward behaviours with this test -- firstly, my understanding
of C is that for the statement "l_16 = c = 0 ^ a;" there's no principle or rule
about which assignment happens first, only that they both complete before the
sequence point ';'. Ideally, LLVM would connect the assignments with the line
numbers so that we could make an effort to present both assignments as
completing in one statement; however there just isn't the infrastructure for
that today. See also bug 43970, where these two things don't connect.
On the topic of what range l_16 should have, I agree with Davide / David that
it's better to mark it optimised out. IMHO, we should try and avoid presenting
states of the program that are not present at -O0, and mark things as optimised
out if we would end up presenting a stale value.
Excitingly, that _is_ what's happening with variable locations: l_16 has no
location for the first few instructions. However, machine-scheduler then moves
an instruction from before the assignment to after it, introducing a backwards
step into the program. I get the following assembly:
<+0>: mov 0x200bae(%rip),%eax # 0x601034 <a> L3
<+6>: mov $0xfffff,%ecx L8
<+11>: and 0x200b9b(%rip),%ecx # 0x60102c <b> L9
<+17>: mov %eax,0x200b99(%rip) # 0x601030 <c> L7
<+23>: xor %eax,%ecx L9
<+25>: mov %ecx,0x200b8d(%rip) # 0x60102c <b> L9
<+31>: xor %eax,%eax L10
<+33>: retq L10
Here I've put 'L\d+' at the end of each line to indicate its source line
number. The store to 'c' gets moved below the 'and' instruction by
machine-scheduler, placing a pre-assignment instruction below the position
where the variable location information thinks the assignment occurs. As I
pointed out in the recent llvm-dev RFC, there's no real attempt in the
instruction scheduling passes to prevent this kind of thing happening, and
it's really difficult to get right currently. Bug 43955 and bug 43949 present
similar situations, although they move DBG_VALUEs around, instead of line
numbers.
Exactly what we _could_ do in this scenario is arguable: we could extend the
optimised-out-ness of l_16 even further; we could also delete the line numbers
of the 'and' instruction and mov-to-ecx instruction, making line 7 appear early
in the function and not have completed its assignment yet. I suspect there's no
always-right way of doing this.
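The backwards step can be found mechanically by walking the per-instruction line numbers in address order. Here is a minimal sketch in C that uses the 'L\d+' annotations from the disassembly above as input. The name `first_backward_step` is made up for illustration and the check is not taken from LLVM. A lower line number is only a hint, since loops legitimately go backwards.

```c
#include <assert.h>
#include <stddef.h>

/* Source line of each instruction, in address order, copied from the
 * 'L\d+' annotations in the disassembly above. */
const int listing_lines[] = { 3, 8, 9, 7, 9, 9, 10, 10 };
const size_t listing_count = sizeof listing_lines / sizeof listing_lines[0];

/* Returns the index of the first instruction whose source line is lower than
 * that of the instruction before it, or -1 if the lines never decrease. */
int first_backward_step(const int *lines, size_t n)
{
    assert(lines != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (lines[i] < lines[i - 1])
            return (int)i;
    }
    return -1;
}
```

For this input the function returns index 3, the rescheduled store to 'c' at <+17>, which is attributed to line 7 but follows instructions for lines 8 and 9.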
On your last comment about deleting line entries: I would have thought the better thing to do would be to use the DWARF is_stmt flag on the line table to indicate where (and where not) to place source level breakpoints, so that the instructions remain associated with the appropriate source lines, but source level stepping only stops at the places where you want.
David wrote:
> Hey, what'd you know, we get my example wrong too - in a maybe slightly
> different way:
I think this example suffers from bug 34136. I get the following immediately
after SROA and before early-cse:
define dso_local i32 @main() #3 !dbg !24 {
entry:
  %i = alloca i32, align 4
  %0 = bitcast i32* %i to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #5, !dbg !27
  call void @llvm.dbg.declare(metadata i32* %i, metadata !26, metadata !DIExpression()), !dbg !28
  store i32 3, i32* %i, align 4, !dbg !28, !tbaa !29
  %call = call i32 @f1() #6, !dbg !31
  store i32 %call, i32* %i, align 4, !dbg !32, !tbaa !29
  store i32 7, i32* %i, align 4, !dbg !33, !tbaa !29
  call void @f2(i32* %i), !dbg !34
  %1 = bitcast i32* %i to i8*, !dbg !35
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1) #5, !dbg !35
  ret i32 0, !dbg !35
}
The store of %call to %i gets eliminated as a dead store; and as described in
bug 34136, there's no way of describing an assignment that no longer exists in
our current model, at this stage of compilation. This causes the earlier
variable locations to "leak" further down.
> For some reason we emit a constant 7 /before/ the store, which is
> unfortunate/incorrect. (though maybe useful in other cases where the store
> is sunk down to closer to its use? I guess that's what it's for & it's
> degenerate in this case where there are no instructions to sink it past)
For the record, that variable location is above the store from the moment it's
created. Immediately after instcombine converts the dbg.declare above into
dbg.values, we get:
define dso_local i32 @main() local_unnamed_addr #3 !dbg !24 {
entry:
  %i = alloca i32, align 4
  %0 = bitcast i32* %i to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #5, !dbg !27
  call void @llvm.dbg.value(metadata i32 3, metadata !26, metadata !DIExpression()), !dbg !28
  call void @llvm.dbg.value(metadata i32 7, metadata !26, metadata !DIExpression()), !dbg !28
  store i32 7, i32* %i, align 4, !dbg !29, !tbaa !30
  call void @llvm.dbg.value(metadata i32* %i, metadata !26, metadata !DIExpression(DW_OP_deref)), !dbg !28
  call void @f2(i32* nonnull %i), !dbg !32
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5, !dbg !33
  ret i32 0, !dbg !33
}
Perhaps this is an edge case where instcombine expects a load / store to later
be eliminated? It appears to be a deliberate choice of
ConvertDebugDeclareToDebugValue for stores, to place the dbg.value ahead of the
store. I imagine this does have the potential to mislead, if someone steps onto
the store and expects the variable to contain the value from before the store.
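The effect of placing the marker ahead of the store can be shown with a toy model in C. Everything here is a simplified stand-in, not LLVM code: `op`, `DBG_CONST`, `STORE_CONST` and `described_before` are hypothetical names. The model assumes that a debugger stopped on an instruction sees whatever the most recent preceding marker says.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the two things that matter here:
 * a debug-value marker that says "the variable is now this constant",
 * and a store of a constant to the variable's stack slot. */
enum op_kind { DBG_CONST, STORE_CONST };

struct op {
    enum op_kind kind;
    int value;
};

/* Value that the debug info describes when execution is stopped just before
 * ops[stop] runs: the most recent DBG_CONST at an index below `stop`.
 * Returns 1 and writes *out on success, 0 if nothing describes the variable. */
int described_before(const struct op *ops, size_t stop, int *out)
{
    assert(ops != NULL && out != NULL);
    int found = 0;
    for (size_t i = 0; i < stop; i++) {
        if (ops[i].kind == DBG_CONST) {
            *out = ops[i].value;
            found = 1;
        }
    }
    return found;
}

/* Marker ahead of the store, as in the IR above: stopped on the store
 * (index 2), the variable already reads as 7. */
const struct op marker_first[] = {
    { DBG_CONST, 3 }, { DBG_CONST, 7 }, { STORE_CONST, 7 }
};

/* Marker after the store, shown only for comparison: stopped on the store
 * (index 1), the variable still reads as 3. */
const struct op store_first[] = {
    { DBG_CONST, 3 }, { STORE_CONST, 7 }, { DBG_CONST, 7 }
};
```

In the `marker_first` ordering, someone stopped on the store sees 7 before the store has executed, which is the potentially misleading case described above.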
> So, yeah, not sure why it's prematurely using a constant location,
> instead of actually using the register location for the whole duration
> there.
I think the way LowerDbgDeclare works is to describe the constant/register
locations as much as possible, on the assumption that stores are likely to be
eliminated. It deliberately describes the stack location ahead of the function
call because the call is likely to write to the stack location. I don't think
this is guaranteed to always be 100% correct, probably due to bug 34136.
Keith wrote:
> For your last comment about deleting lines entries; I would have thought
> the better thing to do would be to use the DWARF is_stmt flag on the line
> table to indicate where (and where not) to place source level breakpoints,
> so that the instructions remain associated with the appropriate source
> lines, but source level stepping only stops at the places where you want.
True, that'd be better -- as far as I'm aware, in LLVM is_stmt isn't used to
communicate anything interesting to the user, which we can improve. Reid
expressed an interest in fixing this at the conference.
This reproducer would be a great test case for is_stmt, because the source
level statement becomes two instructions that then get rescheduled, which I
imagine is challenging to deal with. Right now the step onto line 7 is already
marked "is_stmt"; I guess a better implementation would be able to determine
that it _shouldn't_ be an is_stmt line?
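As a rough illustration of what that could look like for this reproducer, the C sketch below models a line table in which every instruction keeps its source line but only some rows are stopping points. The offsets and lines come from the disassembly earlier in this thread. The is_stmt values are an assumption chosen for the example, and `line_row` and `collect_stops` are hypothetical names rather than anything in LLVM or LLDB.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified line-table row. Real DWARF rows carry more state than this. */
struct line_row {
    unsigned offset;   /* instruction offset within the function */
    int line;          /* source line */
    int is_stmt;       /* 1 if this is a recommended stopping point */
};

/* Offsets and lines come from the disassembly; the is_stmt values are an
 * assumption chosen so that the rescheduled store for line 7 is not a stop. */
const struct line_row rows[] = {
    {  0,  3, 1 }, {  6,  8, 1 }, { 11,  9, 1 }, { 17,  7, 0 },
    { 23,  9, 0 }, { 25,  9, 0 }, { 31, 10, 1 }, { 33, 10, 0 },
};
const size_t row_count = sizeof rows / sizeof rows[0];

/* Writes the source lines at which a source-level stepper would stop, in
 * address order, into `out` (capacity `cap`). Returns how many were written. */
size_t collect_stops(const struct line_row *table, size_t n, int *out, size_t cap)
{
    assert(table != NULL && out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n && written < cap; i++) {
        if (table[i].is_stmt)
            out[written++] = table[i].line;
    }
    return written;
}
```

With these flags the stepper stops on lines 3, 8, 9 and 10 in that order, while the instruction at <+17> stays attributed to line 7 in the table.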
> True, that'd be better -- as far as I'm aware, in LLVM is_stmt isn't used to
> communicate anything interesting to the user, which we can improve.
LLVM doesn't think about is_stmt until it's time to emit .loc directives to
the assembler. The heuristic was designed for -O0 and doesn't really work
well in the presence of optimizations. Some sort of IR-level tracking would
be distinctly better, but it would also be a bit disruptive. Maybe keeping
an is_stmt flag in DebugLoc would be the least so, but I haven't thought
about it a whole lot.