Improve optimized debug info for escaped variables that live in memory #33108

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR34136
Status NEW
Importance P enhancement
Reported by Reid Kleckner (rnk@google.com)
Reported on 2017-08-09 15:17:56 -0700
Last modified on 2021-01-11 10:28:43 -0800
Version trunk
Hardware PC Windows NT
CC aprantl@apple.com, chandlerc@gmail.com, chris.jackson@sony.com, dblaikie@gmail.com, ditaliano@apple.com, dxf@google.com, echristo@gmail.com, hans@chromium.org, hfinkel@anl.gov, jeremy.morse.llvm@gmail.com, jingham@apple.com, jmuizelaar@mozilla.com, josh@joshmatthews.net, lhames@gmail.com, llvm-bugs@lists.llvm.org, llvm-bugzilla-mail@molenda.com, nicolasweber@gmx.de, orlando.hyams@sony.com, paul_robinson@playstation.sony.com, siggi@google.com, zturner@google.com
Consider:

int getint(void);
void alias(char *);
void crash(void);
void f() {
  int x = getint();
  int y = getint();
  int z = getint();
  alias((char*)&x);
  alias((char*)&y);
  alias((char*)&z);
  crash();
}

InstCombine runs LowerDbgDeclare on this code, and we end up with very poor
debug info. We basically say that x, y, and z live in EAX after each call to
getint, and they are immediately clobbered, never to be available again.

Those casts to char* also prevent the transform from adding more dbg.values
that would extend the range.

This was reduced from http://crbug.com/753934
Quuxplusone commented 7 years ago
IR before instcombine:

; Function Attrs: nounwind uwtable
define void @f() #0 !dbg !11 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  %0 = bitcast i32* %x to i8*, !dbg !19
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #4, !dbg !19
  call void @llvm.dbg.declare(metadata i32* %x, metadata !15, metadata !20), !dbg !21
  %call = call i32 @getint(), !dbg !22
  store i32 %call, i32* %x, align 4, !dbg !21, !tbaa !23
  %1 = bitcast i32* %y to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) #4, !dbg !27
  call void @llvm.dbg.declare(metadata i32* %y, metadata !17, metadata !20), !dbg !28
  %call1 = call i32 @getint(), !dbg !29
  store i32 %call1, i32* %y, align 4, !dbg !28, !tbaa !23
  %2 = bitcast i32* %z to i8*, !dbg !30
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %2) #4, !dbg !30
  call void @llvm.dbg.declare(metadata i32* %z, metadata !18, metadata !20), !dbg !31
  %call2 = call i32 @getint(), !dbg !32
  store i32 %call2, i32* %z, align 4, !dbg !31, !tbaa !23
  %3 = bitcast i32* %x to i8*, !dbg !33
  call void @alias(i8* %3), !dbg !34
  %4 = bitcast i32* %y to i8*, !dbg !35
  call void @alias(i8* %4), !dbg !36
  %5 = bitcast i32* %z to i8*, !dbg !37
  call void @alias(i8* %5), !dbg !38
  call void @crash(), !dbg !39
  %6 = bitcast i32* %z to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %6) #4, !dbg !40
  %7 = bitcast i32* %y to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %7) #4, !dbg !40
  %8 = bitcast i32* %x to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %8) #4, !dbg !40
  ret void, !dbg !40
}

IR after instcombine:

define void @f() #0 !dbg !11 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  %0 = bitcast i32* %x to i8*, !dbg !19
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #4, !dbg !19
  %call = call i32 @getint() #4, !dbg !20
  call void @llvm.dbg.value(metadata i32 %call, metadata !15, metadata !21), !dbg !22
  store i32 %call, i32* %x, align 4, !dbg !22, !tbaa !23
  %1 = bitcast i32* %y to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) #4, !dbg !27
  %call1 = call i32 @getint() #4, !dbg !28
  call void @llvm.dbg.value(metadata i32 %call1, metadata !17, metadata !21), !dbg !29
  store i32 %call1, i32* %y, align 4, !dbg !29, !tbaa !23
  %2 = bitcast i32* %z to i8*, !dbg !30
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %2) #4, !dbg !30
  %call2 = call i32 @getint() #4, !dbg !31
  call void @llvm.dbg.value(metadata i32 %call2, metadata !18, metadata !21), !dbg !32
  store i32 %call2, i32* %z, align 4, !dbg !32, !tbaa !23
  %3 = bitcast i32* %x to i8*, !dbg !33
  call void @alias(i8* %3) #4, !dbg !34
  %4 = bitcast i32* %y to i8*, !dbg !35
  call void @alias(i8* %4) #4, !dbg !36
  %5 = bitcast i32* %z to i8*, !dbg !37
  call void @alias(i8* %5) #4, !dbg !38
  call void @crash() #4, !dbg !39
  %6 = bitcast i32* %z to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %6) #4, !dbg !40
  %7 = bitcast i32* %y to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %7) #4, !dbg !40
  %8 = bitcast i32* %x to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %8) #4, !dbg !40
  ret void, !dbg !40
}

Note that we only have 3 dbg.value intrinsics. These ultimately turn into
DEBUG_VALUE MIR instructions pointing at EAX:

        #DEBUG_VALUE: f:x <- %EAX
        .cv_loc 0 1 5 7                 # store.c:5:7
        movl    %eax, 52(%rsp)
        .cv_loc 0 1 6 11                # store.c:6:11
        callq   getint
.Ltmp1:
        #DEBUG_VALUE: f:y <- %EAX
        .cv_loc 0 1 6 7                 # store.c:6:7
        movl    %eax, 48(%rsp)
        .cv_loc 0 1 7 11                # store.c:7:11
        callq   getint
.Ltmp2:
        #DEBUG_VALUE: f:z <- %EAX
        .cv_loc 0 1 7 7                 # store.c:7:7
        movl    %eax, 44(%rsp)
        leaq    52(%rsp), %rcx
        .cv_loc 0 1 8 3                 # store.c:8:3
        callq   alias

They are all immediately clobbered, so that's no good.

Even if we drop dbg.declare, we should have something to indicate that the
variable will live in memory. The dbg.value could use a DW_OP_deref
DIExpression after the store, for example.
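For illustration, a minimal hand-written sketch (reusing metadata numbers from
the dumps above as placeholders) of what instcombine could emit after the store:

  store i32 %call, i32* %x, align 4, !dbg !21
  ; Hypothetical: describe x as living in its stack slot from this point on.
  call void @llvm.dbg.value(metadata i32* %x, metadata !15,
                            metadata !DIExpression(DW_OP_deref)), !dbg !21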
Quuxplusone commented 7 years ago
LowerDbgDeclare gives up on any use that isn't a direct use of the entire
alloca. Aggregates are typically built up one field at a time, so we can often
lose all debug info for them. Consider this example:

struct Foo { int x, y; };
int getint(void);
void escape(char *);
void f() {
  struct Foo o;
  o.x = getint();
  o.y = getint();
  escape((char *)&o);
}
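The IR shape is roughly the following (a hand-simplified sketch, not actual
compiler output). The stores go through GEPs, so there is never a direct store
to the alloca for LowerDbgDeclare to key off of:

  %o = alloca %struct.Foo, align 4
  %ox = getelementptr inbounds %struct.Foo, %struct.Foo* %o, i32 0, i32 0
  %call = call i32 @getint()
  store i32 %call, i32* %ox, align 4   ; field-at-a-time store, not a direct use of %o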

I'm starting to feel more and more like LowerDbgDeclare is just a bad idea. We
should keep dbg.declare around as long as the variable's alloca is still alive.
Even if instcombine, GVN, or some other pass deletes or moves some stores to a
local variable, I'd much rather tell the user the variable is in memory with a
stale value than that we don't have any info at all.
Quuxplusone commented 7 years ago
(In reply to Reid Kleckner from comment #2)
> I'd much rather tell the user the variable is in
> memory with a stale value than that we don't have any info at all.

There's no way to communicate to the user that the value is potentially stale
though - so that seems likely to /really/ confuse a user & lead them way down
the garden path, potentially.
Quuxplusone commented 7 years ago
We probably want both.

My favorite example is this:

void g(int*);
void f(int i) {
  g(&i); // alloca is live until here
  if (i == 0) { // end of alloca live range
     ...  // i is now a constant, but without lowering to dbg.value we can't capture that and would still point to the stack slot that may even have been reused by now.
  }
}
Quuxplusone commented 7 years ago
(In reply to David Blaikie from comment #3)
> (In reply to Reid Kleckner from comment #2)
> > I'd much rather tell the user the variable is in
> > memory with a stale value than that we don't have any info at all.
>
> There's no way to communicate to the user that the value is potentially
> stale though - so that seems likely to /really/ confuse a user & lead them
> way down the garden path, potentially.

I don't think I agree with that perspective. In the tradeoff between giving the
user more information that might be potentially misleading vs. no information
at all, I'd prefer to give more information. Even if LLVM moves some stores
around and the debugger prints uninitialized garbage for my local variables, I
can step around until it gets initialized and then stare at the assembly to
figure out what happened. That's a pretty normal workflow when debugging
optimized code, IMO.

(In reply to Adrian Prantl from comment #4)
> We probably want both.
>
> My favorite example is this:
>
> void g(int*);
> void f(int i) {
>   g(&i); // alloca is live until here
>   if (i == 0) { // end of alloca live range
>      ...  // i is now a constant, but without lowering to dbg.value we can't
> capture that and would still point to the stack slot that may even have been
> reused by now.
>   }
> }

I think we could fix this by modifying stack coloring and GVN. For the changes
to stack coloring, consider this simpler example:

void g(int *restrict);
int getint(void);
void f(int i) {
  g(&i);
  int j = getint();
  g(&j);
}

If stack coloring reuses i's slot for j, maybe it could update the debug info
to shrink the live range of i. I don't think we can do this today, because our
frame index table uses the labels inserted for the scope that ultimately come
from the DILocations stapled on MachineInstrs.

GVN could also help in the original example by inserting llvm.dbg.value calls
recording that i == 0 when it propagates that equality. Again, we'd have to
learn how to mix dbg.declare with dbg.value, but I think we'll need to do
something like that in the long run.
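Sketched in IR (hypothetical, with placeholder metadata numbers), GVN could
emit something like this in the block guarded by the comparison:

if.then:                                        ; reached only when i == 0
  ; Hypothetical: record the variable's now-known constant value.
  call void @llvm.dbg.value(metadata i32 0, metadata !15, metadata !DIExpression()), !dbg !20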
Quuxplusone commented 7 years ago
> I don't think I agree with that perspective. In the tradeoff between giving
> the user more information that might be potentially misleading vs. no
> information at all, I'd prefer to give more information.

I strongly disagree with this statement. If what the debugger reports cannot be
trusted in *some* cases, then everything the debugger presents to the user
becomes worthless, because the user won't be able to tell the difference
between information that is correct and information that may be correct.
Quuxplusone commented 7 years ago
(In reply to Adrian Prantl from comment #6)
> > I don't think I agree with that perspective. In the tradeoff between giving
> > the user more information that might be potentially misleading vs. no
> > information at all, I'd prefer to give more information.
>
> I strongly disagree with this statement. If what the debugger reports cannot
> be trusted  in *some* cases, then everything the debugger presents to the
> user becomes worthless, because the user won't be able to tell the
> difference between information that is correct and information that may be
> correct.

Yeah - while there might be cases where the answer is unclear (due to
optimizations code gets shuffled around, and there may be uncertainty about
what the value of a variable is at a given instruction), I can understand
debating whether there's a "good enough" answer in some of those cases.

But in cases where it's objectively incorrect - the value's moved into a
register, the register is mutated to reflect the new value but the memory
location still holds the old value - I don't think it's a good idea to consider
that anything other than a bug to be fixed.

It's not so much that the user gets "uninitialized garbage" they can easily
identify, ignore, & move on from; it's that they get answers that are close
enough to be plausible & then spend a lot more time trying to understand a
problem that doesn't exist because the debugger told them it did.
Quuxplusone commented 7 years ago
(In reply to Adrian Prantl from comment #6)
> I strongly disagree with this statement. If what the debugger reports cannot
> be trusted  in *some* cases, then everything the debugger presents to the
> user becomes worthless, because the user won't be able to tell the
> difference between information that is correct and information that may be
> correct.

I think there are cases where we will have to accept stale variable values. I
don't think we should treat it as an absolute goal to never give incorrect
values. Otherwise we burden *every* LLVM pass that has the ability to delete
memory writes with the responsibility of updating debug info. I think 90% of
memory updates are eliminated by a few core passes: GVN and instcombine. If we
can teach them to emit dbg.value when deleting stores to static allocas with
debug info, we will have a reasonably good debug experience. There is a long
tail of passes like memcpyopt that delete stores that I don't expect to
preserve debug info. We can invent a way for them to say "I deleted this dead
memcpy, so the value of the variable is unavailable at this point, and becomes
available again here (dbg.value+undef?)", but I don't think we'll be able to
maintain that for all passes.
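The "unavailable" half of that is already expressible today with an undef
dbg.value; a one-line sketch with placeholder metadata numbers:

  ; Hypothetical: the variable's value is unknown from this point on.
  call void @llvm.dbg.value(metadata i32 undef, metadata !15, metadata !DIExpression()), !dbg !20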

It's also worth considering that we delete dead stores to heap objects, so the
user can already observe impossible program states in the debugger. We can't
just put the blinders on and say "sorry we optimized the program, nothing is
certain". The user is already debugging an optimized program, they shouldn't
trust anything it says, and our existing user base seems to already know that.
Quuxplusone commented 7 years ago
(In reply to Reid Kleckner from comment #8)
> The user is already debugging an optimized program,
> they shouldn't trust anything it says, and our existing user base seems to
> already know that.

What sort of feedback from users is available on this?

& there's also a risk that the more unreliable these things get, the less users
trust the answers they get, and the less they use it because it's untrustworthy.

I imagine there are fairly few cases where writes to heap values are optimized
away in a way that's especially user visible. Function call boundaries happen
often and are a pretty strong impediment to alias analysis for heap values, etc.
Quuxplusone commented 7 years ago
(In reply to Adrian Prantl from comment #6)
> > I don't think I agree with that perspective. In the tradeoff between giving
> > the user more information that might be potentially misleading vs. no
> > information at all, I'd prefer to give more information.
>
> I strongly disagree with this statement. If what the debugger reports cannot
> be trusted  in *some* cases, then everything the debugger presents to the
> user becomes worthless, because the user won't be able to tell the
> difference between information that is correct and information that may be
> correct.

I strongly disagree with this statement.  I think it's an overly idealistic
view of debugging optimized code.  Nobody *wants* to debug optimized code,
because it's a pain. And the reason it's a pain is *because* the original
source code doesn't map well to the generated machine code.  At the same time,
the compiler has more information than the user does about the transformations
that happened, and for the compiler to say "I can't be 100% sure that this
information is 100% accurate, therefore I'm not even going to *try* to help
you" seems like the wrong approach.

Generating debug info is not like generating binary code, where it is either
provably correct or provably incorrect.  The entire point of debug info is to
help the user as much as possible.  By generating nothing, we are not helping
anyone, and we may even be making it impossible for a user to track down a bug.
By generating something, with the caveat that it may be stale, we at least
point the user in a direction.  If it's wrong, then they're no worse off than
when they started.  But if it's right, then we saved them a ton of time.

Anyone debugging optimized code is going to be going back and forth between the
assembly and source view anyway, and should be able to confirm that the
debugger is reporting accurate information.
Quuxplusone commented 7 years ago
(In reply to David Blaikie from comment #9)
> (In reply to Reid Kleckner from comment #8)
> > The user is already debugging an optimized program,
> > they shouldn't trust anything it says, and our existing user base seems to
> > already know that.
>
> What sort of feedback from users is available on this?
>
> & there's also a risk that the more unreliable these things get, the less
> users trust the answers they get, and the less they use it because it's
> untrustworthy.

Every build system I've ever encountered has at least two configurations: debug
and release, even if they don't go by those names. The ubiquity of those
configurations reflects the doubts that users have about the quality of
optimized debug info.

> I imagine there are fairly few cases where writes to heap values are
> optimized away in a way that's especially user visible. Function call
> boundaries happen often and are a pretty strong impediment to alias analysis
> for heap values, etc.

This logic also applies to escaped locals to some extent. Locals that cannot be
promoted to SSA are modeled as "read/write" at every external function call.
Alias analysis, of course, is a lot stronger, so things like GVN do fire in
practice.
Quuxplusone commented 7 years ago
(In reply to Zachary Turner from comment #10)
>  If it's wrong, then they're no worse off than when they started.

This is the bit where I think there's a lot of/strong disagreement.

If the debugger tells someone the wrong answer & the user doesn't know it's
wrong - it may cost them a lot of time going down the wrong debugging path,
trying to understand why that answer is happening in their program (when it
isn't).

So, yes, I do think it's more costly to give the wrong answer than no answer.
How much more costly (how many wrong answers would be worth it to get some
right answers instead of no answers)? Not sure.

I think there is a valid tradeoff there, especially as an intermediate state,
where we say "yeah, this technique gives the wrong answer sometimes but gets us
a whole lot more right answers (rather than no answer) - /and/ we have a path
forward to address the wrong answers it creates (& that they are bugs and
should be treated as such)".
Quuxplusone commented 7 years ago
(In reply to David Blaikie from comment #12)
> (In reply to Zachary Turner from comment #10)
> >  If it's wrong, then they're no worse off than when they started.
>
> This is the bit where I think there's a lot of/strong disagreement.
>
> If the debugger tells someone the wrong answer & the user doesn't know it's
> wrong - it may cost them a lot of time going down the wrong debugging path,
> trying to understand why that answer is happening in their program (when it
> isn't).
>
> So, yes, I do think it's more costly to give the wrong answer than no
> answer. How much more costly (how many wrong answers would be worth it to
> get some right answers instead of no answers)? Not sure.
>
> I think there is a valid tradeoff there, especially as an intermediate
> state, where we say "yeah, this technique gives the wrong answer sometimes
> but gets us a whole lot more right answers (rather than no answer) - /and/
> we have a path forward to address the wrong answers it creates (& that they
> are bugs and should be treated as such)".

(In reply to Reid Kleckner from comment #11)
> (In reply to David Blaikie from comment #9)
> > (In reply to Reid Kleckner from comment #8)
> > > The user is already debugging an optimized program,
> > > they shouldn't trust anything it says, and our existing user base seems to
> > > already know that.
> >
> > What sort of feedback from users is available on this?
> >
> > & there's also a risk that the more unreliable these things get, the less
> > users trust the answers they get, and the less they use it because it's
> > untrustworthy.
>
> Every build system I've ever encountered has at least two configurations:
> debug and release, even if they don't go by those names. The ubiquity of
> those configurations reflects the doubts that users have about the quality
> of optimized debug info.

I think it's reasonable to expect that debugging optimized programs will be
harder - I don't think that necessarily means expecting the answers to be
wrong. Code reordering (especially now that we're dropping debug locations when
code moves between blocks) can be confusing, values may be optimized away, etc.
Yeah, it's harder & interacting with code that behaves more like the source
code is easier.

I don't think that necessarily implies users are fine with/expect the debugger
to give answers that are wrong.

> > I imagine there are fairly few cases where writes to heap values are
> > optimized away in a way that's especially user visible. Function call
> > boundaries happen often and are a pretty strong impediment to alias analysis
> > for heap values, etc.
>
> This logic also applies to escaped locals to some extent. Locals that cannot
> be promoted to SSA are modeled as "read/write" at every external function
> call. Alias analysis, of course, is a lot stronger, so things like GVN do
> fire in practice.

Right - and there it's pretty noticeable as you're saying and would be great to
improve.

I think there's substantial overlap in goals/desires here; it feels like this is
turning into a philosophical/principled stand when maybe it doesn't need to be?
There's a technical tradeoff here - in both extremes ("everything's optimized
away, we don't know where anything is" and "every variable has the erroneous
value 3") it wouldn't be any good for a user. And somewhere there are practical
tradeoffs around accuracy and usability for users - and a place from which
improvements on both sides (removing erroneous values, adding info about values
that were previously unknown) would be valuable.

I'm not sure exactly how to make that tradeoff judgement, but I think it's
probably more helpful to look at it from that perspective.
Quuxplusone commented 7 years ago
(In reply to David Blaikie from comment #9)
> (In reply to Reid Kleckner from comment #8)
> > The user is already debugging an optimized program,
> > they shouldn't trust anything it says, and our existing user base seems to
> > already know that.
>
> What sort of feedback from users is available on this?

I'm a long-time user of debugging tools on optimized Microsoft Visual C++
binaries. I can say that those of us who do this know not to trust what the
debugger presents, and I often find, e.g., this==nullptr and other anomalies
while debugging.
Note that even with perfect debugging information, the debugger will not always
be able to unwind the stack correctly. (Think 3rd party modules with no
symbols, JIT code with no unwind info, malware with bad intent, etc.).
So, no matter the fidelity of the symbols, it will always be necessary to
exercise caution when debugging Windows crashes.
Quuxplusone commented 7 years ago
(In reply to David Blaikie from comment #13)
> (In reply to David Blaikie from comment #12)
> > (In reply to Zachary Turner from comment #10)
> > >  If it's wrong, then they're no worse off than when they started.
> >
> > This is the bit where I think there's a lot of/strong disagreement.
> >
> > If the debugger tells someone the wrong answer & the user doesn't know it's
> > wrong - it may cost them a lot of time going down the wrong debugging path,
> > trying to understand why that answer is happening in their program (when it
> > isn't).
> >
> > So, yes, I do think it's more costly to give the wrong answer than no
> > answer. How much more costly (how many wrong answers would be worth it to
> > get some right answers instead of no answers)? Not sure.
> >
> > I think there is a valid tradeoff there, especially as an intermediate
> > state, where we say "yeah, this technique gives the wrong answer sometimes
> > but gets us a whole lot more right answers (rather than no answer) - /and/
> > we have a path forward to address the wrong answers it creates (& that they
> > are bugs and should be treated as such)".
>
> (In reply to Reid Kleckner from comment #11)
> > (In reply to David Blaikie from comment #9)
> > > (In reply to Reid Kleckner from comment #8)
> > > > The user is already debugging an optimized program,
> > > > they shouldn't trust anything it says, and our existing user base seems to
> > > > already know that.
> > >
> > > What sort of feedback from users is available on this?
> > >
> > > & there's also a risk that the more unreliable these things get, the less
> > > users trust the answers they get, and the less they use it because it's
> > > untrustworthy.
> >
> > Every build system I've ever encountered has at least two configurations:
> > debug and release, even if they don't go by those names. The ubiquity of
> > those configurations reflects the doubts that users have about the quality
> > of optimized debug info.
>
> I think it's reasonable to expect that debugging optimized programs will be
> harder - I don't think that necessarily means expecting the answers to be
> wrong. Code reordering (especially now that we're dropping debug locations
> when code moves between blocks) can be confusing, values may be optimized
> away, etc. Yeah, it's harder & interacting with code that behaves more like
> the source code is easier.
>
> I don't think that necessarily implies users are fine with/expect the
> debugger to give answers that are wrong.

This is consistent with my experience as well. Users get really annoyed when
the debugger provides wrong values (even more annoyed than when the debugger
provides no values).
Quuxplusone commented 7 years ago

I really disagree with the statement:

If it's wrong, then they're no worse off than when they started. But if it's right, then we saved them a ton of time.

It would be too exhausting to use a debugger if you had to keep in the forefront of your mind that every value you look at might very well be wrong. So you put that aside and trust it, and then, when you're in the middle of chasing down some problem and some variable has a seemingly impossible value, you spend hours trying to figure out how it could possibly be that value till you remember that the debugger is a piece of crap and has been lying to you again. You end up much worse off than when you started, because you've lost your train of thought on the wild goose chase the debugger led you on, and you get to start again realizing that your tools are untrustworthy.

This is not a theoretical opinion on my part.

One difference between gdb & lldb is that when gdb can't unwind a register, it just reuses the register's last known value from a younger frame. The end result is that gdb would confidently declare a value for some variable (usually an argument in the "bt" list) which would be totally bogus. That caused many, many wasted hours over the years, and even people who had been bitten before would fall into the trap of trusting these values, because you just can't use the thing if you have to always distrust it.

I have been on the receiving end when people in this situation have been very forcefully conveying to me that they were much worse off for our having lied to them than if we had just told them we didn't know.

In lldb, if we can't unwind a register we report it as unknown. People fairly easily understand that not everything is recoverable in optimized code debugging. From many years supporting folks using the two approaches it is clear lldb's was the right choice.

Also, if lldb knows the places where the values are correct, we can help the user propagate those values to the places they need to stop. For instance, we can present to them the places they can stop where the value IS trustworthy. We can even do things like put hidden stops at known-good locations and record info behind the user's back to present when they later stop. Or we can get lldb to do some assembly analysis to try to figure out where the value moved from its last good location, automating the work folks have to do by hand when the debug info isn't helpful.

OTOH, if all we know is "declared locations are sometimes right" you'd never know when to do this work, and anything you do will be colored by the uncertainty.

If we added a DW_AT_probable_location and only converted it to DW_AT_location when clang is confident of its analysis, that would be okay. Then you could tell people what to expect. But forcing people to live constantly under "the values of Damocles" seems an awfully bad way to construct a tools environment.

Quuxplusone commented 7 years ago

To throw two cents in: from the debugger's perspective it's always better to say "I don't know" than to say "this is the value" and give the wrong value. The former is incredibly annoying to anyone working on optimized code -- I work with several groups that have to always build optimized, and debugging that code is way harder than it should be today, so I fully understand where the pain is coming from. But thinking about the failure modes makes it clear what the right answer is.

If I say "I cannot give you the value of this variable", the dev will either find the value in another stack frame, or they'll resort to reading assembly, find where it is actually still live, and get it. They will curse the debugger for making their lives so hard (I'd prefer they curse the compiler, but that's not what they do, alas.)

If I hand out a value that is wrong, the developer can waste DAYS of time working on the assumption that this code somehow got this impossible value. I've gotten those bug reports -- people who, literally, wasted days because the debugger gave out a value that was not correct. It's so much worse a failure mode that we've always considered it unacceptable when we've encountered these situations in lldb. The more you work with users of the debugger and their bug reports, the more you realize the importance of getting this right.

The debugger must be conservative in these situations, as frustrating as that is. The debug info must be conservative in these situations. Or if the compiler wants to emit variable locations that are possibly correct, add a different tag in the DWARF for it and debuggers like lldb will be free to ignore it.

Any debug info that claims a variable is accessible at a location, when that location does not hold the variable's value, is buggy debug info.

Quuxplusone commented 7 years ago

One thing where this is hurting us a lot in chrome/win land at the moment is minidumps. Minidumps aren't really a thing on non-Windows, which might be why expectations here are so different.

On Windows, when chrome (or what have you) crashes, you can download a minidump from the crash server, load that into your debugger, and then use that to inspect the state of variables. Users are aware that they're looking at optimized debug info. Someone told me "Misleading information is mitigated by the fact you can ask the debugger where it's getting its information (dv /v), and if you see stuff in a register, then you know to start distrusting the values."

With clang, when users ask for the values of things while debugging minidumps, they currently almost always get "no answer" instead of "likely answer is X". This is very different from how things used to work for them, and this makes people unhappy too.

Given the different expectations, would adding this behavior behind a (default-off) flag be a short-term option? Then we can improve optimized debug info asynchronously and see if it's possible to get it to be good enough without having to sometimes show invalid information.

Quuxplusone commented 7 years ago
(In reply to Nico Weber from comment #18)
> One thing where this is hurting us a lot in chrome/win land at the moment is
> minidumps. Minidumps aren't really a thing on non-Windows, which might be
> why expectations here are so different.
>
> On Windows, when chrome (or what have you) crashes, you can download a
> minidump from the crash server, load that into your debugger, and then use
> that to inspect the state of variables. Users are aware that they're looking
> at optimized debug info. Someone told me "Misleading information is mitigated
> by the fact you can ask the debugger where it's getting its information
> (dv /v), and if you see stuff in a register, then you know to start
> distrusting the values."

Interestingly, it seems like that's the opposite of the kind of erroneous
information that the direction discussed here would produce.

Here the issue is that LLVM isn't describing the memory location backing a
variable once it copies the value into a register. If we add the stack memory
location, the likely mistake is that the value may be stale, because a dead
store has been eliminated and the value was /only/ in that register (or nowhere
at all).

> With clang, when users ask for the values of things while debugging
> minidumps, they currently almost always get "no answer" instead of "likely
> answer is X". This is very different from how things used to work for them,
> and this makes people unhappy too.

*nod* It's clearly a regression for MSVC users switching to Clang, no doubt.
And it's clear LLVM can do better - yeah, when "f(&x)" is called, Clang should
be describing the location of 'x' without any doubt.

Though I wonder how many false positives ("the value of X is Y" when it's
really Z) MSVC has compared to what the proposed direction/fixes have in Clang.
Might well be that the users also don't expect as many false positives as this
would create in Clang.

> Given the different expectations, would adding this behavior behind a
> (default-off) flag be a short-term option? Then we can improve optimized
> debug info asynchronously and see if it's possible to get it to be good
> enough without having to sometimes show invalid information.

Maybe - though the divergence does make me a touch uncomfortable, it does
amount to sort of an A/B test with different implementations and users.

I'm not sure exactly where the tradeoffs are of the proposed fixes & what the
path to addressing any gaps would be.

Adrian: Where're you at, philosophically, with all this? Are you of the opinion
that no false positive regression is acceptable? Or are some acceptable (with
the acknowledgement that they are bugs that should be fixed and should be
identified, tracked, have a likely path to resolution, etc.) to reduce false
negatives ("no location" when there's really an obvious, unambiguous,
unavoidable location that should be reported in the DWARF)?

If you feel like no false positive regression is acceptable, then yeah - I
guess forking the default here with a flag might be the path forward - then as
the gaps in the analysis are filled (tracking when a value moves back into
memory), eventually the flag can be removed/ignored as the only thing it would
be enabling would be the false positives - reintroducing the "unknown values"
when it's truly unknown.
Quuxplusone commented 7 years ago
(In reply to Nico Weber from comment #18)
> One thing where this is hurting us a lot in chrome/win land at the moment is
> minidumps. Minidumps aren't really a thing on non-Windows, which might be
> why expectations here are so different.
>
> On Windows, when chrome (or what have you) crashes, you can download a
> minidump from the crash server, load that into your debugger, and then use
> that to inspect the state of variables. Users are aware that they're looking
> at optimized debug info. Someone told me "Misleading information is mitigated
> by the fact you can ask the debugger where it's getting its information
> (dv /v), and if you see stuff in a register, then you know to start
> distrusting the values."

Why? I thought that minidumps have registers and stack, but lack other memory
contents.

>
> With clang, when users ask for the values of things while debugging
> minidumps, they currently almost always get "no answer" instead of "likely
> answer is X". This is very different from how things used to work for them,
> and this makes people unhappy too.

Heh. So we've trained users to expect broken behavior and now they want it back
;) [this is unfortunately common.]

>
> Given the different expectations, would adding this behavior behind a
> (default-off) flag be a short-term option? Then we can improve optimized
> debug info asynchronously and see if it's possible to get it to be good
> enough without having to sometimes show invalid information.
Quuxplusone commented 7 years ago
(In reply to Nico Weber from comment #18)

> Given the different expectations, would adding this behavior behind a
> (default-off) flag be a short-term option? Then we can improve optimized
> debug info asynchronously and see if it's possible to get it to be good
> enough without having to sometimes show invalid information.

I am really uncomfortable with this being an option on the producer of debug info
(clang).  Now the person who builds a program is deciding how the debugger
behaves -- making an assumption about what the developer doing the debugging is
going to see.  The person debugging a binary may have no idea how the program
was compiled.  The person who is compiling the program may not understand the
meaning of the clang flag ("-fpossibly-wrong-but-maybe-really-helpful-debug-info?
Yes please!") themselves.

If we want to emit possibly-correct location information in parallel to known-
correct location information in the DWARF in a separate list, and add an option
to the consumer (lldb) to use possibly-correct location information (with
default-off), I'd be OK with that approach.  Then the person enabling the use
of this in lldb is responsible for their own results.

I don't think MSVC's behavior is a feature to be emulated, though.  We've had
many cases like this over the years, and in lldb we consider showing incorrect
values to the user as our worst class of bugs.
Quuxplusone commented 7 years ago
(In reply to Jason Molenda from comment #21)
> (In reply to Nico Weber from comment #18)
>
> > Given the different expectations, would adding this behavior behind a
> > (default-off) flag be a short-term option? Then we can improve optimized
> > debug info asynchronously and see if it's possible to get it to be good
> > enough without having to sometimes show invalid information.
>
>
> I am really uncomfortable with this being an option on the producer of debug
> info (clang).  Now the person who builds a program is deciding how the
> debugger behaves -- making an assumption about what the developer doing the
> debugging is going to see.  The person debugging a binary may have no idea how the
> program was compiled.  The person who is compiling the program may not
> understand the meaning of the clang flag
> ("-fpossibly-wrong-but-maybe-really-helpful-debug-info?  Yes please!")
> themselves.

We could make it CodeView-only and not touch the DWARF bits. That should
address lldb concerns.

I agree with dblaikie that having two codepaths here long-term seems
undesirable, but coming up with a grand unified scheme is going to take time,
and ideally we wouldn't have to block switching chrome/win to clang on this
work.
Quuxplusone commented 7 years ago
> Adrian: Where're you at, philosophically, with all this? Are you of the
> opinion that no false positive regression is acceptable? Or are some acceptable
> (with the acknowledgement that they are bugs that should be fixed and should be
> identified, tracked, have a likely path to resolution, etc.) to reduce false
> negatives ("no location" when there's really an obvious, unambiguous,
> unavoidable location that should be reported in the DWARF)?

As I said earlier in the thread I strongly believe that we should not knowingly
introduce new false positives (unless there is a way to mark them in the debug
info as not trustworthy and debuggers will actually surface that to the user).
If the user can't trust *some* of the variable values, *every* value reported
by the debugger becomes not trustworthy.

I also think that this discussion is misguided, because I am convinced that
there is a lot of lower-hanging fruit to pick that does not involve watering
down the debug info with not-to-be-trusted information. We have lots of
documented PRs of optimizations dropping dbg.values that can be fixed without
introducing any uncertainty and I think we should focus our attention there.

I am also talking with Reid to work out a plan to fix the issue described in
this PR without introducing new false positives — this will need a little more
design but it is worth taking the time to solve this correctly. We have a lot
of the necessary components already in place.
Quuxplusone commented 7 years ago

Keno wrote an alternative proposal, essentially trying to improve on the inference that instcombine does today: https://gist.github.com/Keno/480b8057df1b7c63c321

I'm proposing that we remove instcombine's conversion of dbg.declare and push that responsibility onto the passes that delete stores.

Quuxplusone commented 7 years ago

Adrian and I sat down and worked out a long term plan to improve things in this area without knowingly introducing any inaccuracies in variable info. For a long time, we've discussed unifying dbg.declare and dbg.value. We can't mix dbg.declare and dbg.value, so to fix this bug, we should standardize everything on dbg.value, and then we don't need instcombine or any other pass to lossily convert dbg.declare to dbg.value. The steps are:

In the long run, this will make it so we start printing variable locations as assembly comments at -O0, which is a nice side benefit.

Quuxplusone commented 7 years ago
I think we also had on the list:

- Add support for memory locations in LiveDebugValues

We also identified two nice-to-have features (these are things that currently
don't work either):

- insert a dbg.value(undef) at the point where a lifetime.end marker is
generated (or directly in the stack coloring pass); a sketch follows this list

- let clang generate allocas for indirectly passed function arguments (e.g.,
sret)
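A sketch of the first nice-to-have, spliced into the lifetime markers from the
IR at the top of this bug (hypothetical placement; !18 is z's variable metadata
in that dump):

  ; Hypothetical: z's storage is ending; mark its value unavailable from here.
  call void @llvm.dbg.value(metadata i32 undef, metadata !18, metadata !DIExpression()), !dbg !40
  %6 = bitcast i32* %z to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %6) #4, !dbg !40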

We also came to the conclusion that — given the input comes from a well-behaved
frontend, and we shall document what exactly that means* — it should never be
necessary to insert dbg.values for load or call instructions.

*) A well-behaved frontend produces an alloca for a variable and generates a
store to the alloca each time the variable is modified. It further generates a
dbg.value describing the variable in the alloca as the one true location for
the variable. This contract is what allows us to pay attention to dead store
elimination (cf. Reid's second point).

By describing allocas with dbg.value we have the opportunity to get a more
precise start location for variable liveness, but we need to be careful not to
regress the DWARF representation at -O0 to a location list, since this would
prevent the debugger from letting the user assign new values to the variable
(implicit vs. explicit location description). We could represent stack-
allocated variables at -O0 with frame index location + a DW_AT_start_scope
attribute to achieve this.
Quuxplusone commented 7 years ago

> instead of dbg.declare. Maybe call it -gdbg-value? Document the new scheme in SourceLevelDebugging.rst.

So you want to give users the option to try it?

I'd rather not introduce temporary user-facing options. Could this just be a -Xclang (-cc1) option?

Quuxplusone commented 7 years ago
(In reply to Adrian Prantl from comment #26)
> - Add support for memory locations in LiveDebugValues

Yep, I realized I forgot to post that while discussing this in person with Zach.

> We also identified two nice-to-have features (these are things that
> currently don't work either):
>
> - insert a dbg.value(undef) at the point where a lifetime.end marker is
> generated (or directly in the stack coloring pass)

It's true, stack coloring needs to add DBG_VALUE MIR instructions on lifetime
end. I haven't actually been able to craft a C test case that forces confusing
stack colorings to occur, so I don't think it's very high priority. (I tried
hard! :)

> - let clang generate allocas for indirectly passed function arguments (e.g.,
> sret)

This works, but I filed https://bugs.llvm.org/show_bug.cgi?id=34227 as an
alternative to that.

> By describing allocas with dbg.value we have the opportunity to get a more
> precise start location for variable liveness, but we need to be careful not
> to regress the DWARF representation at -O0 to a location list, since this
> would prevent the debugger from letting the user assign new values to the
> variable (implicit vs. explicit location description). We could represent
> stack-allocated variables at -O0 with frame index location + a
> DW_AT_start_scope attribute to achieve this.

Indeed, preserving the -O0 experience is important.

(In reply to Hal Finkel from comment #27)
> So you want to give users the option to try it?

Yes. We don't have a lot of good tools for evaluating debug info quality beyond
deploying changes to users and giving them flags to flip. =/ Adrian said he's
trying to upstream his tool to calculate DW_AT_location coverage, but it's
still an approximation of quality.

> I'd rather not introduce temporary user-facing options. Could this just be a
> -Xclang (-cc1) option?

I'd prefer it if users didn't have to know about -Xclang, but I don't care that
much.
Quuxplusone commented 7 years ago
> I'd rather not introduce temporary user-facing options. Could this just be a
> -Xclang (-cc1) option?

> I'd prefer it if users didn't have to know about -Xclang, but I don't care
> that much.

Since the point of the option is to enable incremental development of the
feature until it is ready to become the default, users ideally will never have
to interact with the option, because they won't want to turn it on until it's
ready for prime time. Driver options are expensive compared to cc1 options,
because we typically have to support them indefinitely, so making this
(temporary!) option a cc1 option seems right to me.
Quuxplusone commented 7 years ago
Sorry for being late to the party, I've been away a couple of weeks.

There's always a tension between providing "truthful" info and
"potentially useful" info.  IMO the way to build users' trust in
the value of debug info (especially optimized) is to provide truthful
info.  Half-truths make the debugger into an unreliable tool.
(Personally I use debuggers only as a last resort.)

P.S. Using instructions to convey debug info is a poor design, period.
It causes all kinds of other problems because people writing passes
rarely think about the effect on debug info, and we've fixed a bunch
of cases where instructions/uses/etc were counted inappropriately.
I'd rather be thinking about a design that eliminates all dbg.*
instructions.
Quuxplusone commented 7 years ago
(In reply to Paul Robinson from comment #30)
> P.S. Using instructions to convey debug info is a poor design, period.
> It causes all kinds of other problems because people writing passes
> rarely think about the effect on debug info, and we've fixed a bunch
> of cases where instructions/uses/etc were counted inappropriately.
> I'd rather be thinking about a design that eliminates all dbg.*
> instructions.

I agree, but fundamentally we need to represent some placeholder in the
instruction stream. I think without radically changing the MI and IR basic
block representations, that means we represent variable tracking info with
elements in the linked list, regardless of how we name them. The best we can do
is probably to just add filtering instruction iterators that skip dbg
instructions.

The use-list issues seem like they have been fixed since 2014, when we added
ValueAsMetadata / MetadataAsValue, right? I guess this might still be an issue
in MIR.
Quuxplusone commented 7 years ago
Filtering iterators helps.  We still have cases of -g affecting codegen
but all the really egregious cases have been taken care of, at this point.

Eliminating dbg.* instructions is probably a topic for a BoF or something.
I'll just point out that when I first looked at dbg.declare, it seemed
that it added exactly one item of info to what alloca already had, and
that really didn't seem worth having a whole separate instruction.
Another kind of metadata on alloca should have worked just as well.
Quuxplusone commented 4 years ago
Some experimental improvements to LowerDbgDeclare brought this issue to our
attention recently; we were initially looking at
https://bugs.llvm.org/show_bug.cgi?id=45667.

Messages #25 and #26 on this bug (34136) outline a decent plan to begin
improving debug info for the escaped variables.

We intend to start addressing this issue. Is anyone currently working on this,
and are there any objections to our implementing improvements now?
Quuxplusone commented 4 years ago
> We intend to start addressing this issue. Is anyone currently working on
> this, and are there any objections to our implementing improvements now?

Sounds great. I'm not aware of anyone working on it from our side.
Quuxplusone commented 3 years ago

I have just reported an issue which seems relevant to this thread: https://llvm.org/PR48719.

The bug report shows how we can run into incorrect debug info for an escaped variable when promoting or partially promoting it introduces PHI instructions, combined with SimplifyCFG squashing the PHI block and its preds together.