Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

pass-by-reference arg shows incorrect debuginfo value within loop after LICM #36655

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR37682
Status NEW
Importance P enhancement
Reported by Greg Bedwell (greg.bedwell@sony.com)
Reported on 2018-06-04 14:12:17 -0700
Last modified on 2019-11-18 14:18:50 -0800
Version trunk
Hardware PC Windows NT
CC anastasis.gramm2@gmail.com, aprantl@apple.com, dblaikie@gmail.com, ditaliano@apple.com, ftee@flametop.co.uk, jeremy.morse.llvm@gmail.com, llvm-bugs@lists.llvm.org, paul_robinson@playstation.sony.com, vsk@apple.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR38777
Versions:

~~~~~~
$ clang --version
clang version 7.0.0 (trunk 333874) (llvm/trunk 333894)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: e:\build\Release\bin

$ lldb --version
lldb version 7.0.0 (https://git.llvm.org/git/lldb.git/ revision
0fbc22d3ed13e6825ddd28e9fdf50e8c7887850f)
  clang revision 333874
  llvm revision 333894
~~~~~~

Repro case:

~~~~~~
$ cat tests\fibonacci\test.cpp
__attribute__((__noinline__))
void Fibonacci(int terms, int& total)
{
  int first = 0;
  int second = 1;

  for (int i = 0; i < terms; ++i)
  {
    int next = first + second; // DexWatch('total', '*(int*)$rdx', '$r10')
    total += first;            // DexWatch('total', '*(int*)$rdx', '$r10')
    first = second;            // DexWatch('total', '*(int*)$rdx', '$r10')
    second = next;             // DexWatch('total', '*(int*)$rdx', '$r10')
  }
}

int main()
{
  int total = 0;
  Fibonacci(6, total);
  return total;
}

~~~~~~

Trace output from DExTer ( http://llvm.org/devmtg/2018-04/slides/Bedwell-
Measuring_the_User_Debugging_Experience.pdf ) with -opt-bisect-limit=31.  Each
line represents stepping once in the debugger.  The values in the {} are the
results of evaluating those expressions in the debugger while paused at that
step.

We can see that variable total which is a function argument that has been
passed by reference lives in the memory pointed at by $rdx, so the values
match.  Ignore $r10 for now.

~~~~~~
$ py -3 dexter.py test --builder clang --debugger lldb tests\fibonacci -v --
cflags="-O2 -g -mllvm -opt-bisect-limit=31" --ldflags="-g"

## BEGIN ##
[1, "main", "test.cpp", 18, 7, "BREAKPOINT", "FUNC", {}]
[2, "main", "test.cpp", 19, 3, "BREAKPOINT", "FORWARD", {}]
.   [3, "void Fibonacci(int,int &)", "test.cpp", 7, 21, "BREAKPOINT", "FUNC",
{}]
.   [4, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [5, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [6, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [7, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [8, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "2"}]
.   [9, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [10, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "1", "*(int*)$rdx": "1", "$r10": "2"}]
.   [11, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "1", "*(int*)$rdx": "1", "$r10": "3"}]
.   [12, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [13, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "2", "*(int*)$rdx": "2", "$r10": "3"}]
.   [14, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "2", "*(int*)$rdx": "2", "$r10": "5"}]
.   [15, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [16, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "4", "*(int*)$rdx": "4", "$r10": "5"}]
.   [17, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "4", "*(int*)$rdx": "4", "$r10": "8"}]
.   [18, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [19, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "7", "*(int*)$rdx": "7", "$r10": "8"}]
.   [20, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "7", "*(int*)$rdx": "7", "$r10": "13"}]
.   [21, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [22, "void Fibonacci(int,int &)", "test.cpp", 14, 1, "BREAKPOINT",
"FORWARD", {}]
[23, "main", "test.cpp", 20, 10, "BREAKPOINT", "FUNC", {}]
## END (23 steps) ##

~~~~~~

Trace output from DExTer with -opt-bisect-limit=32 (BISECT: running pass (32)
Loop Invariant Code Motion on loop).

Within the loop, variable 'total' now lives in $r10 and is only stored back to
the address in $rdx after the loop terminates.  The debug info for total still
appears to be pointing at address of $rdx throughout though, so we get an
incorrect value of 0.

~~~~~~
$ py -3 dexter.py test --builder clang --debugger lldb tests\fibonacci -v --
cflags="-O2 -g -mllvm -opt-bisect-limit=32" --ldflags="-g"

## BEGIN ##
[1, "main", "test.cpp", 18, 7, "BREAKPOINT", "FUNC", {}]
[2, "main", "test.cpp", 19, 3, "BREAKPOINT", "FORWARD", {}]
.   [3, "void Fibonacci(int,int &)", "test.cpp", 7, 21, "BREAKPOINT", "FUNC",
{}]
.   [4, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [5, "void Fibonacci(int,int &)", "test.cpp", 7, 3, "STEP", "BACKWARD", {}]
.   [6, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [7, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [8, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [9, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [10, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [11, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [12, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [13, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [14, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [15, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "2"}]
.   [16, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "2"}]
.   [17, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [18, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "4"}]
.   [19, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "4"}]
.   [20, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [21, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT",
"FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "7"}]
.   [22, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "7"}]
.   [23, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [24, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD",
{"total": "0", "*(int*)$rdx": "0", "$r10": "12"}]
.   [25, "void Fibonacci(int,int &)", "test.cpp", 14, 1, "BREAKPOINT",
"FORWARD", {}]
[26, "main", "test.cpp", 20, 10, "BREAKPOINT", "FUNC", {}]
## END (26 steps) ##

~~~~~~

Even though "total" comes in as a reference I think it would be fair to set the
location of total to the register throughout the loop.  At the very least I'd
expect to see a "variable not visible due to optimization" type message.
Quuxplusone commented 6 years ago

I don't think it's feasible to assert that the debugger show updated values for 'total' within this loop, because it amounts to a demand that LICM never happen. As applied to this loop, the whole point of LICM is to sink the store.

Even though "total" comes in as a reference I think it would be fair to set the location of total to the register throughout the loop.

I think this would be misleading. Inspecting "total" should show the user the value pointed-to by the "total" pointer. This is the only way print a correct value when "total" is aliased.

At the very least I'd expect to see a "variable not visible due to optimization" type message.

This might create scenarios where the variable is never presented as available, even though the debugger is perfectly able to inspect it, because hoisting/sinking occurs frequently.

I also think this would be misleading, because users expect any inspection of "total" to show the value pointed-to by the "total" pointer.

Quuxplusone commented 6 years ago

That seems eminently reasonable so I think we'll have to resolve this as it is, but it still leaves me uneasy. Putting myself in the shoes of some user who has written this loop and is trying to debug it, but doesn't know/care anything about compiler optimizations, the debugger is just plain wrong.

This is an unfortunate case where we're comparing pretty unfavourably with the Microsoft compiler in terms of quality of the optimized debugging experience. Looking into it more closely it seems the reason that they do so well is because they have a store inside the loop (tested with VS2017 at /Ox).

It's a shame that there's no way in DWARF (... I assume ... paging Paul ;) ) to represent this sort of situation where we know where the value according to the semantics of the user code being stepped through, but that it has not yet been stored to the location of the pointer. Feature request for a future standard?

I'll put this in the bucket of things to assess the impact of disabling for -Og mode. For reference, GCC 5.4.0 has the same -O2 behaviour as clang here, but the value of total is visible on every iteration with -Og.

Quuxplusone commented 6 years ago
> It's a shame that there's no way in DWARF (... I assume ... paging Paul ;) )
to represent this sort of situation where we know where the value according to
the semantics of the user code being stepped through, but that it has not yet
been stored to the location of the pointer.

Drive-by comment without having read the context: DWARF has
DW_OP_implicit_pointer:

> An optimizing compiler may eliminate a pointer, while still retaining the
value that
> the pointer addressed. DW_OP_implicit_pointer allows a producer to describe
this
> value.
> The DW_OP_implicit_pointer operation specifies that the object is a pointer
> that cannot be represented as a real pointer, even though the value it would
> point to can be described.
Quuxplusone commented 6 years ago
> Within the loop, variable 'total' now lives in $r10 and is only stored
> back to the address in $rdx after the loop terminates.

That's not really stating the situation accurately. 'total' is a reference,
not an int; it's just that syntactically there's no difference within the
loop.  If you declared 'total' as 'int *' I don't think the confusion would
occur.

The *value* of the pointed-to variable is being computed in the loop, and
not actually stored until it terminates.  There are lots of cases where we
might be computing something and not actually update a variable until later.

As Adrian points out, in DWARF v5 we could describe this more accurately.
In earlier versions, we cannot, because there's no way to describe a
pointer to a register.

FTR, I am sympathetic to the idea that '-Og' should avoid code-motion
optimizations.  Not just for cases like this, but because it messes with
single-stepping something awful.
Quuxplusone commented 6 years ago
(In reply to Paul Robinson from comment #4)
> > Within the loop, variable 'total' now lives in $r10 and is only stored
> > back to the address in $rdx after the loop terminates.
>
> That's not really stating the situation accurately. 'total' is a reference,
> not an int; it's just that syntactically there's no difference within the
> loop.  If you declared 'total' as 'int *' I don't think the confusion would
> occur.
>

Right.  I'm trying to come at this problem by putting aside all of my knowledge
of compiler internals and just think about the experience of the average user
who doesn't care to drop down to stepping through disassembly and just expects
source level stepping to work.  In this case, from the user point of view I can
happily step through the loop and see some of my variables being updated, but
not this one in particular.  With my other compiler for my other platform I can
see all my variables being updated on each iteration.  Therefore I prefer to
debug on that other platform.

I guess my question at this stage is whether:
  a) would it be desirable
  b) would it be feasible
to have the debugger report something like "the actual value of the reference
is 0, but I've currently got a value of 7 sitting in a register waiting to be
stored there"?
Quuxplusone commented 6 years ago
(In reply to Greg Bedwell from comment #5)
> I guess my question at this stage is whether:
>   a) would it be desirable
>   b) would it be feasible
> to have the debugger report something like "the actual value of the
> reference is 0, but I've currently got a value of 7 sitting in a register
> waiting to be stored there"?

It would obviously improve the debugging experience in this kind of situation,
but it would take more expressive power than DWARF has to describe it.
Quuxplusone commented 6 years ago
(In reply to comment #1)
> > Even though "total" comes in as a reference I think it would be fair to
> > set the location of total to the register throughout the loop.
>
> I think this would be misleading. Inspecting "total" should show the user
> the value pointed-to by the "total" pointer. This is the only way print a
> correct value when "total" is aliased.
>

(In reply to comment #1)
> I also think this would be misleading, because users expect any inspection
> of "total" to show the value pointed-to by the "total" pointer.

(In reply to comment #4)
> That's not really stating the situation accurately. 'total' is a
> reference, not an int; it's just that syntactically there's no difference
> within the loop.  If you declared 'total' as 'int *' I don't think the
> confusion would occur.

I've been thinking about this recently, and the more I think about it, the more
I'm not sure that I agree with this entirely.  My concern comes down to that
fact that regardless of the underlying compiler implementation it's a
reference, not a pointer:

"It is neither a pointer to the object, nor a copy of the object. It is the
object." - https://isocpp.org/wiki/faq/references#overview-refs

So, from a pure user perspective, I still think that there's an argument to be
made that for a few instructions within the loop the location of 'total' (and
any aliases of it) is a register, not an address.  If total had come into the
function as int* I'd agree entirely with the above points (but still arguing in
favour of a way to represent 'pending' values to the debugger).

It may well be that we have to accept this as a limitation of the LLVM
representation, but I'm leaning towards disagreeing with the assessment that it
would be incorrect behaviour to temporarily bind the debug location of total to
the register, as opposed to reasonable behaviour that may be not feasible to
properly model.
Quuxplusone commented 6 years ago
Hmmm.

If 'total' was a global 'int' variable instead of an 'int&' parameter,
and the generated code was effectively the same (e.g. when compiled as
PIC), I think there would be less disagreement about the appropriateness
of identifying the location as in the register, for the course of those
few instructions.

So, I'm becoming more convinced that Greg has a point, and the problem
is mainly that we can't describe it correctly prior to DWARF v5.