Debug Info and Multi-Line statements

Quuxplusone commented 11 years ago


Bugzilla Link	PR15076
Status	NEW
Importance	P enhancement
Reported by	Peter Ohmann (blackcalypso.lists@gmail.com)
Reported on	2013-01-26 10:24:35 -0800
Last modified on	2013-05-29 19:24:52 -0700
Version	unspecified
Hardware	All All
CC	dblaikie@gmail.com, echristo@gmail.com, llvm-bugs@lists.llvm.org, paul.robinson@am.sony.com, paul_robinson@playstation.sony.com, rafael@espindo.la
Fixed by commit(s)
Attachments	`examples.c` (130 bytes, application/octet-stream)
Blocks
Blocked by
See also

Created attachment 9927
The two examples (multi-expression if, do-while with line-break)

Clang does not assign and emit correct line numbers for individual
subexpressions of some multi-part statements.  I have attached two
functions to illustrate the issue in two different ways.

The first (foo) has a multi-line, multi-expression if statement.  Here, the
bitcode shows both the (x<y) and (y<z) expressions are assigned DebugLocs which
have line number 3 and column number 3 (the location of the "if" token).

The second, stranger, example (bar) shows a do-while statement where the
closing } and the while condition are not on the same line.  Here, all
expressions in the while condition are assigned the line and column number of
the }.  Specifically, in this case, the DebugLoc for the (x<10) expression
lists line number 6.

Note that having the expressions on different lines isn't a required condition,
but simply makes the issue more obvious.
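
The attachment itself is not reproduced in this thread, so the following is only a hypothetical reconstruction of the two shapes described above. The function bodies, parameter names, and exact line layout are assumptions and need not match the real examples.c.

```c
int foo(int x, int y, int z)   /* hypothetical reconstruction, not the attachment */
{
  if (x < y &&                 /* per the report, both comparisons get the  */
      y < z)                   /* location of the "if" token                */
    return 1;
  return 0;
}

int bar(int x)                 /* hypothetical reconstruction, not the attachment */
{
  do {
    x++;
  }                            /* per the report, the condition below gets */
  while (x < 10);              /* the line and column of this closing brace */
  return x;
}
```

Compiling something like this with `clang -g -S -emit-llvm` and reading the `!dbg` locations on the comparison instructions is one way to inspect which line each subexpression was given.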

Quuxplusone commented 11 years ago

Presumably you don't want the debugger to stop at each subexpression
while you're single-stepping, otherwise you get into some really
egregious situations like the following...

Supposing the example was something like:
  x =
    3 * y +
    2 * z;
Assigning what would intuitively be the "correct" line to each
operation, and single-stepping through it, we would stop at
  line 2 (for 3 * y)
  line 3 (for 2 * z)
  line 2 (for the +)
  line 1 (for the =)
That's a lot of bouncing around for one statement.
And it gets way worse with any optimization that can reorder
those instructions.

If you don't want the debugger to stop at all those places, what is
the advantage to assigning "correct" line numbers to the code generated
for the subexpressions?  Did you have some other use for the debug-line
info, other than controlling debugger behavior?
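
The bouncing described above can be sketched with a deliberately simplified model, which assumes that a line-stepping debugger stops whenever the line attributed to the next instruction differs from the current one. Real debuggers and line tables are more involved; `count_stops` and the two arrays are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Simplified model: one stop each time the attributed line changes. */
size_t count_stops(const int *lines, size_t n)
{
    size_t stops = 0;
    int current = 0;               /* 0 means "no line yet"; real lines start at 1 */
    for (size_t i = 0; i < n; i++) {
        if (lines[i] != current) {
            stops++;
            current = lines[i];
        }
    }
    return stops;
}

/* Lines for the operations in evaluation order: 3 * y, 2 * z, +, = */
const int per_subexpression[4] = {2, 3, 2, 1};  /* each operation gets its own line  */
const int per_statement[4]     = {1, 1, 1, 1};  /* everything gets the statement line */
```

Under this model, `count_stops(per_subexpression, 4)` yields 4 stops for the single statement, while `count_stops(per_statement, 4)` yields 1.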

Quuxplusone commented 11 years ago

(In reply to comment #1)
> Presumably you don't want the debugger to stop at each subexpression
> while you're single-stepping, otherwise you get into some really
> egregious situations like the following...
>
> Supposing the example was something like:
>   x =
>     3 * y +
>     2 * z;
> Assigning what would intuitively be the "correct" line to each
> operation, and single-stepping through it, we would stop at
>   line 2 (for 3 * y)
>   line 3 (for 2 * z)
>   line 2 (for the +)
>   line 1 (for the =)
> That's a lot of bouncing around for one statement.

This is an interesting example that I hadn't thought of.  This is certainly a
good argument against stopping at every subexpression in the general case.

Not knowing the internals very well: Does this also potentially rule out the
validity of the do-while example in the attachment?  That case still seems
unusual, as no debug info at all uses the line number of the while
condition.

> Did you have some other use for the debug-line
> info, other than controlling debugger behavior?

In my case, I actually do.  I use it to do over-approximate, best-effort
matching to the CFG produced by an outside analysis tool.  I obviously don't
expect the design to match my particular use case, though.  I'm most
concerned, then, about those cases that also cause somewhat confusing
behavior in a debugger.

Quuxplusone commented 11 years ago

(In reply to comment #2)
> Not knowing the internals very
> well: Does this also potentially rule out the validity of the do-while
> example in the attached?  It seems that that case is still unusual, as no
> debug info at all uses the line number of the while condition.

A compiler will pick the location of something associated with the
end of the loop that is easy for the compiler developer to find
and (in the developer's opinion) not obviously wrong.

Clang decided to use the closing brace, for whatever reason;
an old-ish gcc uses the terminating semicolon.  There are reasonable
arguments for other choices, such as the 'while' keyword or the
open-paren or the first expression token after the open-paren.
Whether it's worth changing is mostly based on your opinion of how
likely it is to have the while condition broken up like that, because
that's the only time the actual debugging experience would be affected.

I haven't investigated whether it would be easy to tweak Clang's choice.
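
To make the candidate locations concrete, here is a hypothetical do-while with the condition spread over several lines, with each choice mentioned above marked in a comment. The function name and layout are assumptions chosen for illustration; the attributions to Clang and gcc simply restate what this comment says.

```c
int count_up(int x)  /* hypothetical example */
{
  do {
    x++;
  }              /* closing brace: Clang's choice, per this comment     */
  while          /* the 'while' keyword                                 */
  (              /* the open-paren                                      */
    x < 10       /* first expression token after the open-paren         */
  )
  ;              /* terminating semicolon: an old-ish gcc, per this comment */
  return x;
}
```

When the brace, keyword, condition, and semicolon all share one line, every candidate gives the same line number, which is why the choice only matters for layouts like this one.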
