[DebugInfo] Bloated dwarf for single location variable in nested scope

Quuxplusone commented 4 years ago


Bugzilla Link	PR45889
Status	NEW
Importance	P enhancement
Reported by	Orlando Cazalet-Hyams (orlando.hyams@sony.com)
Reported on	2020-05-12 12:41:31 -0700
Last modified on	2020-07-22 06:11:26 -0700
Version	trunk
Hardware	PC Linux
CC	aprantl@apple.com, dblaikie@gmail.com, djordje.todorovic@syrmia.com, jdevlieghere@apple.com, jeremy.morse.llvm@gmail.com, keith.walker@arm.com, llvm-bugs@lists.llvm.org, paul.robinson@am.sony.com, paul_robinson@playstation.sony.com, vsk@apple.com
Fixed by commit(s)
Attachments	`test.mir` (2683 bytes, text/plain)
Blocks
Blocked by
See also

Created attachment 23476
XFAIL llvm-lit test

We get bloated dwarf for a variable location in a nested scope if it also has a
location tracked outside of that scope. I've attached a test case which fails
with llvm built at 72edb7986a8 (12th May 2020).

$ llc -start-after=livedebugvalues --filetype=obj test.mir -o - | llvm-
dwarfdump -
...
DW_TAG_lexical_block
  DW_AT_low_pc  (0x0000000000000002)
  DW_AT_high_pc (0x0000000000000003)

  DW_TAG_variable
    DW_AT_location    (0x00000000:
       [0x0000000000000000, 0x0000000000000002): DW_OP_reg5 RDI
       [0x0000000000000002, 0x0000000000000003): DW_OP_reg0 RAX)
    DW_AT_name  ("local")
...

Despite the fact that we have a location (DW_OP_reg0 RAX) which covers the
entire lexical block [0x0000000000000002, 0x0000000000000003) in which the
variable lives, we emit a location list. I think we could emit this instead:

DW_TAG_lexical_block
  DW_AT_low_pc  (0x0000000000000002)
  DW_AT_high_pc (0x0000000000000003)

  DW_TAG_variable
    DW_AT_location   (DW_OP_reg0 RAX)
    DW_AT_name       ("local")

Quuxplusone commented 4 years ago

Attached test.mir (2683 bytes, text/plain): XFAIL llvm-lit test

Quuxplusone commented 4 years ago

Hi Orlando,

How do we distinguish whether a variable is live or not at this level (let's
say during AsmPrinter/DwarfDebug)?
I think this is hard problem for addressing very late in the pipeline.

Quuxplusone commented 4 years ago

I just wanted say that looking into variable's scope (at this level; very late) may lead us to additional bugs. To trim the location list where the variable is visible is "easy" if the scope is generated properly, but to trim the location list only to locations where variable is alive is hard problem.

Quuxplusone commented 4 years ago

(In reply to Djordje Todorovic from comment #2)
> I just wanted say that looking into variable's scope (at this level; very
> late) may lead us to additional bugs. To trim the location list where the
> variable is visible is "easy" if the scope is generated properly, but to
> trim the location list only to locations where variable is alive is hard
> problem.

Hi Djordje,

My understanding is that LiveDebugValues guarantees that the variable is live in
all the ranges that we have here. I haven't looked into a fix yet but my
(possibly naive) idea was that, given we know all the ranges are valid, we can
just trim them so that they are contained within their scope bounds, and then
rely on validThroughout to decide if the variable covers the entire scope. Does
that sound reasonable?

Quuxplusone commented 4 years ago

(In reply to Djordje Todorovic from comment #1)
> Hi Orlando,
>
> How do we distinguish whether a variable is live or not at this level (let's
> say during AsmPrinter/DwarfDebug)?
> I think this is hard problem for addressing very late in the pipeline.
>
> I just wanted say that looking into variable's scope (at this level; very
> late) may lead us to additional bugs. To trim the location list where the
> variable is visible is "easy" if the scope is generated properly, but to
> trim the location list only to locations where variable is alive is hard
> problem.

(perhaps paraphrasing what Orlando said)

I don't think it's a matter of figuring out where the variable is live - just
that "we know the value of the variable outside its actual scope" so we can
trim those locations down to the scope we know about - reducing the complexity
of the variable location description, and maybe (in this case at least)
avoiding using a loclist, instead being able to use a direct location.

We're not "determining that the variable is live", we're discarding unnecessary
location information.

Fundamental changes in representation of debug locations could make it so it
was impossible to describe locations for a variable outside its scope - but
that's likely out of scope for this bug. Given the representation we have
today, it's easy to imagine trailing locations occurring beyond the end of a
variable's scope (hey, you can still print the value of 'x' even though it's
out of scope - and it's correct, too) - or constant values ending up with
larger leading scopes, though that's a bit more of a brain teaser/might be
buggy in optimizations.

So, I guess my answer is twofold:

1) Yeah, it seems reasonable to me to prune the locations down
2) Might be worth looking at how the RDI [0, 2) location got there in the first
place? Seems a bit questionable/maybe representative of some underlying problem
in debug locations

Orlando: Do you happen to have original source/command line for compiling your
example, rather than the MIR? (so that we could look into (2), finding out
where/how the MIR was produced/if there's things to be fixed there)

Quuxplusone commented 4 years ago

> 2) Might be worth looking at how the RDI [0, 2) location got there in the
> first place? Seems a bit questionable/maybe representative of some underlying
> problem in debug locations

That would be my first question as well.  It could be there for completely
legit reasons, but it's well worth investigating to determine that for sure.

> 1) Yeah, it seems reasonable to me to prune the locations down

I am less happy with arbitrary pruning like this, because it is reducing the
completeness and truthfulness of the debug info; there should be a really
good reason for doing that.  For example in cases where we hoist or sink an
instruction across a basic block boundary, we will erase an instruction's
original source location (reducing completeness and truthfulness) in order
to avoid two poor consequences: erratic single-stepping behavior in a
debugger (bad user experience), and incorrect attribution in a sampling
profiler (loss of optimization effectiveness).

The motive here seems to be debug-info size, which is much less compelling.

If the first part of the location list is a mistake, by all means correct
the mistake!  But if we're pruning because of ideas about scoping, I think
the debugger should choose how to deal with the truth we tell.

Quuxplusone commented 4 years ago

(In reply to Paul Robinson from comment #5)
> > 1) Yeah, it seems reasonable to me to prune the locations down
>
> I am less happy with arbitrary pruning like this, because it is reducing the
> completeness and truthfulness of the debug info; there should be a really
> good reason for doing that.  For example in cases where we hoist or sink an
> instruction across a basic block boundary, we will erase an instruction's
> original source location (reducing completeness and truthfulness) in order
> to avoid two poor consequences: erratic single-stepping behavior in a
> debugger (bad user experience), and incorrect attribution in a sampling
> profiler (loss of optimization effectiveness).
>
> The motive here seems to be debug-info size, which is much less compelling.
>
> If the first part of the location list is a mistake, by all means correct
> the mistake!  But if we're pruning because of ideas about scoping, I think
> the debugger should choose how to deal with the truth we tell.

I'm not sure the pruning proposed here would change the completeness or
correctness of the debug info. If it's indeed correct that both rax & rdi
contain "local" throughout [0x2, 0x3), then the before vs. after
DW_AT_locations both seem valid to me.

Quuxplusone commented 4 years ago

(In reply to Paul Robinson from comment #5)
> > 2) Might be worth looking at how the RDI [0, 2) location got there in the
> > first place? Seems a bit questionable/maybe representative of some
underlying
> > problem in debug locations
>
> That would be my first question as well.  It could be there for completely
> legit reasons, but it's well worth investigating to determine that for sure.
>
> > 1) Yeah, it seems reasonable to me to prune the locations down
>
> I am less happy with arbitrary pruning like this, because it is reducing the
> completeness and truthfulness of the debug info;

I disagree pretty significantly here - what is the meaning of the value of a
variable outside its scope? It seems like a contradiction - the variable does
not exist outside its scope (that being the definition of a scope at least in
the languages that have them/that I'm aware of). eg: name lookup: if you have a
variable 'x' in an outer scope, then a variable 'x' shadowing that in an inner
scope - but the inner 'x' has a location outside its own scope... what does
that do to using 'x' in the outer scope? does it find the inner 'x'? Only if
there's no outer 'x'?

> there should be a really
> good reason for doing that.  For example in cases where we hoist or sink an
> instruction across a basic block boundary, we will erase an instruction's
> original source location (reducing completeness and truthfulness) in order
> to avoid two poor consequences: erratic single-stepping behavior in a
> debugger (bad user experience), and incorrect attribution in a sampling
> profiler (loss of optimization effectiveness).
>
> The motive here seems to be debug-info size, which is much less compelling.

It's compelling to me in that I find the extra location descriptions to be
spurious/bogus/meaningless & wasted space.

> If the first part of the location list is a mistake, by all means correct
> the mistake!

To me it seems to be necessarily a mistake - as it doesn't, to me, make any
sense to talk about the value of a variable outside its scope (the variable
doesn't exist outside that scope). eg: a variable with a constant value is
still only valid within the address range of the enclosing scope.

I think there's a reasonable assumption/implication in the DWARF spec that a
location description is limited to its enclosing scope:

DWARFv5 2.6 Location Descriptions:

"[Single location descriptions] are sufficient for describing the location of
any object as long as its lifetime is either static or the same as the lexical
block that owns it, and it does not move during its lifetime."

"Location lists, which are used to describe objects that have a limited
lifetime or change their location during their lifetime. "

I don't think it's reasonably intended that a location list could /extend/ the
validity of a variable outside its enclosing scope, only to make it narrower.

> But if we're pruning because of ideas about scoping, I think
> the debugger should choose how to deal with the truth we tell.

Quuxplusone commented 4 years ago

Thank you all for your insightful comments!

(In reply to David Blaikie from comment #4)
> 2) Might be worth looking at how the RDI [0, 2) location got there in the
first place?
>    Seems a bit questionable/maybe representative of some underlying problem
in debug
>    locations

Seems reasonable to me.

> Orlando: Do you happen to have original source/command line for compiling
> your example, rather than the MIR? (so that we could look into (2), finding
> out where/how the MIR was produced/if there's things to be fixed there)

Sure, you can repro from source like this:

$ clang -O2 -g -c test.c
$ cat test.c
int fun(int p) {
  {
    int local = p;
    return local;
  }
}

It's probably important to mention that this is just something I observed while
trying to create test cases for D79571, so I've created a (very very hacky)
prototype to try and help assess what kind of wins we'd see from this change in
the real world. The results are not immediately exciting, but the prototype
takes a conservative approach. For a clang-3.4 build I get:

    .debug_loc section          2.9% smaller
    total debug section sizes   0.69% smaller
    total file size             0.55% smaller

You can find the code here (warning: it's a mess!):
https://reviews.llvm.org/D79949

Note that I don't do any range trimming; I only remove whole ranges that are
entirely outside of the var's scope. Additionally, it looks like the savings
here are almost entirely from dropping out-of-bounds ranges. I.e. we're not
often, if ever, converting from a range list to a single location. IMO this
suggests that a good/real implementation could see some much greater wins.

N.B. The test attached to this ticket will not pass with the patch applied for
two reasons, both of which you'll be able to see if you read my myriad of
comments scattered through the prototype.

Quuxplusone commented 4 years ago

FWIW, looks like in the IR->MIR/ISel that gets a bit confused:

*** IR Dump After Module Verifier ***
; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @fun(i32 returned %p) local_unnamed_addr #0 !dbg !7 {
entry:
  call void @llvm.dbg.value(metadata i32 %p, metadata !12, metadata !DIExpression()), !dbg !15
  call void @llvm.dbg.value(metadata i32 %p, metadata !13, metadata !DIExpression()), !dbg !16
  ret i32 %p, !dbg !17
}
# *** IR Dump After Finalize ISel and expand pseudo-instructions ***:
# Machine code for function fun: IsSSA, TracksLiveness
Function Live Ins: $edi in %0

bb.0.entry:
  liveins: $edi
  DBG_VALUE $edi, $noreg, !"p", !DIExpression(), debug-location !15; loc.c:0 line no:1
  DBG_VALUE $edi, $noreg, !"local", !DIExpression(), debug-location !16; loc.c:0 line no:3
  %0:gr32 = COPY $edi
  DBG_VALUE %0:gr32, $noreg, !"p", !DIExpression(), debug-location !15; loc.c:0 line no:1
  DBG_VALUE %0:gr32, $noreg, !"local", !DIExpression(), debug-location !16; loc.c:0 line no:3
  $eax = COPY %0:gr32, debug-location !17; loc.c:4:5
  DBG_VALUE $eax, $noreg, !"p", !DIExpression(), debug-location !15; loc.c:0 line no:1
  DBG_VALUE $eax, $noreg, !"local", !DIExpression(), debug-location !16; loc.c:0 line no:3
  RET 0, $eax, debug-location !17; loc.c:4:5

So the COPY could, arguably get the !17 debug location, though I imagine
figuring out how to make that happen in full generality won't be entirely
obvious.

One thing I decided to test was "where does that COPY get located? what if we
had another function call (or other code) in this function?"

void f1();
int fun(int p) {
  f1();
  {
    int local = p;
    return local;
  }
}

Does seem to make the debug info less problematic:

0x00000043:     DW_TAG_formal_parameter
                  DW_AT_location        (0x00000000:
                     [0x0000000000000000, 0x0000000000000003): DW_OP_reg5 RDI
                     [0x0000000000000003, 0x000000000000000c): DW_OP_reg3 RBX
                     [0x000000000000000c, 0x000000000000000e): DW_OP_reg0 RAX)
                  DW_AT_name    ("p")
                  DW_AT_decl_file       ("/usr/local/google/home/blaikie/dev/scratch/loc.c")
                  DW_AT_decl_line       (4)
                  DW_AT_type    (0x00000086 "int")

0x00000052:     DW_TAG_lexical_block
                  DW_AT_low_pc  (0x000000000000000a)
                  DW_AT_high_pc (0x000000000000000e)

0x0000005f:       DW_TAG_variable
                    DW_AT_location      (0x00000049:
                       [0x000000000000000a, 0x000000000000000d): DW_OP_reg3 RBX)
                    DW_AT_name  ("local")

But in theory we're missing the value of 'local' from [d, e) (RAX) though we
don'th ave the erroneous leading location because we've done... something to
that register copy that caused it to not be associated with 'local' anymore?

Quuxplusone commented 4 years ago

(In reply to David Blaikie from comment #9)
> FWIW, looks like in the IR->MIR/ISel that gets a bit confused:
>
> *** IR Dump After Module Verifier ***
> ; Function Attrs: norecurse nounwind readnone uwtable
> define dso_local i32 @fun(i32 returned %p) local_unnamed_addr #0 !dbg !7 {
> entry:
>   call void @llvm.dbg.value(metadata i32 %p, metadata !12, metadata
> !DIExpression()), !dbg !15
>   call void @llvm.dbg.value(metadata i32 %p, metadata !13, metadata
> !DIExpression()), !dbg !16
>   ret i32 %p, !dbg !17
> }
> # *** IR Dump After Finalize ISel and expand pseudo-instructions ***:
> # Machine code for function fun: IsSSA, TracksLiveness
> Function Live Ins: $edi in %0
>
> bb.0.entry:
>   liveins: $edi
>   DBG_VALUE $edi, $noreg, !"p", !DIExpression(), debug-location !15; loc.c:0
> line no:1
>   DBG_VALUE $edi, $noreg, !"local", !DIExpression(), debug-location !16;
> loc.c:0 line no:3
>   %0:gr32 = COPY $edi
>   DBG_VALUE %0:gr32, $noreg, !"p", !DIExpression(), debug-location !15;
> loc.c:0 line no:1
>   DBG_VALUE %0:gr32, $noreg, !"local", !DIExpression(), debug-location !16;
> loc.c:0 line no:3
>   $eax = COPY %0:gr32, debug-location !17; loc.c:4:5
>   DBG_VALUE $eax, $noreg, !"p", !DIExpression(), debug-location !15; loc.c:0
> line no:1
>   DBG_VALUE $eax, $noreg, !"local", !DIExpression(), debug-location !16;
> loc.c:0 line no:3
>   RET 0, $eax, debug-location !17; loc.c:4:5
>
>
>
> So the COPY could, arguably get the !17 debug location, though I imagine
> figuring out how to make that happen in full generality won't be entirely
> obvious.
>
> One thing I decided to test was "where does that COPY get located? what if
> we had another function call (or other code) in this function?"
>
> void f1();
> int fun(int p) {
>   f1();
>   {
>     int local = p;
>     return local;
>   }
> }
>
> Does seem to make the debug info less problematic:
>
> 0x00000043:     DW_TAG_formal_parameter
>                   DW_AT_location        (0x00000000:
>                      [0x0000000000000000, 0x0000000000000003): DW_OP_reg5 RDI
>                      [0x0000000000000003, 0x000000000000000c): DW_OP_reg3 RBX
>                      [0x000000000000000c, 0x000000000000000e): DW_OP_reg0 RAX)
>                   DW_AT_name    ("p")
>                   DW_AT_decl_file
> ("/usr/local/google/home/blaikie/dev/scratch/loc.c")
>                   DW_AT_decl_line       (4)
>                   DW_AT_type    (0x00000086 "int")
>
> 0x00000052:     DW_TAG_lexical_block
>                   DW_AT_low_pc  (0x000000000000000a)
>                   DW_AT_high_pc (0x000000000000000e)
>
> 0x0000005f:       DW_TAG_variable
>                     DW_AT_location      (0x00000049:
>                        [0x000000000000000a, 0x000000000000000d): DW_OP_reg3
> RBX)
>                     DW_AT_name  ("local")
>
> But in theory we're missing the value of 'local' from [d, e) (RAX) though we
> don'th ave the erroneous leading location because we've done... something to
> that register copy that caused it to not be associated with 'local' anymore?

IMO the fact that the initial COPY doesn't get a source location is unrelated
(though possibly still an issue). The range issue persists even if we give the
COPY the source location !17 during isel by modifying the code here [1] to pass
in EntryMBB->findDebugLoc(EntryMBB->begin()).

We get the first COPY (%0:gr32 = COPY $edi) from EmitLiveInCopies. The second
($eax = COPY %0:gr32) is generated by SelectionDAGBuilder::visitRet. We add in
the DBG_VALUEs for "p" and "local" in the final stages of SelectionDAG. During
register allocation, the second copy becomes redundant and is removed via
VirtRegRewriter::handleIdentityCopy.

I feel like the problem is that we insert the DBG_VALUE("local") at the top of
the function instead of after the second copy. I'm not sure if it's valid in
the general case, but for my reproducer, if we somehow emitted the
DBG_VALUE("local") after the live-in COPYs, we'd no longer see the additional
range.

Actually digging into this, it looks like SelectionDAGBuilder treats the second
dbg.value as an "argument dbg.value" because the value it uses (%p)
isa<Argument>() here [2]. As a quick hack/test, removing the subsequent call to
EmitFuncArgumentDbgValue does eliminate the out-of-scope variable range by
adding the DBG_VALUE("local") after the first COPY, instead of on entry.

So I think the fix might be to change how we identify "argument dbg.values"
here.

Quuxplusone commented 4 years ago

Sorry, forgot to add my links at the end.

[1] https://github.com/llvm/llvm-
project/blob/master/llvm/lib/CodeGen/MachineRegisterInfo.cpp#L486
[2] https://github.com/llvm/llvm-
project/blob/master/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L1326

Quuxplusone commented 4 years ago

Yeah, if this has anything to do with argument handling, that sounds buggy/worth fixdng.

Quuxplusone commented 4 years ago

(but I also think we should prune the ranges down to no greater than their enclosing scope - though one option there would be to make an IR verifier check that makes it impossible for them to be otherwies, and fix what's probably a lot of bugs (but maybe not) rather than "fixing up" at the backend - like the notorious inlining assertion/validation I added years ago (that all debug locations had to chain up to the disubprogram for the function, if the function had a disubprogram... caught lots of lost inlining/buggy cases, but also was/is a source of ongoing friction))

Quuxplusone commented 4 years ago

Hi David,

(In reply to David Blaikie from comment #12)
> Yeah, if this has anything to do with argument handling, that sounds
> buggy/worth fixdng.

Sorry for the delayed response, I was looking into a fix for this. This bug
triggers because there's a codepath for hoisting "dbg.values which use argument
values, and are for parameters or found in the prologue" [1] to the top of the
function. In our case "found in the prologue" is true. But because the block
scope starts after the live-in copies we end up creating an out-of-scope range.
I think the scope starting after the copies is correct, given that the same
issue occurs when the block is swapped out for an inlined function.

I thought this would be a quick one-or-two-liner but this actually seems to be
a non-trivial fix. There are lot of cases to consider and I don't feel
comfortable making a change here at the moment.

(In reply to David Blaikie from comment #13)
> (but I also think we should prune the ranges down to no greater than their
> enclosing scope - though one option there would be to make an IR verifier
> check that makes it impossible for them to be otherwies, and fix what's
> probably a lot of bugs (but maybe not) rather than "fixing up" at the
> backend - like the notorious inlining assertion/validation I added years ago
> (that all debug locations had to chain up to the disubprogram for the
> function, if the function had a disubprogram... caught lots of lost
> inlining/buggy cases, but also was/is a source of ongoing friction))

I'd be happy to have a go at implementing either of both of those suggestions.
I'm not very familiar with the verifier; what are the implications of adding
this check?

[1] https://github.com/llvm/llvm-
project/blob/master/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L5398

Quuxplusone commented 4 years ago

(In reply to Orlando Cazalet-Hyams from comment #14)
> Hi David,
>
> (In reply to David Blaikie from comment #12)
> > Yeah, if this has anything to do with argument handling, that sounds
> > buggy/worth fixdng.
>
> Sorry for the delayed response, I was looking into a fix for this. This bug
> triggers because there's a codepath for hoisting "dbg.values which use
> argument
> values, and are for parameters or found in the prologue" [1] to the top of
> the
> function. In our case "found in the prologue" is true.

What if we didn't trigger on that condition, but only for parameters? Any idea
what breaks?

> But because the block
> scope starts after the live-in copies we end up creating an out-of-scope
> range.
> I think the scope starting after the copies is correct, given that the same
> issue occurs when the block is swapped out for an inlined function.

I missed/didn't understand this part ^. Yes, I expect the same scope address
range problem to exist for an inlined function the same as it does for an
explicit lexical scope, or any other scope, because they're all handled the
same way - though I'm not sure that helps us determine whether the scope is
correct or not.

> I thought this would be a quick one-or-two-liner but this actually seems to
> be
> a non-trivial fix. There are lot of cases to consider and I don't feel
> comfortable making a change here at the moment.

Fair enough - appreciate all the investigation/discussion, as always.

> (In reply to David Blaikie from comment #13)
> > (but I also think we should prune the ranges down to no greater than their
> > enclosing scope - though one option there would be to make an IR verifier
> > check that makes it impossible for them to be otherwies, and fix what's
> > probably a lot of bugs (but maybe not) rather than "fixing up" at the
> > backend - like the notorious inlining assertion/validation I added years ago
> > (that all debug locations had to chain up to the disubprogram for the
> > function, if the function had a disubprogram... caught lots of lost
> > inlining/buggy cases, but also was/is a source of ongoing friction))
>
> I'd be happy to have a go at implementing either of both of those
> suggestions.
> I'm not very familiar with the verifier; what are the implications of adding
> this check?

I don't know a /great/ deal about the LLVM debug info verifier, Adrian Prantl's
probably the right person to ask.

I think it's most of the code in llvm/lib/IR/Verifier.cpp which has anything to
do with "DI" things.

So you could look at how the DWARF describes this situation (where there's a
variable with scope outside its... scope) and see if there's a way to
describe/test for that condition, and fail the verifier if the condition is met.

>
> [1]
> https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/
> SelectionDAG/SelectionDAGBuilder.cpp#L5398

Quuxplusone commented 4 years ago

(In reply to David Blaikie from comment #15)
> (In reply to Orlando Cazalet-Hyams from comment #14)
> > Hi David,
> >
> > (In reply to David Blaikie from comment #12)
> > > Yeah, if this has anything to do with argument handling, that sounds
> > > buggy/worth fixdng.
> >
> > Sorry for the delayed response, I was looking into a fix for this. This bug
> > triggers because there's a codepath for hoisting "dbg.values which use
> > argument
> > values, and are for parameters or found in the prologue" [1] to the top of
> > the
> > function. In our case "found in the prologue" is true.
>
> What if we didn't trigger on that condition, but only for parameters? Any
> idea what breaks?

IIUC we will lose some coverage (but I'm unsure to what extent). This comment
from the surrounding code sums it up well.
// ArgDbgValues are hoisted to the beginning of the entry block.  So we
// should only emit as ArgDbgValue if the dbg.value intrinsic describes a
// variable that also is a param.
//
// Although, if we are at the top of the entry block already, we can still
// emit using ArgDbgValue. This might catch some situations when the
// dbg.value refers to an argument that isn't used in the entry block, so
// any CopyToReg node would be optimized out and the only way to express
// this DBG_VALUE is by using the physical reg (or FI) as done in this
// method.

> > But because the block
> > scope starts after the live-in copies we end up creating an out-of-scope
> > range.
> > I think the scope starting after the copies is correct, given that the same
> > issue occurs when the block is swapped out for an inlined function.
>
> I missed/didn't understand this part ^. Yes, I expect the same scope address
> range problem to exist for an inlined function the same as it does for an
> explicit lexical scope, or any other scope, because they're all handled the
> same way - though I'm not sure that helps us determine whether the scope is
> correct or not.

Ah, I just muddied the waters here. I was trying to express that the inlined
example more clearly shows that the scope should begin after the prologue.

> > I thought this would be a quick one-or-two-liner but this actually seems to
> > be
> > a non-trivial fix. There are lot of cases to consider and I don't feel
> > comfortable making a change here at the moment.
>
> Fair enough - appreciate all the investigation/discussion, as always.
>
> > (In reply to David Blaikie from comment #13)
> > > (but I also think we should prune the ranges down to no greater than their
> > > enclosing scope - though one option there would be to make an IR verifier
> > > check that makes it impossible for them to be otherwies, and fix what's
> > > probably a lot of bugs (but maybe not) rather than "fixing up" at the
> > > backend - like the notorious inlining assertion/validation I added years
ago
> > > (that all debug locations had to chain up to the disubprogram for the
> > > function, if the function had a disubprogram... caught lots of lost
> > > inlining/buggy cases, but also was/is a source of ongoing friction))
> >
> > I'd be happy to have a go at implementing either of both of those
> > suggestions.
> > I'm not very familiar with the verifier; what are the implications of adding
> > this check?
>
> I don't know a /great/ deal about the LLVM debug info verifier, Adrian
> Prantl's probably the right person to ask.
>
> I think it's most of the code in llvm/lib/IR/Verifier.cpp which has anything
> to do with "DI" things.
>
> So you could look at how the DWARF describes this situation (where there's a
> variable with scope outside its... scope) and see if there's a way to
> describe/test for that condition, and fail the verifier if the condition is
> met.

Thanks, I'll see what I can do. I wonder if I should open another ticket for
that work given that I have found the root cause for this specific reproducer?

> >
> > [1]
> > https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/
> > SelectionDAG/SelectionDAGBuilder.cpp#L5398

Quuxplusone commented 4 years ago

(In reply to David Blaikie from comment #15)
> So you could look at how the DWARF describes this situation (where there's a
> variable with scope outside its... scope) and see if there's a way to
> describe/test for that condition, and fail the verifier if the condition is
> met.

You already quoted the relevant snippet in comment 7; DWARF thinks of scopes
statically and represented by the DIE tree, optionally modified by the
DW_AT_start_scope attribute.  But as a permissive standard, DWARF isn't
going to *forbid* having location descriptions that stray outside a strict
accounting of the set of instructions attributed to an enclosing scope.

As a producer, of course, LLVM can choose to say such straying indicates
Bad Stuff happening, and we can (in principle) write verifier rules that
will tsk at us when a location list does stray.  That would indicate
either poor tracking of the variable, or poor tracking of what instructions
should be attributed to the scope, or a decision to make when something
along the lines of an initializing load gets hoisted outside of its
original static scope (LICM out of a loop, say).

I would resist saying such a tsk is a violation of the DWARF standard,
although it's fine to say it's not the kind of DWARF we want to produce.

Quuxplusone commented 4 years ago

(In reply to Paul Robinson from comment #17)
> (In reply to David Blaikie from comment #15)
> > So you could look at how the DWARF describes this situation (where there's a
> > variable with scope outside its... scope) and see if there's a way to
> > describe/test for that condition, and fail the verifier if the condition is
> > met.
>
> You already quoted the relevant snippet in comment 7; DWARF thinks of scopes
> statically and represented by the DIE tree, optionally modified by the
> DW_AT_start_scope attribute.  But as a permissive standard, DWARF isn't
> going to *forbid* having location descriptions that stray outside a strict
> accounting of the set of instructions attributed to an enclosing scope.

Eh, I think it could - seems more like oversight that intentional permissivity -
 the bit that I quoted says that location lists are for when the variable moves, or has a "limited lifetime". Which sounds to me like its saying when the lifetime of the variable is shorter than the enclosing scope.

But yeah - as its a permissive standard, I'm not about to go and make a firm
argument to the DWARF committee that this should be considered a bug and fixed -
 I don't feel that strongly about it.

> As a producer, of course, LLVM can choose to say such straying indicates
> Bad Stuff happening, and we can (in principle) write verifier rules that
> will tsk at us when a location list does stray.  That would indicate
> either poor tracking of the variable, or poor tracking of what instructions
> should be attributed to the scope, or a decision to make when something
> along the lines of an initializing load gets hoisted outside of its
> original static scope (LICM out of a loop, say).
>
> I would resist saying such a tsk is a violation of the DWARF standard,
> although it's fine to say it's not the kind of DWARF we want to produce.

*nod*

I think llvm-dwarfdump should probably flag this as invalid DWARF as far as
we're concerned (& maybe at least warn on cases where a location list is used
but its scope happens to be exactly the entire length of the enclosing scope of
the variable - as that's wasting DWARF bytes).

But also I think I misspoke - I think my intention was to suggest looking at
the LLVM debug info metadata and see how it managed to describe this situation
and make that invalid LLVM IR and have the LLVM IR/DI verifier fail when it
sees something like that. Sorry for the confusion.

Quuxplusone commented 4 years ago

Update:
This discussion continues in D82129. The patch, which landed as ce6de3747bc,
drops out of scope locations before debug info emission. I'm hesitant to mark
this as resolved because the patch doesn't address the underlying causes. OTOH,
it has been discussed that a 'true fix' probably isn't viable with our current
debug info model.

The feature is on by default, run with the llvm flag -trim-var-locs=false to
disable it.

Quuxplusone / LLVMBugzillaTest

[DebugInfo] Bloated dwarf for single location variable in nested scope #44859