Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

No inline debug info generated for small functions when optimizing #25629

Open Quuxplusone opened 8 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR25630
Status NEW
Importance P normal
Reported by Diego Novillo (dnovillo@google.com)
Reported on 2015-11-24 15:49:15 -0800
Last modified on 2019-05-15 03:24:59 -0700
Version trunk
Hardware PC All
CC aprantl@apple.com, danielcdh@gmail.com, dblaikie@gmail.com, ditaliano@apple.com, echristo@gmail.com, hfinkel@anl.gov, jsweval@arxan.com, llvm-bugs@lists.llvm.org, paul_robinson@playstation.sony.com, quentin.colombet@gmail.com, sanjoy@playingwithpointers.com, stephen.tozer@sony.com
Fixed by commit(s)
Attachments bug.cc (254 bytes, text/plain)
no-bug.cc (296 bytes, application/octet-stream)
patatucciolo.ll (7839 bytes, application/octet-stream)
Blocks PR31268
Blocked by
See also
Created attachment 15349
Test case

For one liner functions, LLVM seems to not be emitting inlined-at information
in the final binary.  The inlinedAt tags are there in the bytecode, but don't
show up in the final binary.

If I change the code of the inlined function to have more than one line, then
the inlinedAt markers survive to the final binary.

$ clang++ -O2 -g bug.cc -o bug
$ llvm-dwarfdump --debug-dump=info bug|grep -A6 DW_TAG_inlined_subroutine

When compiled with GCC, the inlined function is marked properly:

$ g++-4.9 -O2 -g bug.cc -o gcc-bug
$ llvm-dwarfdump --debug-dump=info gcc-bug|grep -A6 DW_TAG_inlined_subroutine
0x00000476:       DW_TAG_inlined_subroutine [21] *
                    DW_AT_abstract_origin [DW_FORM_ref4]        (cu + 0x028d => {0x000003da} "_Z3sumxx")
                    DW_AT_low_pc [DW_FORM_addr] (0x0000000000400559)
                    DW_AT_high_pc [DW_FORM_data8]       (0x0000000000000003)
                    DW_AT_call_file [DW_FORM_data1]     ("./bug.cc")
                    DW_AT_call_line [DW_FORM_data1]     (8)
                    DW_AT_GNU_discriminator [DW_FORM_data1]     (0x02)

If I modified the inlined function to have more code, LLVM will generate a
DW_TAG_inlined_subroutine tag.  I did this in the no-bug.cc test case:

$ clang++ -g -O2 no-bug.cc -o no-bug
$ llvm-dwarfdump --debug-dump=info no-bug|grep -A6 DW_TAG_inlined_subroutine
0x000000b1:     DW_TAG_inlined_subroutine [9] *
                  DW_AT_abstract_origin [DW_FORM_ref4]  (cu + 0x004c => {0x0000004c} "_Z3sumxx")
                  DW_AT_low_pc [DW_FORM_addr]   (0x000000000040063e)
                  DW_AT_high_pc [DW_FORM_data4] (0x00000010)
                  DW_AT_call_file [DW_FORM_data1]       ("./no-bug.cc")
                  DW_AT_call_line [DW_FORM_data1]       (13)
                  DW_AT_GNU_discriminator [DW_FORM_data1]       (0x01)
Quuxplusone commented 8 years ago

Attached bug.cc (254 bytes, text/plain): Test case

Quuxplusone commented 8 years ago

Attached no-bug.cc (296 bytes, application/octet-stream): The same test case with more code in the inlined function

Quuxplusone commented 8 years ago

This is likely a debug location optimization problem. It would affect inlining or without inlining.

If anyone wants to look into this, just watch which optimization removes the last use of the DebugLoc with the inlinedAt info. Chances are two instructions, one inside and one outside the inlined function, might get coalesced into some other operation and the outer instruction's location is chosen. Then we might split it back out (there was a bug filed recently about switch lowering doing something like this, maybe?) into multiple instructions in the backend without the extra information available anymore.

I'll see if I can reduce this a bit further...

Quuxplusone commented 8 years ago

With some changes this seemed to repro at -O1 for me. Still get an add instruction that looks like it's sufficiently similar to the add in the inlined function that it should be annotated as such.

In the O1 IR, the add instruction appears to still have the right location. So I guess this is somewhere down in selection dag or other backend-y optimizations.

Quuxplusone commented 7 years ago
Slightly simpler testcase, removing the dependency on printf

int sum(int x, int y) { return x + y; }

int main() {
  int s, i = 0;
  while (i++ < 400)
    if (i != 100) s = sum(i, s); else s = 30;
  return s;
}
Quuxplusone commented 7 years ago
It's definitely a problem in the backend.
Fully reduced testcase:

define i32 @main() !dbg !3 {
  %add.i.4 = add nsw i32 undef, undef, !dbg !6
  %.add.i.4 = select i1 undef, i32 30, i32 %add.i.4, !dbg !9
  ret i32 %.add.i.4, !dbg !10
}
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}
!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer:
"clang version 4.0.0 (trunk 288665) (llvm/trunk 288759)", isOptimized: false,
runtimeVersion: 0, emissionKind: LineTablesOnly)
!1 = !DIFile(filename: "bug.cc", directory: "/home/davide/work/llvm/build-
clang/bin")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 3, type:
!4, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped,
isOptimized: false, unit: !0)
!4 = !DISubroutineType(types: !5)
!5 = !{}
!6 = !DILocation(line: 1, column: 34, scope: !7, inlinedAt: !8)
!7 = distinct !DISubprogram(name: "sum", scope: !1, file: !1, line: 1, type:
!4, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped,
isOptimized: false, unit: !0)
!8 = distinct !DILocation(line: 6, column: 23, scope: !3)
!9 = !DILocation(line: 6, column: 9, scope: !3)
!10 = !DILocation(line: 7, column: 3, scope: !3)
Quuxplusone commented 7 years ago
$ cat run.sh
#!/bin/sh

rm -rf bugopt bugopt.o
echo "Result at O0"
/home/davide/work/llvm/build-clang/bin/llc -O0 $1 -o bugopt
/home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o
/home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info
bugopt.o |grep -A6 DW_TAG_inlined_subroutine

echo "Result at 01"
rm -rf bugopt bugopt.o
/home/davide/work/llvm/build-clang/bin/llc -O1 $1 -o bugopt
/home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o
/home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info
bugopt.o |grep -A6 DW_TAG_inlined_subroutine

$ ./run.sh reduced.ll
Result at O0
0x00000040:     DW_TAG_inlined_subroutine [4]
                  DW_AT_abstract_origin [DW_FORM_ref4]  (cu + 0x002a => {0x0000002a} "sum")
                  DW_AT_low_pc [DW_FORM_addr]   (0x0000000000000005)
                  DW_AT_high_pc [DW_FORM_data4] (0x00000002)
                  DW_AT_call_file [DW_FORM_data1]       ("/home/davide/work/llvm/build-clang/bin/bug.cc")
                  DW_AT_call_line [DW_FORM_data1]       (6)

Result at 01
$

Taking this, trying to understand which backend pass destroys the information.
Quuxplusone commented 7 years ago

I have a theory now. Please ignore my previous reduction, I reduced the testcase too much. After another (painful) reduction, I think I identified the pass which introduces the problem (see new attached testcase), that is LoopStrengthReduce (which is run as part of llc -O1 and beyond). Before LoopStrengthReduce runs the IR looks like:

%add.i = add nsw i32 %inc5, %s.04, !dbg !32 tail call void @llvm.dbg.value(metadata i32 %add.i, i64 0, metadata !22, metadata !11), !dbg !23 %.add.i = select i1 %cmp1, i32 30, i32 %add.i, !dbg !33

where !32 = !DILocation(line: 1, column: 34, scope: !6, inlinedAt: !29) !33 = !DILocation(line: 6, column: 9, scope: !17)

So the add instruction contains the inlinedAt info and it points which points to the correct location in my reduced *.cc testcase: !29 = distinct !DILocation(line: 6, column: 23, scope: !30) that is -> sum(i, s)

and the select instruction points to the i inside the if (i != 100)

During LoopStrengthReduce the add instruction is replaced, so the IR looks like this:

tail call void @llvm.dbg.value(metadata !2, i64 0, metadata !22, metadata !11), !dbg !23 %2 = add i32 %.add.i, %lsr.iv, !dbg !32 %3 = add i32 %2, -3, !dbg !32 %.add.i.1 = select i1 %cmp1.1, i32 30, i32 %3, !dbg !32

where !32 = !DILocation(line: 6, column: 9, scope: !17)

so, both the add and the select point to the i inside the if i != 100

Stepping through with the debuggger reveals that when the Expander runs it inserts the add instructions assigning the debug metadata of select (i.e. !33 before LSR, which becomes !32 after renaming the metadata because the original add is dead).

I'm not entirely sure what's the best solution here, so I'm asking Quentin. I think that LSR (or SCEV) should (if possible) look at assigning the correct metadata when creating the new add instructions (and/or preserving the inlineAt info).

Quuxplusone commented 7 years ago

Attached patatucciolo.ll (7839 bytes, application/octet-stream): reduced testcase

Quuxplusone commented 7 years ago

It sounds like a job for the SCEVExpander at first glance.

AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.

I am not familiar with the internals of SCEV though.

Sanjoy, what's your take on this?

Quuxplusone commented 7 years ago
(In reply to comment #9)
> It sounds like a job for the SCEVExpander at first glance.
>
> AFAICT, LSR is not playing with the debug locations and the SCEVExpander
> should have access to the Value, hence the debug location, related to the
> different expressions it expands. Right now, the expander uses the debug
> location of the insertion point instead of the original value, if I am not
> mistaken.

Is it possible to actually get to (the debug location of the) original value
from within SCEVExpander?  I'd think the transform that is requesting an
expansion would have a better idea of what the debug location should be.

> I am not familiar with the internals of SCEV though.
>
> Sanjoy, what's your take on this?
Quuxplusone commented 7 years ago
I'm not familiar with neither LSR or the internals of SCEV, but I spent a
decent amount of time stepping in the debugger today.
So, yes, as Quentin pointed out SCEVExpander::expand() computes an insertion
point for the SCEV object and assigns to the newly inserted instructions the DI
of the instruction correspoding to the insertion point.
I'm not entirely sure if there's an easy way to access the value we're trying
to replace from inside SCEVExpander, but it's available in the caller stack
(LF.OperandValToReplace).

As a quick hack, I tried to force the debug location to be set correctly for
this case and it seems to work, but I'm worried this is too specific to my case.

diff --git a/lib/Transforms/Scalar/LoopStrengthReduce.cpp
b/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index a61f646..7019d56 100644
--- a/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -4576,6 +4576,11 @@ Value *LSRInstance::Expand(const LSRUse &LU,
   // unfolded offsets. LSR assumes they both live next to their uses.
   if (!Ops.empty()) {
     Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
+    if (isa<Instruction>(FullV) &&
+        isa<Instruction>(LF.OperandValToReplace)) {
+      (cast<Instruction>(FullV))->setDebugLoc(
+        (cast<Instruction>(LF.OperandValToReplace))->getDebugLoc());
+    }
     Ops.clear();
     Ops.push_back(SE.getUnknown(FullV));
   }

So, I think there are two possible solutions to the problem:
1) LSR updates the debug locations, as my hack does, but maybe in a more
systematic way.
2) expandCodeFor() gets another overload where it takes the value that is
getting replaced (or equivalently its debugloc) and SCEVExpander sets the
correct debug location accordingly.

Does this make sense? Ideas?
Quuxplusone commented 7 years ago
(In reply to comment #11)
> So, I think there are two possible solutions to the problem:
> 1) LSR updates the debug locations, as my hack does, but maybe in a more
> systematic way.
> 2) expandCodeFor() gets another overload where it takes the value that is
> getting replaced (or equivalently its debugloc) and SCEVExpander sets the
> correct debug location accordingly.
>
> Does this make sense? Ideas?

I think we discussed this on IRC some time back, but I think the variant of (2)
that changes expandCodeFor to take a DebugLoc is the right path forward.
Quuxplusone commented 7 years ago
(In reply to comment #12)
> (In reply to comment #11)
> > So, I think there are two possible solutions to the problem:
> > 1) LSR updates the debug locations, as my hack does, but maybe in a more
> > systematic way.
> > 2) expandCodeFor() gets another overload where it takes the value that is
> > getting replaced (or equivalently its debugloc) and SCEVExpander sets the
> > correct debug location accordingly.
> >
> > Does this make sense? Ideas?
>
> I think we discussed this on IRC some time back, but I think the variant of
> (2) that changes expandCodeFor to take a DebugLoc is the right path forward.

Agreed, #2 seems sensible to me.