KDAB / hotspot

The Linux perf GUI for performance analysis.
4.16k stars 257 forks source link

Inlined function disassembly shows wrong source file. #490

Closed LHolten closed 1 year ago

LHolten commented 1 year ago

Describe the bug The disassembler only shows the entrypoint source file. This could be due to every function getting inlined.

To Reproduce

  1. Clone my rust project https://github.com/LHolten/onitama-solver
  2. Build with rust cargo build --release
  3. Profile /target/release/table_base with argument 4
  4. Open any function in the disassembler

Expected behavior It should show the selected function source code, instead we always get the same file /src/bin/table_base.rs

Version Info (please complete the following information):

Additional context Console output:

Fontconfig warning: "/usr/share/fontconfig/conf.avail/05-reset-dirs-sample.conf", line 6: unknown element "reset-dirs"
kf.coreaddons: "Could not find plugin konsolepart"
kf.coreaddons: "Could not find plugin kgraphviewerpart"
void KDDockWidgets::Config::setFlags(KDDockWidgets::Config::Flags) Only use this function at startup before creating any DockWidget or MainWindow ; These are already created:  ("results") ("summary", "bottomUp", "topDown", "flameGraph", "callerCallee", "disassembly", "frequency", "timeLine") QVector()
unhandled feature PerfHeader::MEM_TOPOLOGY 56
unhandled feature PerfHeader::CACHE 1548
unhandled feature PerfHeader::SAMPLE_TIME 16
unhandled feature PerfHeader::CPU_PMU_CAPS 412
unhandled feature PerfHeader::BPF_BTF 4
unhandled feature PerfHeader::BPF_PROG_INFO 4
feature not properly read PerfHeader::CPU_TOPOLOGY 332 212
Linux version "6.3.3" detected. Switching to automatic buffering.
unhandled event type 79   PERF_RECORD_TIME_CONV
unhandled event type 69   PERF_RECORD_ID_INDEX
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
unhandled event type 82   uknown type

If necessary I can try to make a smaller reproducer. Thank you for making this software btw!

LHolten commented 1 year ago

I have confirmed that marking a function as #[inline(never)] makes hotspot able to find the correct source file

the8472 commented 1 year ago

https://github.com/LHolten/onitama-solver/blob/68b52aa907f730b0431a51607a8ea35d085968a9/Cargo.toml#L8-L9

Have you tried setting debug = 2? I think that helps with disentangling inlining effects.

Edit: Nevermind, I checked and on current rustc debug=1 is sufficient to get proper profiling info

lievenhey commented 1 year ago

Hi, that will sadly be a nofix.

The disassembler runs objdump -l -d -C target/release/table_base --start-address 0xbe40 --stop-address 0xcba9 (start and stop address vary per symbol) and uses the output to try to reconstruct the source code.

The output will normally look like this:

0000000000001040 <main>:
main():
/tmp/test.c:4
    1040:       55                      push   %rbp
/tmp/test.c:10
    1041:       48 8d 2d bc 0f 00 00    lea    0xfbc(%rip),%rbp        # 2004 <_IO_stdin_used+0x4>
    1048:       31 f6                   xor    %esi,%esi
    104a:       31 c0                   xor    %eax,%eax

We then use the first reference and say that this is the start of the function. In your case the compiler inlines a lot of functions, which will result in something like this:

onitama_solver::onitama_simd::update::<impl onitama_solver::onitama_simd::Update>::get_unresolved:
    c0a9:       44 89 f8                mov    %r15d,%eax
    c0ac:       48 03 44 24 28          add    0x28(%rsp),%rax
    c0b1:       48 8b 4c 24 38          mov    0x38(%rsp),%rcx
onitama_solver::onitama_simd::accum_spread::<impl onitama_solver::onitama_simd::Accum>::accumulate:
    c0b6:       0f bb c1                btc    %eax,%ecx
    c0b9:       44 0f bb f9             btc    %r15d,%ecx
    c0bd:       48 8b 54 24 40          mov    0x40(%rsp),%rdx
    c0c2:       44 0f b3 fa             btr    %r15d,%edx
<bit_iter::BitIter<u32> as core::iter::traits::iterator::Iterator>::fold:
/home/lieven/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bit-iter-1.1.1/src/lib.rs:144
    c0c6:       41 89 cc                mov    %ecx,%r12d
    c0c9:       45 31 f6                xor    %r14d,%r14d
    c0cc:       41 09 d4                or     %edx,%r12d
    c0cf:       74 50                   je     c121 <onitama_solver::onitama_simd::update::<impl onitama_solver::onitama_simd::Update>::update_layout+0x2e1>
    c0d1:       31 f6                   xor    %esi,%esi
core::num::<impl u32>::trailing_zeros:
    c0d3:       66 66 66 66 2e 0f 1f    data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
    c0da:       84 00 00 00 00 00 
/rustc/0ab38e95bb1cbf0bd038d359bdecbfa501f003a7/library/core/src/num/mod.rs:1143
    c0e0:       f3 45 0f bc d4          tzcnt  %r12d,%r10d

As you can see onitama_solver::onitama_simd::accum_spread::<impl onitama_solver::onitama_simd::Accum>::accumulate does not have a source code reference, which is a result of inlining.

Since the disassembler uses the first reference it finds as the source file, it may choose the wrong file.

milianw commented 1 year ago

agreed, there's nothing we can do about this I'm afraid. if anyone has a suggestion that doesn't require us to disassemble all build artifacts eagerly then I'm all ear. please reopen then. for now, I'm closing this

LHolten commented 1 year ago

Btw, I do not know how it works, but samply (another profiler) is able to show the source code with line based sample attribution for these inlined functions.

lievenhey commented 1 year ago

ok, I think I found a solution. If I comple test.c

#include <stdio.h>

inline int square(int i) {
        return i * i;
}

int main() {
        for (int i = 0; i < 10000; i++) {
                int sum = 0;
                for (int j = 0; j < i; j++) {
                        sum += square(j);
                }
                printf("Sum: %d\n", sum);
        }
}

with -O3 -g I end up with the following in objdump:

0000000000001040 <main>:
main():
/tmp/test.c:7
    1040:       55                      push   %rbp
/tmp/test.c:13
    1041:       48 8d 2d bc 0f 00 00    lea    0xfbc(%rip),%rbp        # 2004 <_IO_stdin_used+0x4>
    ....
    square():
/tmp/test.c:4
    109b:       66 0f 6f d0             movdqa %xmm0,%xmm2
    109f:       66 0f f4 d0             pmuludq %xmm0,%xmm2
    10a3:       66 0f 73 d0 20          psrlq  $0x20,%xmm0
    10a8:       66 0f f4 c0             pmuludq %xmm0,%xmm0
    10ac:       66 0f 70 d2 08          pshufd $0x8,%xmm2,%xmm2
    10b1:       66 0f 70 c0 08          pshufd $0x8,%xmm0,%xmm0
    10b6:       66 0f 62 d0             punpckldq %xmm0,%xmm2
main():
/tmp/test.c:11 (discriminator 1)
    10ba:       66 0f fe ca             paddd  %xmm2,%xmm1
    ...

dwarfdump gives me:

0x000000f6: DW_TAG_inlined_subroutine
              DW_AT_abstract_origin     (0x00000166 "square")
              DW_AT_entry_pc    (0x000000000000109b)
              DW_AT_GNU_entry_view      (0x01)
              DW_AT_ranges      (0x0000006f
                 [0x000000000000109b, 0x00000000000010ba)
                 [0x00000000000010f0, 0x00000000000010f5))
              DW_AT_call_file   ("/tmp/test.c")
              DW_AT_call_line   (11)
              DW_AT_call_column (0x20)

0x00000166: DW_TAG_subprogram
              DW_AT_external    (true)
              DW_AT_name        ("square")
              DW_AT_decl_file   ("/tmp/test.c")
              DW_AT_decl_line   (3)
              DW_AT_decl_column (0x0c)
              DW_AT_prototyped  (true)
              DW_AT_type        (0x00000054 "int")
              DW_AT_inline      (DW_INL_declared_inlined)

so it may be worth investing some time into proper dwarf parsing.