facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries
2.51k stars 177 forks source link

BOLT only failed with --hugify #297

Open pingzhaozz opened 2 years ago

pingzhaozz commented 2 years ago

The problem I met only happens when I use --hugify.

I have a program which works well with BOLT.

llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats

I want to try the --hugify option. It passed BOLT process. But when I ran it, it will core dump. And problem seems at the entrypoint: (0x10a0 is the disassamble _start address), but different from readelf header. core dump happens at the _start(0x10a0) with segfault. GDB can't capture since program not start yet.

$ llvm-bolt ./sample -o ./sample.bolt -data=./workspace/perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats --hugify BOLT-INFO: shared object or position-independent executable detected BOLT-INFO: Target architecture: x86_64 BOLT-INFO: BOLT version: c62053979489ccb002efe411c3af059addcb5d7d BOLT-INFO: first alloc address is 0x0 BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000 BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it. BOLT-INFO: enabling relocation mode BOLT-WARNING: disabling -split-eh for shared object BOLT-INFO: enabling lite mode BOLT-INFO: pre-processing profile using branch profile reader BOLT-WARNING: Ignored 0 functions due to cold fragments. BOLT-INFO: 2 out of 13 functions in the binary (15.4%) have non-empty execution profile BOLT-INFO: the input contains 1 (dynamic count : 429) opportunities for macro-fusion optimization. Will fix instances on a hot path. BOLT-INFO: 3 instructions were shortened BOLT-INFO: basic block reordering modified layout of 2 (11.76%) functions BOLT-INFO: UCE removed 0 blocks and 0 bytes of code. BOLT-INFO: splitting separates 135 hot bytes from 124 cold bytes (52.12% of split functions is hot). BOLT-INFO: 1 Functions were reordered by LoopInversionPass BOLT-INFO: hfsort+ reduced the number of chains from 2 to 1 BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

             429 : executed forward branches
             403 : taken forward branches
          718626 : executed backward branches
          718626 : taken backward branches
             441 : executed unconditional branches
             429 : all function calls
               0 : indirect calls
               0 : PLT calls
         4324956 : executed instructions
         3594762 : executed load instructions
         1439014 : executed store instructions
               0 : taken jump table branches
               0 : taken unknown indirect branches
          719496 : total branches
          719470 : taken branches
              26 : non-taken conditional branches
          719029 : taken conditional branches
          719055 : all conditional branches

             429 : executed forward branches (=)
               0 : taken forward branches (-100.0%)
          718626 : executed backward branches (=)
          718626 : taken backward branches (=)
             441 : executed unconditional branches (=)
             429 : all function calls (=)
               0 : indirect calls (=)
               0 : PLT calls (=)
         4325373 : executed instructions (+0.0%)
         3594762 : executed load instructions (=)
         1439014 : executed store instructions (=)
               0 : taken jump table branches (=)
               0 : taken unknown indirect branches (=)
          719496 : total branches (=)
          719067 : taken branches (-0.1%)
             429 : non-taken conditional branches (+1550.0%)
          718626 : taken conditional branches (-0.1%)
          719055 : all conditional branches (=)

BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0. BOLT-INFO: padding code to 0x600000 to accommodate hot text BOLT-INFO: setting _end to 0x6001c8 BOLT-INFO: setting __hot_start to 0x400000 BOLT-INFO: setting __hot_end to 0x4000ba BOLT-INFO: patched build-id (flipped last bit)

But when I ran it, it will core dump. And problem seems at the entrypoint: (0x10a0 is the disassamble _start address), but different from readelf. core dump happens at the _start(0x10a0) with segfault. GDB can't capture since program not start yet.

> munmap(0x7ffff7fc6000, 34326)           = 0
> open("/sys/kernel/mm/transparent_hugepage/enabled", O_RDONLY) = 3
> read(3, "always [madvise] never\n", 256) = 23
> madvise(0x555555800000, 2097152, MADV_HUGEPAGE) = 0
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, **si_addr=0x10a0**} ---
> +++ killed by SIGSEGV (core dumped) +++
$ readelf -h sample.bolt
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x601780
  Start of program headers:          2097152 (bytes into file)
  Start of section headers:          6301056 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         14
  Size of section headers:           64 (bytes)
  Number of section headers:         43
  Section header string table index: 41

I have hugepage in my system:

$ cat /proc/meminfo |grep -i hug
AnonHugePages:    272384 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   20000
HugePages_Free:    20000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        40960000 kB

I can't find --hugify manual or guide to help debug this issue. If anyone knows this problem, pls help comment. Thanks a lot!

pingzhaozz commented 2 years ago

Found on LLVM 14.0.1

$llvm-bolt --version
LLVM (http://llvm.org/):
  LLVM version 14.0.1
  Optimized build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: sapphirerapids

BOLT revision c62053979489ccb002efe411c3af059addcb5d7d
  Registered Targets:
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
$ readelf -SW sample.bolt
There are 43 section headers, starting at offset 0x602580:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        0000000000000318 000318 00001c 00   A  0   0  1
  [ 2] .note.gnu.property NOTE            0000000000000338 000338 000020 00   A  0   0  8
  [ 3] .note.gnu.build-id NOTE            0000000000000358 000358 000024 00   A  0   0  4
  [ 4] .note.ABI-tag     NOTE            000000000000037c 00037c 000020 00   A  0   0  4
  [ 5] .gnu.hash         GNU_HASH        00000000000003a0 0003a0 000024 00   A  6   0  8
  [ 6] .dynsym           DYNSYM          00000000000003c8 0003c8 0000d8 18   A  7   1  8
  [ 7] .dynstr           STRTAB          00000000000004a0 0004a0 000090 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          0000000000000530 000530 000012 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         0000000000000548 000548 000020 00   A  7   1  8
  [10] .rela.dyn         RELA            0000000000000568 000568 0000c0 18   A  6   0  8
  [11] .rela.plt         RELA            0000000000000628 000628 000048 18  AI  6  24  8
  [12] .init             PROGBITS        0000000000001000 001000 00001b 00  AX  0   0  4
  [13] .plt              PROGBITS        0000000000001020 001020 000040 10  AX  0   0 16
  [14] .plt.got          PROGBITS        0000000000001060 001060 000010 10  AX  0   0 16
  [15] .plt.sec          PROGBITS        0000000000001070 001070 000030 10  AX  0   0 16
  [16] .bolt.org.text    PROGBITS        00000000000010a0 0010a0 000345 00  AX  0   0 16
  [17] .fini             PROGBITS        00000000000013e8 0013e8 00000d 00  AX  0   0  4
  [18] .rodata           PROGBITS        0000000000002000 002000 00001c 00   A  0   0  4
  [19] .bolt.org.eh_frame_hdr PROGBITS        000000000000201c 00201c 000054 00   A  0   0  4
  [20] .bolt.org.eh_frame PROGBITS        0000000000002070 002070 000148 00   A  0   0  8
  [21] .init_array       INIT_ARRAY      0000000000003da8 002da8 000008 08  WA  0   0  8
  [22] .fini_array       FINI_ARRAY      0000000000003db0 002db0 000008 08  WA  0   0  8
  [23] .dynamic          DYNAMIC         0000000000003db8 002db8 0001f0 10  WA  7   0  8
  [24] .got              PROGBITS        0000000000003fa8 002fa8 000058 08  WA  0   0  8
  [25] .data             PROGBITS        0000000000004000 003000 000010 00  WA  0   0  8
  [26] .tm_clone_table   PROGBITS        0000000000004010 003010 000000 00  WA  0   0  8
  [27] .bss              NOBITS          0000000000004010 003010 000008 00  WA  0   0  1
  [28] .text             PROGBITS        0000000000400000 400000 0000ba 00  AX  0   0 2097152
  [29] .text.cold        PROGBITS        00000000004000c0 4000c0 00007c 00  AX  0   0 64
  [30] .eh_frame         PROGBITS        0000000000600000 600000 0001c8 00   A  0   0  8
  [31] .bolt.hugify.entries PROGBITS        0000000000601000 601000 000008 00  WA  0   0 4096
  [32] .rodata.str1.1.bolt.extra.1 PROGBITS        0000000000601008 601008 000017 00   A  0   0  1
  [33] .rodata.str1.8.bolt.extra.1 PROGBITS        0000000000601020 601020 00002c 00   A  0   0  8
  [34] .rodata.cst16.bolt.extra.1 PROGBITS        0000000000601050 601050 000060 00   A  0   0 16
  [35] .eh_frame.bolt.extra.1 PROGBITS        00000000006010b0 6010b0 000084 00   A  0   0  8
  [36] .text.bolt.extra.1 PROGBITS        0000000000601140 601140 000687 00  AX  0   0 16
  [37] .eh_frame_hdr     PROGBITS        00000000006017c8 6017c8 00006c 00   A  0   0  1
  [38] .comment          PROGBITS        0000000000000000 601834 000023 01  MS  0   0  1
  [39] .symtab           SYMTAB          0000000000000000 601858 0006f0 18     40  49  8
  [40] .strtab           STRTAB          0000000000000000 601f48 00026e 00      0   0  1
  [41] .shstrtab         STRTAB          0000000000000000 6021b6 00026f 00      0   0  1
  [42] .note.bolt_info   NOTE            0000000000000000 602425 00012c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
yavtuk commented 2 years ago

@pingzhaozz can you add simple test to reproduce it?

pingzhaozz commented 2 years ago

@pingzhaozz can you add simple test to reproduce it?

I upload my sample code to: https://github.com/pingzhaozz/perf_sample But it doesn't matter to reproduce this issue whether use this sample or other program.

The BOLT process would be like:

1. Start ./sample 1000
2. Collect Full System perf for 10 seconds (perf record -e cycles:u -j any,u -a -o perf.data – sleep 10)
3. Convert Profile to BOLT format  perf2bolt -p perf.data -o perf.fdata <executable>
4. BOLT the sample binary lvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats --hugify
yavtuk commented 2 years ago

@pingzhaozz this error is due to PIE binary, now you can use hugify for no-pie binaries. I will add the support for PIE soon, need some time to prepare the patch

pingzhaozz commented 2 years ago

Thanks for the quick answer.

yavtuk commented 2 years ago

Hello @pingzhaozz, https://reviews.llvm.org/D129107 this patch contains the support of PIE binaries, you can try to use it.

pingzhaozz commented 2 years ago

@yavtuk Cool! I'll take a try and give you feedback next week.

pingzhaozz commented 2 years ago

@yavtuk I checked the patch: https://reviews.llvm.org/D129107 and llvm-bolt works with --hugify now. But I didn't see performance improvement after adding --hugify. Perhaps it's because my application already use hugepage for .text? Could you help look if the hugify.bolt file is correct?

This is without --hugify

$ readelf -SW php-fpm.bolt
There are 38 section headers, starting at offset 0x20f7efc0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00000000000002a8 0002a8 00001c 00   A  0   0  1
  [ 2] .note.gnu.build-id NOTE            00000000000002c4 0002c4 000024 00   A  0   0  4
  [ 3] .note.ABI-tag     NOTE            00000000000002e8 0002e8 000020 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        0000000000000308 000308 0039f8 00   A  5   0  8
  [ 5] .dynsym           DYNSYM          0000000000003d00 003d00 010890 18   A  6   1  8
  [ 6] .dynstr           STRTAB          0000000000014590 014590 00e1a3 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          0000000000022734 022734 00160c 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         0000000000023d40 023d40 000260 00   A  6   6  8
  [ 9] .rela.dyn         RELA            0000000000023fa0 023fa0 0c92d0 18   A  5   0  8
  [10] .rela.plt         RELA            00000000000ed270 0ed270 003930 18  AI  5  23  8
  [11] .init             PROGBITS        0000000000200000 200000 000017 00  AX  0   0  4
  [12] .plt              PROGBITS        0000000000200020 200020 002630 10  AX  0   0 16
  [13] .plt.got          PROGBITS        0000000000202650 202650 000040 08  AX  0   0  8
  [14] .bolt.org.text    PROGBITS        0000000000202690 202690 34d471 00  AX  0   0 16
  [15] .fini             PROGBITS        000000000054fb04 54fb04 000009 00  AX  0   0  4
  [16] .rodata           PROGBITS        0000000000600000 600000 7a07e4 00   A  0   0 32
  [17] .bolt.org.eh_frame_hdr PROGBITS        0000000000da07e4 da07e4 0109c4 00   A  0   0  4
  [18] .bolt.org.eh_frame PROGBITS        0000000000db11a8 db11a8 06c348 00   A  0   0  8
  [19] .init_array       INIT_ARRAY      0000000001166348 f66348 000010 08  WA  0   0  8
  [20] .fini_array       FINI_ARRAY      0000000001166358 f66358 000008 08  WA  0   0  8
  [21] .data.rel.ro      PROGBITS        0000000001166360 f66360 098560 00  WA  0   0 32
  [22] .dynamic          DYNAMIC         00000000011fe8c0 ffe8c0 000260 10  WA  6   0  8
  [23] .got              PROGBITS        00000000011feb20 ffeb20 0014d0 08  WA  0   0  8
  [24] .data             PROGBITS        0000000001200000 1000000 003858 00  WA  0   0 32
  [25] .tm_clone_table   PROGBITS        0000000001203858 1003858 000000 00  WA  0   0  8
  [26] .buffer           PROGBITS        0000000001204000 1004000 1f600000 00  WA  0   0 4096
  [27] .bss              NOBITS          0000000020804000 20604000 020850 00  WA  0   0 32
  [28] .text             PROGBITS        0000000020c00000 20c00000 0cad5c 00  AX  0   0 2097152
  [29] .text.cold        PROGBITS        0000000020ccad80 20ccad80 00213d 00  AX  0   0 64
  [30] .eh_frame         PROGBITS        0000000020e00000 20e00000 07f2b0 00   A  0   0  8
  [31] .rodata.cold      PROGBITS        0000000020e7f2b0 20e7f2b0 001358 00   A  0   0  4
  [32] .eh_frame_hdr     PROGBITS        0000000020e80608 20e80608 01364c 00   A  0   0  1
  [33] .comment          PROGBITS        0000000000000000 20e93c54 000027 01  MS  0   0  1
  [34] .symtab           SYMTAB          0000000000000000 20e93c80 08e3f8 18     35 21455  8
  [35] .strtab           STRTAB          0000000000000000 20f22078 05cc0e 00      0   0  1
  [36] .shstrtab         STRTAB          0000000000000000 20f7ec86 000221 00      0   0  1
  [37] .note.bolt_info   NOTE            0000000000000000 20f7eea7 000100 00      0   0  1

This is with --hugify

$ readelf -SW php-fpm_hugify.bolt
There are 44 section headers, starting at offset 0x20f80780:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00000000000002a8 0002a8 00001c 00   A  0   0  1
  [ 2] .note.gnu.build-id NOTE            00000000000002c4 0002c4 000024 00   A  0   0  4
  [ 3] .note.ABI-tag     NOTE            00000000000002e8 0002e8 000020 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        0000000000000308 000308 0039f8 00   A  5   0  8
  [ 5] .dynsym           DYNSYM          0000000000003d00 003d00 010890 18   A  6   1  8
  [ 6] .dynstr           STRTAB          0000000000014590 014590 00e1a3 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          0000000000022734 022734 00160c 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         0000000000023d40 023d40 000260 00   A  6   6  8
  [ 9] .rela.dyn         RELA            0000000000023fa0 023fa0 0c92d0 18   A  5   0  8
  [10] .rela.plt         RELA            00000000000ed270 0ed270 003930 18  AI  5  23  8
  [11] .init             PROGBITS        0000000000200000 200000 000017 00  AX  0   0  4
  [12] .plt              PROGBITS        0000000000200020 200020 002630 10  AX  0   0 16
  [13] .plt.got          PROGBITS        0000000000202650 202650 000040 08  AX  0   0  8
  [14] .bolt.org.text    PROGBITS        0000000000202690 202690 34d471 00  AX  0   0 16
  [15] .fini             PROGBITS        000000000054fb04 54fb04 000009 00  AX  0   0  4
  [16] .rodata           PROGBITS        0000000000600000 600000 7a07e4 00   A  0   0 32
  [17] .bolt.org.eh_frame_hdr PROGBITS        0000000000da07e4 da07e4 0109c4 00   A  0   0  4
  [18] .bolt.org.eh_frame PROGBITS        0000000000db11a8 db11a8 06c348 00   A  0   0  8
  [19] .init_array       INIT_ARRAY      0000000001166348 f66348 000010 08  WA  0   0  8
  [20] .fini_array       FINI_ARRAY      0000000001166358 f66358 000008 08  WA  0   0  8
  [21] .data.rel.ro      PROGBITS        0000000001166360 f66360 098560 00  WA  0   0 32
  [22] .dynamic          DYNAMIC         00000000011fe8c0 ffe8c0 000260 10  WA  6   0  8
  [23] .got              PROGBITS        00000000011feb20 ffeb20 0014d0 08  WA  0   0  8
  [24] .data             PROGBITS        0000000001200000 1000000 003858 00  WA  0   0 32
  [25] .tm_clone_table   PROGBITS        0000000001203858 1003858 000000 00  WA  0   0  8
  [26] .buffer           PROGBITS        0000000001204000 1004000 1f600000 00  WA  0   0 4096
  [27] .bss              NOBITS          0000000020804000 20604000 020850 00  WA  0   0 32
  [28] .text             PROGBITS        0000000020c00000 20c00000 0cad5c 00  AX  0   0 2097152
  [29] .text.injected    PROGBITS        0000000020ccad80 20ccad80 000005 00  AX  0   0 64
  [30] .text.cold        PROGBITS        0000000020ccadc0 20ccadc0 00213d 00  AX  0   0 64
  [31] .eh_frame         PROGBITS        0000000020e00000 20e00000 07f2b0 00   A  0   0  8
  [32] .rodata.cold      PROGBITS        0000000020e7f2b0 20e7f2b0 001358 00   A  0   0  4
  [33] .bolt.hugify.entries PROGBITS        0000000020e81000 20e81000 000008 00  WA  0   0 4096
  [34] .rodata.str1.1.bolt.extra.1 PROGBITS        0000000020e81008 20e81008 000013 00   A  0   0  1
  [35] .rodata.str1.8.bolt.extra.1 PROGBITS        0000000020e81020 20e81020 00002c 00   A  0   0  8
  [36] .eh_frame.bolt.extra.1 PROGBITS        0000000020e81050 20e81050 00013c 00   A  0   0  8
  [37] .text.bolt.extra.1 PROGBITS        0000000020e81190 20e81190 000b66 00  AX  0   0 16
  [38] .eh_frame_hdr     PROGBITS        0000000020e81cf8 20e81cf8 01364c 00   A  0   0  1
  [39] .comment          PROGBITS        0000000000000000 20e95344 000027 01  MS  0   0  1
  [40] .symtab           SYMTAB          0000000000000000 20e95370 08e410 18     41 21456  8
  [41] .strtab           STRTAB          0000000000000000 20f23780 05cc25 00      0   0  1
  [42] .shstrtab         STRTAB          0000000000000000 20f803a5 0002a7 00      0   0  1
  [43] .note.bolt_info   NOTE            0000000000000000 20f8064c 000110 00      0   0  1
yavtuk commented 2 years ago

@pingzhaozz sure, I'll check but give me some time to find my notes

yavtuk commented 2 years ago

@pingzhaozz At first glance section addresses are fine, let’s check the support, starting from the OS level:

BTW you can uncommend the string and rebuild bolt to get more details (hugify.cpp):

// Enables a very verbose logging to stderr useful when debugging // #define ENABLE_DEBUG

https://github.com/llvm/llvm-project/blob/main/bolt/runtime/hugify.cpp#L16

PeterYang12 commented 1 year ago

@yavtuk hello, do you have any update about this patch? Has hugify PIE patch been merged?

yavtuk commented 1 year ago

@PeterYang12 hello, you can find the patch to upstream here: https://reviews.llvm.org/D129107 I am gonna to prepare it for merge in a few days

PeterYang12 commented 1 year ago

I can find the patch on https://reviews.llvm.org/D129107. But I didn't see the patch in https://github.com/llvm/llvm-project/tree/main/bolt . Since our project will clone the project from github. I will keep an eye on your patch recently. Thank you for your info.