Open pingzhaozz opened 2 years ago
Found on LLVM 14.0.1
$llvm-bolt --version
LLVM (http://llvm.org/):
LLVM version 14.0.1
Optimized build with assertions.
Default target: x86_64-unknown-linux-gnu
Host CPU: sapphirerapids
BOLT revision c62053979489ccb002efe411c3af059addcb5d7d
Registered Targets:
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
$ readelf -SW sample.bolt
There are 43 section headers, starting at offset 0x602580:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000000318 000318 00001c 00 A 0 0 1
[ 2] .note.gnu.property NOTE 0000000000000338 000338 000020 00 A 0 0 8
[ 3] .note.gnu.build-id NOTE 0000000000000358 000358 000024 00 A 0 0 4
[ 4] .note.ABI-tag NOTE 000000000000037c 00037c 000020 00 A 0 0 4
[ 5] .gnu.hash GNU_HASH 00000000000003a0 0003a0 000024 00 A 6 0 8
[ 6] .dynsym DYNSYM 00000000000003c8 0003c8 0000d8 18 A 7 1 8
[ 7] .dynstr STRTAB 00000000000004a0 0004a0 000090 00 A 0 0 1
[ 8] .gnu.version VERSYM 0000000000000530 000530 000012 02 A 6 0 2
[ 9] .gnu.version_r VERNEED 0000000000000548 000548 000020 00 A 7 1 8
[10] .rela.dyn RELA 0000000000000568 000568 0000c0 18 A 6 0 8
[11] .rela.plt RELA 0000000000000628 000628 000048 18 AI 6 24 8
[12] .init PROGBITS 0000000000001000 001000 00001b 00 AX 0 0 4
[13] .plt PROGBITS 0000000000001020 001020 000040 10 AX 0 0 16
[14] .plt.got PROGBITS 0000000000001060 001060 000010 10 AX 0 0 16
[15] .plt.sec PROGBITS 0000000000001070 001070 000030 10 AX 0 0 16
[16] .bolt.org.text PROGBITS 00000000000010a0 0010a0 000345 00 AX 0 0 16
[17] .fini PROGBITS 00000000000013e8 0013e8 00000d 00 AX 0 0 4
[18] .rodata PROGBITS 0000000000002000 002000 00001c 00 A 0 0 4
[19] .bolt.org.eh_frame_hdr PROGBITS 000000000000201c 00201c 000054 00 A 0 0 4
[20] .bolt.org.eh_frame PROGBITS 0000000000002070 002070 000148 00 A 0 0 8
[21] .init_array INIT_ARRAY 0000000000003da8 002da8 000008 08 WA 0 0 8
[22] .fini_array FINI_ARRAY 0000000000003db0 002db0 000008 08 WA 0 0 8
[23] .dynamic DYNAMIC 0000000000003db8 002db8 0001f0 10 WA 7 0 8
[24] .got PROGBITS 0000000000003fa8 002fa8 000058 08 WA 0 0 8
[25] .data PROGBITS 0000000000004000 003000 000010 00 WA 0 0 8
[26] .tm_clone_table PROGBITS 0000000000004010 003010 000000 00 WA 0 0 8
[27] .bss NOBITS 0000000000004010 003010 000008 00 WA 0 0 1
[28] .text PROGBITS 0000000000400000 400000 0000ba 00 AX 0 0 2097152
[29] .text.cold PROGBITS 00000000004000c0 4000c0 00007c 00 AX 0 0 64
[30] .eh_frame PROGBITS 0000000000600000 600000 0001c8 00 A 0 0 8
[31] .bolt.hugify.entries PROGBITS 0000000000601000 601000 000008 00 WA 0 0 4096
[32] .rodata.str1.1.bolt.extra.1 PROGBITS 0000000000601008 601008 000017 00 A 0 0 1
[33] .rodata.str1.8.bolt.extra.1 PROGBITS 0000000000601020 601020 00002c 00 A 0 0 8
[34] .rodata.cst16.bolt.extra.1 PROGBITS 0000000000601050 601050 000060 00 A 0 0 16
[35] .eh_frame.bolt.extra.1 PROGBITS 00000000006010b0 6010b0 000084 00 A 0 0 8
[36] .text.bolt.extra.1 PROGBITS 0000000000601140 601140 000687 00 AX 0 0 16
[37] .eh_frame_hdr PROGBITS 00000000006017c8 6017c8 00006c 00 A 0 0 1
[38] .comment PROGBITS 0000000000000000 601834 000023 01 MS 0 0 1
[39] .symtab SYMTAB 0000000000000000 601858 0006f0 18 40 49 8
[40] .strtab STRTAB 0000000000000000 601f48 00026e 00 0 0 1
[41] .shstrtab STRTAB 0000000000000000 6021b6 00026f 00 0 0 1
[42] .note.bolt_info NOTE 0000000000000000 602425 00012c 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
@pingzhaozz can you add simple test to reproduce it?
@pingzhaozz can you add simple test to reproduce it?
I upload my sample code to: https://github.com/pingzhaozz/perf_sample But it doesn't matter to reproduce this issue whether use this sample or other program.
The BOLT process would be like:
1. Start ./sample 1000
2. Collect Full System perf for 10 seconds (perf record -e cycles:u -j any,u -a -o perf.data – sleep 10)
3. Convert Profile to BOLT format perf2bolt -p perf.data -o perf.fdata <executable>
4. BOLT the sample binary lvm-bolt <executable> -o <executable>.bolt -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats --hugify
@pingzhaozz this error is due to PIE binary, now you can use hugify for no-pie binaries. I will add the support for PIE soon, need some time to prepare the patch
Thanks for the quick answer.
Hello @pingzhaozz, https://reviews.llvm.org/D129107 this patch contains the support of PIE binaries, you can try to use it.
@yavtuk Cool! I'll take a try and give you feedback next week.
@yavtuk I checked the patch: https://reviews.llvm.org/D129107 and llvm-bolt works with --hugify
now. But I didn't see performance improvement after adding --hugify
. Perhaps it's because my application already use hugepage for .text? Could you help look if the hugify.bolt file is correct?
This is without --hugify
$ readelf -SW php-fpm.bolt
There are 38 section headers, starting at offset 0x20f7efc0:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 00000000000002a8 0002a8 00001c 00 A 0 0 1
[ 2] .note.gnu.build-id NOTE 00000000000002c4 0002c4 000024 00 A 0 0 4
[ 3] .note.ABI-tag NOTE 00000000000002e8 0002e8 000020 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000308 000308 0039f8 00 A 5 0 8
[ 5] .dynsym DYNSYM 0000000000003d00 003d00 010890 18 A 6 1 8
[ 6] .dynstr STRTAB 0000000000014590 014590 00e1a3 00 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000022734 022734 00160c 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000023d40 023d40 000260 00 A 6 6 8
[ 9] .rela.dyn RELA 0000000000023fa0 023fa0 0c92d0 18 A 5 0 8
[10] .rela.plt RELA 00000000000ed270 0ed270 003930 18 AI 5 23 8
[11] .init PROGBITS 0000000000200000 200000 000017 00 AX 0 0 4
[12] .plt PROGBITS 0000000000200020 200020 002630 10 AX 0 0 16
[13] .plt.got PROGBITS 0000000000202650 202650 000040 08 AX 0 0 8
[14] .bolt.org.text PROGBITS 0000000000202690 202690 34d471 00 AX 0 0 16
[15] .fini PROGBITS 000000000054fb04 54fb04 000009 00 AX 0 0 4
[16] .rodata PROGBITS 0000000000600000 600000 7a07e4 00 A 0 0 32
[17] .bolt.org.eh_frame_hdr PROGBITS 0000000000da07e4 da07e4 0109c4 00 A 0 0 4
[18] .bolt.org.eh_frame PROGBITS 0000000000db11a8 db11a8 06c348 00 A 0 0 8
[19] .init_array INIT_ARRAY 0000000001166348 f66348 000010 08 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000001166358 f66358 000008 08 WA 0 0 8
[21] .data.rel.ro PROGBITS 0000000001166360 f66360 098560 00 WA 0 0 32
[22] .dynamic DYNAMIC 00000000011fe8c0 ffe8c0 000260 10 WA 6 0 8
[23] .got PROGBITS 00000000011feb20 ffeb20 0014d0 08 WA 0 0 8
[24] .data PROGBITS 0000000001200000 1000000 003858 00 WA 0 0 32
[25] .tm_clone_table PROGBITS 0000000001203858 1003858 000000 00 WA 0 0 8
[26] .buffer PROGBITS 0000000001204000 1004000 1f600000 00 WA 0 0 4096
[27] .bss NOBITS 0000000020804000 20604000 020850 00 WA 0 0 32
[28] .text PROGBITS 0000000020c00000 20c00000 0cad5c 00 AX 0 0 2097152
[29] .text.cold PROGBITS 0000000020ccad80 20ccad80 00213d 00 AX 0 0 64
[30] .eh_frame PROGBITS 0000000020e00000 20e00000 07f2b0 00 A 0 0 8
[31] .rodata.cold PROGBITS 0000000020e7f2b0 20e7f2b0 001358 00 A 0 0 4
[32] .eh_frame_hdr PROGBITS 0000000020e80608 20e80608 01364c 00 A 0 0 1
[33] .comment PROGBITS 0000000000000000 20e93c54 000027 01 MS 0 0 1
[34] .symtab SYMTAB 0000000000000000 20e93c80 08e3f8 18 35 21455 8
[35] .strtab STRTAB 0000000000000000 20f22078 05cc0e 00 0 0 1
[36] .shstrtab STRTAB 0000000000000000 20f7ec86 000221 00 0 0 1
[37] .note.bolt_info NOTE 0000000000000000 20f7eea7 000100 00 0 0 1
This is with --hugify
$ readelf -SW php-fpm_hugify.bolt
There are 44 section headers, starting at offset 0x20f80780:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 00000000000002a8 0002a8 00001c 00 A 0 0 1
[ 2] .note.gnu.build-id NOTE 00000000000002c4 0002c4 000024 00 A 0 0 4
[ 3] .note.ABI-tag NOTE 00000000000002e8 0002e8 000020 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000308 000308 0039f8 00 A 5 0 8
[ 5] .dynsym DYNSYM 0000000000003d00 003d00 010890 18 A 6 1 8
[ 6] .dynstr STRTAB 0000000000014590 014590 00e1a3 00 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000022734 022734 00160c 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000023d40 023d40 000260 00 A 6 6 8
[ 9] .rela.dyn RELA 0000000000023fa0 023fa0 0c92d0 18 A 5 0 8
[10] .rela.plt RELA 00000000000ed270 0ed270 003930 18 AI 5 23 8
[11] .init PROGBITS 0000000000200000 200000 000017 00 AX 0 0 4
[12] .plt PROGBITS 0000000000200020 200020 002630 10 AX 0 0 16
[13] .plt.got PROGBITS 0000000000202650 202650 000040 08 AX 0 0 8
[14] .bolt.org.text PROGBITS 0000000000202690 202690 34d471 00 AX 0 0 16
[15] .fini PROGBITS 000000000054fb04 54fb04 000009 00 AX 0 0 4
[16] .rodata PROGBITS 0000000000600000 600000 7a07e4 00 A 0 0 32
[17] .bolt.org.eh_frame_hdr PROGBITS 0000000000da07e4 da07e4 0109c4 00 A 0 0 4
[18] .bolt.org.eh_frame PROGBITS 0000000000db11a8 db11a8 06c348 00 A 0 0 8
[19] .init_array INIT_ARRAY 0000000001166348 f66348 000010 08 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000001166358 f66358 000008 08 WA 0 0 8
[21] .data.rel.ro PROGBITS 0000000001166360 f66360 098560 00 WA 0 0 32
[22] .dynamic DYNAMIC 00000000011fe8c0 ffe8c0 000260 10 WA 6 0 8
[23] .got PROGBITS 00000000011feb20 ffeb20 0014d0 08 WA 0 0 8
[24] .data PROGBITS 0000000001200000 1000000 003858 00 WA 0 0 32
[25] .tm_clone_table PROGBITS 0000000001203858 1003858 000000 00 WA 0 0 8
[26] .buffer PROGBITS 0000000001204000 1004000 1f600000 00 WA 0 0 4096
[27] .bss NOBITS 0000000020804000 20604000 020850 00 WA 0 0 32
[28] .text PROGBITS 0000000020c00000 20c00000 0cad5c 00 AX 0 0 2097152
[29] .text.injected PROGBITS 0000000020ccad80 20ccad80 000005 00 AX 0 0 64
[30] .text.cold PROGBITS 0000000020ccadc0 20ccadc0 00213d 00 AX 0 0 64
[31] .eh_frame PROGBITS 0000000020e00000 20e00000 07f2b0 00 A 0 0 8
[32] .rodata.cold PROGBITS 0000000020e7f2b0 20e7f2b0 001358 00 A 0 0 4
[33] .bolt.hugify.entries PROGBITS 0000000020e81000 20e81000 000008 00 WA 0 0 4096
[34] .rodata.str1.1.bolt.extra.1 PROGBITS 0000000020e81008 20e81008 000013 00 A 0 0 1
[35] .rodata.str1.8.bolt.extra.1 PROGBITS 0000000020e81020 20e81020 00002c 00 A 0 0 8
[36] .eh_frame.bolt.extra.1 PROGBITS 0000000020e81050 20e81050 00013c 00 A 0 0 8
[37] .text.bolt.extra.1 PROGBITS 0000000020e81190 20e81190 000b66 00 AX 0 0 16
[38] .eh_frame_hdr PROGBITS 0000000020e81cf8 20e81cf8 01364c 00 A 0 0 1
[39] .comment PROGBITS 0000000000000000 20e95344 000027 01 MS 0 0 1
[40] .symtab SYMTAB 0000000000000000 20e95370 08e410 18 41 21456 8
[41] .strtab STRTAB 0000000000000000 20f23780 05cc25 00 0 0 1
[42] .shstrtab STRTAB 0000000000000000 20f803a5 0002a7 00 0 0 1
[43] .note.bolt_info NOTE 0000000000000000 20f8064c 000110 00 0 0 1
@pingzhaozz sure, I'll check but give me some time to find my notes
@pingzhaozz At first glance section addresses are fine, let’s check the support, starting from the OS level:
Linux kernel needs to be built with the configuration options CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y or CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y
cat /sys/kernel/mm/transparent_hugepage/enabled (has to be always or madvise)
check a list available number of pages (hugeadm --pool-list) if there is no available huge pages then set it via sysctl, for example sudo sysctl -w vm.nr_hugepages=512
run your test and check that the number of available huge pages has decreased
BTW you can uncommend the string and rebuild bolt to get more details (hugify.cpp):
// Enables a very verbose logging to stderr useful when debugging // #define ENABLE_DEBUG
https://github.com/llvm/llvm-project/blob/main/bolt/runtime/hugify.cpp#L16
@yavtuk hello, do you have any update about this patch? Has hugify PIE patch been merged?
@PeterYang12 hello, you can find the patch to upstream here: https://reviews.llvm.org/D129107 I am gonna to prepare it for merge in a few days
I can find the patch on https://reviews.llvm.org/D129107. But I didn't see the patch in https://github.com/llvm/llvm-project/tree/main/bolt . Since our project will clone the project from github. I will keep an eye on your patch recently. Thank you for your info.
The problem I met only happens when I use
--hugify
.I have a program which works well with BOLT.
I want to try the
--hugify
option. It passed BOLT process. But when I ran it, it will core dump. And problem seems at the entrypoint: (0x10a0 is the disassamble _start address
), but different from readelf header. core dump happens at the _start(0x10a0) with segfault. GDB can't capture since program not start yet.But when I ran it, it will core dump. And problem seems at the entrypoint: (
0x10a0 is the disassamble _start address
), but different from readelf. core dump happens at the _start(0x10a0) with segfault. GDB can't capture since program not start yet.I have hugepage in my system:
I can't find --hugify manual or guide to help debug this issue. If anyone knows this problem, pls help comment. Thanks a lot!