facebookarchive / BOLT

Binary Optimization and Layout Tool - A Linux command-line utility used for optimizing performance of binaries

perf2bolt is crashing when optimizing mysql 5.7 #264

Closed: fantblue closed this issue 2 years ago

fantblue commented 2 years ago

I was trying to optimize MySQL 5.7 with BOLT, but perf2bolt crashes:

PERF2BOLT: out of range traces involving unknown regions: 29234470 (9.4%)
perf2bolt: /data/bolt/bolt/lib/Core/BinaryContext.cpp:1102: void llvm::bolt::BinaryContext::registerFragment(llvm::bolt::BinaryFunction&, llvm::bolt::BinaryFunction&) const: Assertion `TargetParent == &Function && "mismatching parent function"' failed.
 #0 0x0000000000d45440 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000000000d4332e SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fc3f02be630 __restore_rt sigaction.c:0:0
 #3 0x00007fc3eeeab3d7 raise /usr/src/debug/glibc-2.17-c758a686/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:55:0
 #4 0x00007fc3eeeacac8 abort /usr/src/debug/glibc-2.17-c758a686/stdlib/abort.c:92:0
 #5 0x00007fc3eeea41a6 __assert_fail_base /usr/src/debug/glibc-2.17-c758a686/assert/assert.c:92:0
 #6 0x00007fc3eeea4252 (/lib64/libc.so.6+0x2f252)
 #7 0x0000000001b6b27a llvm::bolt::BinaryContext::processInterproceduralReferences(llvm::bolt::BinaryFunction&) (/data/bolt_install/bin/perf2bolt+0x1b6b27a)
 #8 0x0000000000b202fb llvm::bolt::RewriteInstance::disassembleFunctions() (/data/bolt_install/bin/perf2bolt+0xb202fb)
 #9 0x0000000000b78de5 llvm::bolt::RewriteInstance::run() (/data/bolt_install/bin/perf2bolt+0xb78de5)
#10 0x000000000040e0e0 main (/data/bolt_install/bin/perf2bolt+0x40e0e0)
#11 0x00007fc3eee97555 __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:300:0
#12 0x00000000004757b5 _start (/data/bolt_install/bin/perf2bolt+0x4757b5)
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /data/bolt_install/bin/perf2bolt /data/rena/Mysql5/bench_mysql/mysqld -p ./perf.data -o mysqld.fdata -w mysqld.yaml

I found a similar problem in #263, where the suggestion was to enable the GCC flag -fno-reorder-blocks-and-partition.

But that didn't help in this case. Can anyone help fix this issue?

By the way, I had enabled PGO and LTO before trying to BOLT the binary.

aaupov commented 2 years ago

It's almost certainly the same underlying problem of GCC split functions/split jump tables. Passing -fno-reorder-blocks-and-partition to the compiler should solve the issue. Make sure that the option propagates to the LTO step as well.
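One way to confirm that the flag really took effect, including at the LTO link step, is to look for leftover GCC cold fragments in the rebuilt binary's symbol table. The sketch below is a minimal helper, not part of BOLT. It assumes GCC's usual naming of split-off cold parts as `<function>.cold` or `<function>.cold.<N>`, that the binary is not stripped, and that `nm` is on PATH; the names `find_cold_fragments` and `scan_binary` are hypothetical.

```python
import re
import subprocess
from collections import defaultdict

# Assumed GCC naming for cold parts split off by -freorder-blocks-and-partition:
# "<parent>.cold" or "<parent>.cold.<N>". This is not defined by BOLT.
COLD_SUFFIX = re.compile(r"^(?P<parent>.+?)\.cold(?:\.\d+)?$")


def find_cold_fragments(nm_output: str) -> dict[str, list[str]]:
    """Map each parent function name to the cold fragment symbols found for it.

    Expects `nm` output lines of the form "<address> <type> <name>".
    """
    fragments: dict[str, list[str]] = defaultdict(list)
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip undefined symbols ("U name") and blank lines
        _addr, sym_type, name = parts
        if sym_type not in ("t", "T"):
            continue  # only consider symbols in the text (code) section
        match = COLD_SUFFIX.match(name)
        if match:
            fragments[match.group("parent")].append(name)
    return dict(fragments)


def scan_binary(path: str) -> dict[str, list[str]]:
    """Run `nm` on an unstripped binary and report its cold fragments."""
    result = subprocess.run(["nm", path], capture_output=True, text=True, check=True)
    return find_cold_fragments(result.stdout)


if __name__ == "__main__":
    sample = (
        "0000000000401000 T main\n"
        "0000000000401200 t parse_query.cold\n"
        "0000000000401300 T parse_query\n"
        "0000000000401400 t handle_row.cold.3\n"
        "                 U malloc\n"
    )
    print(find_cold_fragments(sample))
```

If `scan_binary("mysqld")` still reports fragments after the rebuild, some objects (or the LTO step) were likely compiled without -fno-reorder-blocks-and-partition, or the fragments come from prebuilt static libraries.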

For cases where split functions come from statically linked system libraries that can't easily be rebuilt with -fno-reorder-blocks-and-partition, we've added a workaround in BOLT to gracefully handle such binaries (in "[BOLT] Split functions: support fragments with multiple parents", commit bb156d9abbf). Please build the latest version of BOLT and try again. Let me know if it fixes the problem.
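The thread does not show BOLT's internals, so the following is only a toy Python model of the distinction the commit title draws: a strict rule that each fragment has exactly one parent (which fails, like the "mismatching parent function" assertion, when a second function claims the same fragment) versus recording every parent. The class and method names are hypothetical and do not mirror BOLT's C++ API.

```python
from collections import defaultdict


class FragmentRegistry:
    """Toy model (hypothetical, not BOLT's code) of parent/fragment bookkeeping."""

    def __init__(self, allow_multiple_parents: bool) -> None:
        self.allow_multiple_parents = allow_multiple_parents
        # fragment name -> set of parent function names that claim it
        self.parents: dict[str, set[str]] = defaultdict(set)

    def register_fragment(self, parent: str, fragment: str) -> None:
        known = self.parents[fragment]
        if known and parent not in known and not self.allow_multiple_parents:
            # Strict mode: a second, different parent is treated as an error.
            raise ValueError(
                f"mismatching parent function: {fragment} already belongs to "
                f"{sorted(known)}, not {parent}"
            )
        known.add(parent)

    def fragments_with_multiple_parents(self) -> list[str]:
        return sorted(f for f, ps in self.parents.items() if len(ps) > 1)


if __name__ == "__main__":
    relaxed = FragmentRegistry(allow_multiple_parents=True)
    relaxed.register_fragment("parent_a", "helper.cold")
    relaxed.register_fragment("parent_b", "helper.cold")
    print(relaxed.fragments_with_multiple_parents())
```

With `allow_multiple_parents=False`, the second `register_fragment` call above raises instead of recording `parent_b`.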

valuenumbering commented 2 years ago

Hi @aaupov, it works with the latest version! Thanks!

fantblue commented 2 years ago

Hi @aaupov! Thanks for your answer. Adding -fno-reorder-blocks-and-partition and updating BOLT to the latest version solved the problem.

Here is another BOLT error I ran into when using BOLT to optimize libmysqlclient.so. Is it a known problem, or can you help fix it? Thanks!

gcc 10.2.0

llvm-bolt: /data/cyl/bolt_latest/bolt/lib/Rewrite/RewriteInstance.cpp:4996: uint64_t llvm::bolt::RewriteInstance::getNewFunctionAddress(uint64_t): Assertion `!Function->isFragment() && "cannot get new address for a fragment"' failed.
 #0 0x0000000000d4a7a0 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000000000d4853e SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f8dcb18b630 __restore_rt sigaction.c:0:0
 #3 0x00007f8dca0c13d7 raise /usr/src/debug/glibc-2.17-c758a686/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:55:0
 #4 0x00007f8dca0c2ac8 abort /usr/src/debug/glibc-2.17-c758a686/stdlib/abort.c:92:0
 #5 0x00007f8dca0ba1a6 __assert_fail_base /usr/src/debug/glibc-2.17-c758a686/assert/assert.c:92:0
 #6 0x00007f8dca0ba252 (/lib64/libc.so.6+0x2f252)
 #7 0x0000000000b236de (/data/cyl/bolt_install/bin/llvm-bolt+0xb236de)
 #8 0x0000000000b70c35 void llvm::bolt::RewriteInstance::patchELFSectionHeaderTable<llvm::object::ELFType<(llvm::support::endianness)1, true> >(llvm::object::ELFObjectFile<llvm::object::ELFType<(llvm::support::endianness)1, true> >*) (/data/cyl/bolt_install/bin/llvm-bolt+0xb70c35)
 #9 0x0000000000b76ab2 llvm::bolt::RewriteInstance::rewriteFile() (/data/cyl/bolt_install/bin/llvm-bolt+0xb76ab2)
#10 0x000000000040e220 main (/data/cyl/bolt_install/bin/llvm-bolt+0x40e220)
#11 0x00007f8dca0ad555 __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:300:0
#12 0x0000000000476ea5 _start (/data/cyl/bolt_install/bin/llvm-bolt+0x476ea5)
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /data/cyl/bolt_install/bin/llvm-bolt ../library_output_directory/libmysqlclient.so.21.1.22 -b /data/cyl/perf_data/mysql/mysqld.fdata -reorder-functions=hfsort+ -split-functions=3 -reorder-blocks=cache+ -split-all-cold -dyno-stats -icf=1 -use-gnu-stack -o ../library_output_directory/libmysqlclient.so.21.1.22.bolt

aaupov commented 2 years ago

Great that the original issue is fixed. The new one with libmysqlclient.so is not something we've run into before, but according to the assertion it is also related to fragments. Can you please share the steps to reproduce the libmysqlclient.so build (OS, compiler version, MySQL commit hash)?

fantblue commented 2 years ago

@aaupov, here is my environment:

MySQL version: 8.0.22
OS: CentOS 7 derivative
Kernel: 4.14.105
Compiler: gcc 10.2.0

I built it with the default configuration, except that I appended -fno-reorder-blocks-and-partition to the linker flags.

nhuhuan commented 2 years ago

@fantblue Hi, the issue has been resolved and the fix has landed upstream: D128111