facebookarchive / BOLT

Binary Optimization and Layout Tool - A Linux command-line utility used for optimizing the performance of binaries

support for multiple text sections #6

Open danielcdh opened 6 years ago

danielcdh commented 6 years ago

If I have multiple text sections, I saw the following error:

perf2bolt: /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:1383: void llvm::bolt::RewriteInstance::discoverFileObjects(): Assertion `Section && "section for functions must be registered."' failed.

Any ideas on how to work around the issue?

maksfb commented 6 years ago

Thanks for reporting the issue. If multiple text sections are a result of a compiler splitting the code, the workaround is to disable it with -fno-reorder-blocks-and-partition, and let BOLT do the splitting.

danielcdh commented 6 years ago

Thanks for the quick reply. BTW, great to see this finally get open-sourced. Thanks!

Unfortunately, the separate section is not created by the compiler and cannot easily be removed. Any suggestions?

maksfb commented 6 years ago

If functions are not split, then adding the support shouldn't be that difficult. We already process code in sections other than .text, e.g. .init and .fini. If you can share the output of readelf -e, it might give me a clue about what's happening.
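For anyone who wants to check which sections actually hold code before posting the full dump, here is a minimal sketch. It assumes the wide section-header listing from `readelf -S -W` (my choice for easier parsing; the request above was for `readelf -e`, which includes the same table) and treats any section with the `X` flag as executable. The function names are made up for illustration.

```python
import re
import subprocess

# Matches a section-header row from `readelf -S -W`, for example:
#   [ 2] .text   PROGBITS   0000000000401020 001020 0001a5 00  AX  0   0 16
_ROW = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)\s+(\S+)\s+(.*)$")


def executable_sections(readelf_output):
    """Return the names of sections whose flags include X (executable)."""
    names = []
    for line in readelf_output.splitlines():
        match = _ROW.match(line)
        if not match:
            continue
        name, _section_type, rest = match.groups()
        fields = rest.split()
        # Columns after Type: Address Off Size ES [Flg] Lk Inf Al.
        # The Flg column is blank for sections without flags, so rows
        # with fewer than 8 fields cannot be executable and are skipped.
        if len(fields) != 8:
            continue
        if "X" in fields[4]:
            names.append(name)
    return names


def read_section_headers(binary_path):
    """Run readelf on a binary and return its section-header listing."""
    result = subprocess.run(
        ["readelf", "-S", "-W", binary_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `executable_sections(read_section_headers("./my_binary"))` (the path is a placeholder) lists every code-bearing section, which makes an extra text section easy to spot next to .text, .init and .fini.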

danielcdh commented 6 years ago

Thanks, I managed to remove the section suffix, but I got another assertion failure:

perf2bolt: /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:1726: bool llvm::bolt::BinaryFunction::buildCFG(): Assertion `ToBB && "cannot find BB containing TO branch"' failed.

maksfb commented 6 years ago

This sounds like either a PIC or an assembly code issue. We are working on adding more diagnostics and improving PIC support. Was the input binary built with relocations or without?

danielcdh commented 6 years ago

The binary was not built with PIC or dynamic relocations. Looks like it failed while building the CFG?

maksfb commented 6 years ago

Yes. That's typically a symptom of PIC or of assembly code with an embedded jump table. I have a fix for PIC. I'd like you to try it once it lands. If it doesn't work, I'll ask for more details.

maksfb commented 6 years ago

@danielcdh : could you try the latest version?

danielcdh commented 6 years ago

Thanks! That error is gone, but I hit another issue:

perf2bolt: /usr/local/google/home/dehao/bolt/llvm/lib/Support/Unix/Program.inc:312: llvm::sys::ProcessInfo llvm::sys::Wait(const llvm::sys::ProcessInfo&, unsigned int, bool, std::__cxx11::string*): Assertion `PI.Pid && "invalid pid to wait on, process not started?"' failed.

0 0x0000563a8a6a4c3e llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/dehao/bolt/llvm/lib/Support/Unix/Signals.inc:398:0

1 0x0000563a8a6a4cd1 PrintStackTraceSignalHandler(void*) /usr/local/google/home/dehao/bolt/llvm/lib/Support/Unix/Signals.inc:462:0

2 0x0000563a8a6a3176 llvm::sys::RunSignalHandlers() /usr/local/google/home/dehao/bolt/llvm/lib/Support/Signals.cpp:49:0

3 0x0000563a8a6a45b3 SignalHandler(int) /usr/local/google/home/dehao/bolt/llvm/lib/Support/Unix/Signals.inc:252:0

4 0x00007efe8ecac0c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x110c0)

5 0x00007efe8d83dfcf gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x32fcf)

6 0x00007efe8d83f3fa abort (/lib/x86_64-linux-gnu/libc.so.6+0x343fa)

7 0x00007efe8d836e37 (/lib/x86_64-linux-gnu/libc.so.6+0x2be37)

8 0x00007efe8d836ee2 (/lib/x86_64-linux-gnu/libc.so.6+0x2bee2)

9 0x0000563a8a6a259b llvm::sys::Wait(llvm::sys::ProcessInfo const&, unsigned int, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) /usr/local/google/home/dehao/bolt/llvm/lib/Support/Unix/Program.inc:314:0

10 0x0000563a887e1485 llvm::bolt::DataAggregator::aggregate(llvm::bolt::BinaryContext&, std::map<unsigned long, llvm::bolt::BinaryFunction, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::bolt::BinaryFunction> > >&) /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/DataAggregator.cpp:367:0

11 0x0000563a8885818e llvm::bolt::RewriteInstance::processProfileData() /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:2387:0

12 0x0000563a8884d9a3 operator() /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:967:0

13 0x0000563a8884d9a3 llvm::bolt::RewriteInstance::run()::'lambda'(std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> > const&)::operator()(std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> > const&) const (bin/perf2bolt+0x5189a3)

14 0x0000563a8884dcf3 llvm::bolt::RewriteInstance::run() /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:996:0

15 0x0000563a886f821e main /usr/local/google/home/dehao/bolt/llvm/tools/llvm-bolt/src/llvm-bolt.cpp:269:0

16 0x00007efe8d82b2b1 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b1)

17 0x0000563a886f6eaa _start (bin/perf2bolt+0x3c1eaa)

rafaelauler commented 6 years ago

Do you have perf in your PATH? Which version?
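A quick way to answer both questions at once is sketched below. It only assumes that the tool accepts `--version`, which `perf` does; the helper names are made up for illustration.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


def tool_version(name):
    """Return the first line printed by `<name> --version`.

    Returns None when the tool is not on PATH, and an empty string when
    the tool runs but prints nothing.
    """
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""
```

If `tool_version("perf")` returns None, perf2bolt has no perf to launch, which would fit the "process not started?" assertion above.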

danielcdh commented 6 years ago

You are right: after adding perf to PATH, the problem was resolved. I managed to go through the whole process and create a new binary with BOLT. Unfortunately, that binary segfaults immediately when I execute it.

rafaelauler commented 6 years ago

Is it a trap or a segfault? Since the LLVM disassembler has problems with recent AVX512 instructions, BOLT will mutate functions that use AVX512 into a trap. We are working on improving AVX512 support.

rafaelauler commented 6 years ago

By the way, BOLT has a safer mode of operation if things are not quite working yet for your binary (either if you use AVX512 or if you have weird assembly-written code that is causing BOLT to fail to read the binary at some parts).

Simply disable relocations in the linker or use -relocs=false in BOLT. This mode is less effective for performance because it does not reorder functions, and tries to reorder basic blocks in place without changing the rest of the binary. It is a far more conservative approach, but can still lead to performance improvements.

However, even in relocation mode (our most aggressive processing, where all code in the binary gets rewritten), the binary shouldn't segfault unless there is something very weird happening. Traps can happen for AVX512, though.

danielcdh commented 6 years ago

It's a segfault. The binary was not built with AVX512. It was also not built with relocations, and I have to disable function reordering when invoking llvm-bolt.

Another issue is that llvm-bolt takes ~5 hours to process my 800MB binary and produces a 1.1GB binary. Is that expected?

Thanks

rafaelauler commented 6 years ago

If you are suffering from long processing times, you probably have deeply-inlined functions with a lot of basic blocks. For these cases, it's better to use -reorder-blocks=cache instead of -reorder-blocks=cache+. The expected processing time ranges from 2 to 6 minutes for a ~100MB binary. If you use -update-debug-info, this time may climb to close to 10 minutes, or in some cases 20 minutes, depending on the code.

The resulting binary is larger in your case because you are using non-relocation mode. So the original code section is kept the same size, but the blocks are reordered and sometimes functions are split. If split, the cold part of these functions will account for the extra 300MB you are observing.
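To make the flag combinations discussed in this thread concrete, here is a minimal sketch that only builds an llvm-bolt argument list and does not run anything. The flags `-relocs=false`, `-reorder-blocks=cache`, `-reorder-blocks=cache+` and `-update-debug-info` are the ones named above. The `-o` and `-data=` options are as I recall them from the BOLT README of that period, so treat them as assumptions and check `llvm-bolt -help` for your build. The function and parameter names are made up.

```python
def bolt_command(binary, profile, output, *, disable_relocs=False,
                 block_layout="cache+", update_debug_info=False):
    """Build an llvm-bolt argument list (nothing is executed here)."""
    # Only the two layouts mentioned in this thread are handled; BOLT
    # may accept others that this sketch does not know about.
    if block_layout not in ("cache", "cache+"):
        raise ValueError("this sketch only handles 'cache' and 'cache+'")
    args = [
        "llvm-bolt", binary,
        "-o", output,                       # assumed output option
        f"-data={profile}",                 # assumed profile option
        f"-reorder-blocks={block_layout}",
    ]
    if disable_relocs:
        # Conservative mode: blocks are reordered in place.
        args.append("-relocs=false")
    if update_debug_info:
        # Expect longer processing times with this flag.
        args.append("-update-debug-info")
    return args
```

For a large binary with deeply-inlined functions, `bolt_command("app", "perf.fdata", "app.bolt", disable_relocs=True, block_layout="cache")` gives the conservative, faster-to-process combination (the file names are placeholders). The resulting list can be passed to `subprocess.run` as is.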

maksfb commented 6 years ago

@danielcdh: you have a lot of code :) The fact that the binary crashes immediately is probably an indication of something trivial that we are not getting right. If you strip the binary after BOLT, it could be the reason, as we break some assumptions that strip makes.

danielcdh commented 6 years ago

Yeah, it's a large binary :)

The segfault happens without stripping.

We will try debugging with some smaller binaries and see if the problem can be reproduced.