facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries

BOLT crashes on HHVM (3.28.0-dev) and PHP (7.*) compiled with GCC 8.1.1 #4

Open dstogov opened 6 years ago

dstogov commented 6 years ago

$ bin/perf2bolt -p perf.data -o perf.fdata hhvm
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: Spawning perf-script job to read branch events
PERF2BOLT: Spawning perf-script job to read mem events
PERF2BOLT: Spawning perf-script job to read tasks
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: binary build-id is: 0065b8a9bc97be3aef481fa91e906921710ad80c
PERF2BOLT-WARNING: build-id matched a different file name. Using "hhvm-3.28.0-dev" for profile parsing.
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x4e00000, offset 0x4a00000
BOLT-INFO: disabling -align-macro-fusion in non-relocation mode
perf2bolt: /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:870: llvm::bolt::IndirectBranchType llvm::bolt::BinaryFunction::processIndirectBranch(llvm::MCInst&, unsigned int, uint64_t): Assertion `JTOffsetCandidates.size() > 2 && "expected more than 2 jump table entries"' failed.

0 0x00000000023ad32c llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:398:22

1 0x00000000023ad3bf PrintStackTraceSignalHandler(void*) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:462:1

2 0x00000000023ab85e llvm::sys::RunSignalHandlers() /home/dmitry/BOLT/llvm/lib/Support/Signals.cpp:49:19

3 0x00000000023acca3 SignalHandler(int) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:252:1

4 0x00007f3de490afc0 __restore_rt (/lib64/libpthread.so.0+0x11fc0)

5 0x00007f3de33e2f2b __GI_raise (/lib64/libc.so.6+0x36f2b)

6 0x00007f3de33cd561 __GI_abort (/lib64/libc.so.6+0x21561)

7 0x00007f3de33cd431 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21431)

8 0x00007f3de33db692 (/lib64/libc.so.6+0x2f692)

9 0x000000000045431b llvm::bolt::BinaryFunction::processIndirectBranch(llvm::MCInst&, unsigned int, unsigned long) /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:869:5

10 0x0000000000456aee llvm::bolt::BinaryFunction::disassemble(llvm::ArrayRef) /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:1318:46

11 0x000000000056fbb1 llvm::bolt::RewriteInstance::disassembleFunctions() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:2471:27

12 0x0000000000564a80 operator() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:960:21

13 0x0000000000564a80 llvm::bolt::RewriteInstance::run()::'lambda'(std::set<unsigned long, std::less, std::allocator > const&)::operator()(std::set<unsigned long, std::less, std::allocator > const&) const (bin/perf2bolt+0x564a80)

14 0x0000000000564ddf llvm::bolt::RewriteInstance::run() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:996:21

15 0x0000000000412306 main /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/llvm-bolt.cpp:269:61

$ bin/perf2bolt -p perf.data -o perf.fdata php-cgi
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: Spawning perf-script job to read branch events
PERF2BOLT: Spawning perf-script job to read mem events
PERF2BOLT: Spawning perf-script job to read tasks
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: binary build-id is: 39c6dd4f405658de3fb0528edce84c9ae753ebf7
PERF2BOLT: matched build-id and file name
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x1600000, offset 0x1200000
BOLT-INFO: disabling -align-macro-fusion in non-relocation mode
perf2bolt: /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/Target/X86/X86MCPlusBuilder.cpp:1035: virtual bool {anonymous}::X86MCPlusBuilder::evaluateX86MemoryOperand(const llvm::MCInst&, unsigned int, int64_t, unsigned int, int64_t, unsigned int*, const llvm::MCExpr**) const: Assertion `DispImm && "DispImm needs to be set"' failed.

0 0x00000000023ad32c llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:398:22

1 0x00000000023ad3bf PrintStackTraceSignalHandler(void*) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:462:1

2 0x00000000023ab85e llvm::sys::RunSignalHandlers() /home/dmitry/BOLT/llvm/lib/Support/Signals.cpp:49:19

3 0x00000000023acca3 SignalHandler(int) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:252:1

4 0x00007fc936dc1fc0 __restore_rt (/lib64/libpthread.so.0+0x11fc0)

5 0x00007fc935899f2b __GI_raise (/lib64/libc.so.6+0x36f2b)

6 0x00007fc935884561 __GI_abort (/lib64/libc.so.6+0x21561)

7 0x00007fc935884431 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21431)

8 0x00007fc935892692 (/lib64/libc.so.6+0x2f692)

9 0x00000000019f17a1 (anonymous namespace)::X86MCPlusBuilder::evaluateX86MemoryOperand(llvm::MCInst const&, unsigned int, long, unsigned int, long, unsigned int*, llvm::MCExpr const**) const /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/Target/X86/X86MCPlusBuilder.cpp:1036:29

10 0x00000000019fb4b0 std::pair<llvm::bolt::IndirectBranchType, llvm::MCInst*> (anonymous namespace)::X86MCPlusBuilder::analyzePICJumpTable<std::reverse_iterator >(std::reverse_iterator, std::reverse_iterator, unsigned short, unsigned short) const /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/Target/X86/X86MCPlusBuilder.cpp:2313:13

11 0x00000000019f79b8 (anonymous namespace)::X86MCPlusBuilder::analyzeIndirectBranch(llvm::MCInst&, llvm::bolt::MCPlusBuilder::InstructionIterator, llvm::bolt::MCPlusBuilder::InstructionIterator, unsigned int, llvm::MCInst&, unsigned int&, unsigned int&, long&, llvm::MCExpr const&, llvm::MCInst*&) const /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/Target/X86/X86MCPlusBuilder.cpp:2406:79

12 0x000000000045344b llvm::bolt::BinaryFunction::processIndirectBranch(llvm::MCInst&, unsigned int, unsigned long) /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:695:44

13 0x0000000000456aee llvm::bolt::BinaryFunction::disassemble(llvm::ArrayRef) /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:1318:46

14 0x000000000056fbb1 llvm::bolt::RewriteInstance::disassembleFunctions() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:2471:27

15 0x0000000000564a80 operator() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:960:21

16 0x0000000000564a80 llvm::bolt::RewriteInstance::run()::'lambda'(std::set<unsigned long, std::less, std::allocator > const&)::operator()(std::set<unsigned long, std::less, std::allocator > const&) const (bin/perf2bolt+0x564a80)

17 0x0000000000564ddf llvm::bolt::RewriteInstance::run() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:996:21

18 0x0000000000412306 main /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/llvm-bolt.cpp:269:61

HHVM and PHP were built using GCC 8.1.1 on Linux 4.16.15, CPU: i5-2520M

rafaelauler commented 6 years ago

Thanks for reporting. It's surprising to see failures at these points, and this definitely deserves to be investigated. Meanwhile, you can try a workaround by using a different GCC version.

dstogov commented 6 years ago

@rafaelauler if I can provide more info (e.g. problematic binary function, its assembly, etc), just let me know how to reach file/function name in GDB.

What GCC version(s) do you support? Can the crash be caused by code in libc or other shared library?

rafaelauler commented 6 years ago

For HHVM open-source, GCC 5.x should work well. Shared libraries are not processed by BOLT, so they shouldn't cause any problems.

However, I took a look at these assertions, and I'm thinking that maybe you have code in your binary compiled with -fPIC. Position independent code is currently not well supported by BOLT, and in general you will get better performance by not using PIC in static code (outside of shared libraries).

This could happen, for example, if you compiled a library with -fPIC and produced a static .a out of those fPIC .o objects, intended to be used for .so.

If it's not PIC, it would be interesting to see where this is coming from. I'll fix the second assertion, so let's see.

maksfb commented 6 years ago

The issue with GCC 8 is that it splits functions by default, and the non-contiguous function bodies confuse BOLT. The workaround is to add the -fno-reorder-blocks-and-partition option when compiling.

dstogov commented 6 years ago

Rebuilding PHP with -fno-reorder-blocks-and-partition and updating BOLT didn't help. I see a similar assertion.

$ perf2bolt -p perf.data -o perf.fdata sapi/cgi/php-cgi
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: Spawning perf-script job to read branch events
PERF2BOLT: Spawning perf-script job to read mem events
PERF2BOLT: Spawning perf-script job to read tasks
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: binary build-id is: c1cc8098b47fc05df4412eccfdea9de8707d564b
PERF2BOLT: matched build-id and file name
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x1600000, offset 0x1200000
BOLT-INFO: disabling -align-macro-fusion in non-relocation mode
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected
perf2bolt: /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:871: llvm::bolt::IndirectBranchType llvm::bolt::BinaryFunction::processIndirectBranch(llvm::MCInst&, unsigned int, uint64_t): Assertion `JTOffsetCandidates.size() > 1 && "expected more than one jump table entry"' failed.

0 0x00000000023b2072 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:398:22

1 0x00000000023b2105 PrintStackTraceSignalHandler(void*) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:462:1

2 0x00000000023b05a4 llvm::sys::RunSignalHandlers() /home/dmitry/BOLT/llvm/lib/Support/Signals.cpp:49:19

3 0x00000000023b19e9 SignalHandler(int) /home/dmitry/BOLT/llvm/lib/Support/Unix/Signals.inc:252:1

4 0x00007fa7f5b7cfc0 __restore_rt (/lib64/libpthread.so.0+0x11fc0)

5 0x00007fa7f4654f2b __GI_raise (/lib64/libc.so.6+0x36f2b)

6 0x00007fa7f463f561 __GI_abort (/lib64/libc.so.6+0x21561)

7 0x00007fa7f463f431 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21431)

8 0x00007fa7f464d692 (/lib64/libc.so.6+0x2f692)

9 0x000000000045462e llvm::bolt::BinaryFunction::processIndirectBranch(llvm::MCInst&, unsigned int, unsigned long) /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:870:5

10 0x0000000000456e7a llvm::bolt::BinaryFunction::disassemble(llvm::ArrayRef) /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:1325:46

11 0x000000000057082d llvm::bolt::RewriteInstance::disassembleFunctions() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:2482:27

12 0x000000000056568a operator() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:967:21

13 0x000000000056568a llvm::bolt::RewriteInstance::run()::'lambda'(std::set<unsigned long, std::less, std::allocator > const&)::operator()(std::set<unsigned long, std::less, std::allocator > const&) const (/home/dmitry/BOLT/build/bin/perf2bolt+0x56568a)

14 0x00000000005659e9 llvm::bolt::RewriteInstance::run() /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/RewriteInstance.cpp:1003:21

15 0x0000000000412306 main /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/llvm-bolt.cpp:269:61

16 0x00007fa7f464118b __libc_start_main (/lib64/libc.so.6+0x2318b)

17 0x0000000000410fba _start (/home/dmitry/BOLT/build/bin/perf2bolt+0x410fba)

Stack dump:

  1. Program arguments: /home/dmitry/BOLT/build/bin/perf2bolt -p perf.data -o perf.fdata sapi/cgi/php-cgi

Aborted (core dumped)

maksfb commented 6 years ago

Is there any way we can reproduce your build? If not, I'll have to add more diagnostics to figure out if it's gcc or us making no sense.

dstogov commented 6 years ago

I use Fedora 28 and build PHP, cloned from GitHub, with the system GCC.

I've tried to trace processIndirectBranch() in gdb. The assertion seems to be related to a branch to the __dso_handle symbol. Maybe this helps:

Breakpoint 4, llvm::bolt::BinaryFunction::processIndirectBranch (this=0x9dad768, Instruction=..., Size=2, Offset=282) at /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:658
658 const auto PtrSize = BC.AsmInfo->getCodePointerSize();
(gdb) n
667 uint64_t ArrayStart = 0;
(gdb)
676 uint64_t PCRelAddr = 0;
(gdb)
678 auto Begin = Instructions.begin();
(gdb)
679 auto End = Instructions.end();
(gdb)
681 if (BC.isAArch64()) {
(gdb)
695 auto Type = BC.MIB->analyzeIndirectBranch(Instruction,
(gdb)
704 PCRelBaseInstr);
(gdb)
695 auto Type = BC.MIB->analyzeIndirectBranch(Instruction,
(gdb)
706 if (Type == IndirectBranchType::UNKNOWN && !MemLocInstr)
(gdb) p Type
$26 = llvm::bolt::IndirectBranchType::POSSIBLE_PIC_JUMP_TABLE
(gdb) n
709 if (MemLocInstr != &Instruction)
(gdb)
710 IndexRegNum = 0;
(gdb)
712 if (BC.isAArch64()) {
(gdb)
746 if (DispExpr) {
(gdb)
749 std::tie(TargetSym, TargetOffset) = BC.MIB->getTargetSymbolInfo(DispExpr);
(gdb)
750 auto BD = BC.getBinaryDataByName(TargetSym->getName());
(gdb) p TargetSym->getName()
$27 = {static npos = 18446744073709551615, Data = 0xc203300 "__dso_handle/1", Length = 14}
(gdb) n
751 assert(BD && "global symbol needs a value");
(gdb)
752 ArrayStart = BD->getAddress() + TargetOffset;
(gdb)
753 BaseRegNum = 0;
(gdb)
754 if (BC.isAArch64()) {
(gdb)
762 if (BaseRegNum == BC.MRI->getProgramCounter())
(gdb)
765 DEBUG(dbgs() << "BOLT-DEBUG: addressed memory is 0x"
(gdb)
769 if (auto JT = getJumpTableContainingAddress(ArrayStart)) {
(gdb)
811 auto Section = BC.getSectionForAddress(ArrayStart);
(gdb)
812 if (!Section) {
(gdb)
823 if (Section->isVirtual()) {
(gdb)
828 StringRef SectionContents = Section->getContents();
(gdb)
830 Type == IndirectBranchType::POSSIBLE_PIC_JUMP_TABLE ? 4 : PtrSize;
(gdb)
829 const auto EntrySize =
(gdb)
831 DataExtractor DE(SectionContents, BC.AsmInfo->isLittleEndian(), EntrySize);
(gdb)
832 auto ValueOffset = static_cast(ArrayStart - Section->getAddress());
(gdb)
833 uint64_t Value = 0;
(gdb)
834 std::vector JTOffsetCandidates;
(gdb)
835 while (ValueOffset <= Section->getSize() - EntrySize) {
(gdb) p ValueOffset
$28 = 5628
(gdb) p Section->getSize()
$29 = 7848152
(gdb) n
836 DEBUG(dbgs() << "BOLT-DEBUG: indirect jmp at 0x"
(gdb)
841 if (BC.isAArch64()) {
(gdb)
843 } else if (Type == IndirectBranchType::POSSIBLE_PIC_JUMP_TABLE) {
(gdb)
844 Value = ArrayStart + DE.getSigned(&ValueOffset, 4);
(gdb)
848 DEBUG(dbgs() << ", which contains value "
(gdb)
850 if (containsAddress(Value) && Value != getAddress()) {
(gdb)
861 if (Value == getAddress() + getSize()) {
(gdb)
865 break;
(gdb)
868 if (Type == IndirectBranchType::POSSIBLE_JUMP_TABLE ||
(gdb)
870 assert(JTOffsetCandidates.size() > 1 &&
(gdb)
perf2bolt: /home/dmitry/BOLT/llvm/tools/llvm-bolt/src/BinaryFunction.cpp:871: llvm::bolt::IndirectBranchType llvm::bolt::BinaryFunction::processIndirectBranch(llvm::MCInst&, unsigned int, uint64_t): Assertion `JTOffsetCandidates.size() > 1 && "expected more than one jump table entry"' failed.

maksfb commented 6 years ago

Thanks for trying to debug it. The real issue turned out to be GCC 8 not honoring the -fno-reorder-blocks-and-partition option. I think it only happens when __builtin_expect() is used, but it could extend to other cases. We'll have to support split functions on input eventually, and I will keep the issue open until that's done.
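
One rough way to check whether the option took effect is to look for cold-partition symbols in the rebuilt binary. GCC gives the split-off part of a function a .cold suffix (GCC 8 appends a number, as in the _nl_load_domain.cold.0 frame in the traces above). A minimal sketch, assuming binutils nm is available and the binary is not stripped; count_cold_symbols is a hypothetical helper name, not part of BOLT:

```shell
# Count symbols ending in ".cold" or ".cold.N" in `nm` output read from stdin.
# Hypothetical helper for illustration only.
count_cold_symbols() {
    # grep -c prints the number of matching lines; it exits 1 when the count
    # is 0, so "|| true" keeps the helper from reporting failure in that case.
    grep -Ec '\.cold(\.[0-9]+)?$' || true
}

# Example usage (path taken from the report above):
#   nm sapi/cgi/php-cgi | count_cold_symbols
```

A non-zero count suggests that some functions are still split, either because the option was not applied to every object or because they come from prebuilt libraries.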

ZahraHeydari95 commented 2 years ago

@dstogov Hello, could you explain what you finally did to rebuild HHVM with -fno-reorder-blocks-and-partition and -Wl,-q (or -Wl,--emit-relocs) for BOLT optimization? Thanks.

aaupov commented 2 years ago

Hi @ZahraHeydari95, please follow the instructions in https://docs.hhvm.com/hhvm/installation/building-from-source. The extra cmake flags that I used are the following:

-DENABLE_LD_GOLD=OFF -DCMAKE_EXE_LINKER_FLAGS=-Wl,-q -DCMAKE_C_FLAGS=-fno-reorder-blocks-and-partition -DCMAKE_CXX_FLAGS=-fno-reorder-blocks-and-partition

Disabling the gold linker is needed because it doesn't accept -q (--emit-relocs) and -icf at the same time (https://sourceware.org/bugzilla/show_bug.cgi?id=18845), and ICF is enabled by default by HHVM.

After the cmake step, I built HHVM normally with make hhvm.

This produced an HHVM with relocations preserved:

$ readelf -We hphp/hhvm/hhvm | grep rela.text
  [16] .rela.text        RELA            0000000000000000 143d40c8 377ca60 18   I 201  15  8
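
The same check can be scripted, for example to stop early when relocations are missing before handing the binary to BOLT. A sketch, assuming readelf from binutils; has_text_relocs is a hypothetical helper name:

```shell
# Succeed if the section listing on stdin (e.g. from `readelf -WS`)
# contains a .rela.text section. Hypothetical helper for illustration only.
has_text_relocs() {
    # Require whitespace on both sides so that names such as
    # .rela.text.unlikely are not counted as a match.
    grep -q '[[:space:]]\.rela\.text[[:space:]]'
}

# Example usage (path taken from the comment above):
#   readelf -WS hphp/hhvm/hhvm | has_text_relocs && echo "relocations preserved"
```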

There are still some split functions in the input (likely coming from third-party libraries):

$ nm hphp/hhvm/hhvm | grep .cold | head
0000000002c163af t BrotliBuildSimpleHuffmanTable.cold
0000000002c163b9 t BrotliCompressBufferQuality10.cold
0000000002c163c3 t BrotliCompressFragmentTwoPass.cold
0000000002c163a4 t DecodeContextMap.cold
0000000002c16390 t DecodeMetaBlockLength.cold
0000000002c1639a t ReadHuffmanCode.cold
0000000002c88f38 t ZSTD_createCDict_advanced2.cold
...

The resulting binary can be processed by BOLT. Let us know if you run into any crashes.