facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries

[eh_frame] received signal SIGSEGV when throwing an exception #278

Open HShan886 opened 2 years ago

HShan886 commented 2 years ago

When using the latest BOLT, my program receives SIGSEGV when throwing an exception. I upgraded to binutils-2.32, but it still crashes.

Here is the backtrace:

#0  read_encoded_value_with_base (encoding=encoding@entry=131 '\203', base=0, p=0x10bccc2 "", p@entry=0x10bccbe "\004\030", val=val@entry=0x7fffffffc438)
    at /tmp/gcc-4.9.2/libstdc++-v3/../libgcc/unwind-pe.h:265
#1  0x00007ffff7af1a61 in read_encoded_value (val=0x7fffffffc438, p=0x10bccbe "\004\030", encoding=131 '\203', context=0x0)
    at /tmp/gcc-4.9.2/libstdc++-v3/../libgcc/unwind-pe.h:284
#2  __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=1, exception_class=<optimized out>, ue_header=0x2d78060, context=0x7fffffffc5b0)
    at ../../.././libstdc++-v3/libsupc++/eh_personality.cc:488
#3  0x00007ffff6b94263 in _Unwind_RaiseException (exc=0x2d78060) at ../.././libgcc/unwind.inc:113
#4  0x00007ffff6b9456d in _Unwind_Resume_or_Rethrow (exc=0x2d78060) at ../.././libgcc/unwind.inc:252
#5  0x00007ffff7af23e9 in __cxxabiv1::__cxa_rethrow () at ../../.././libstdc++-v3/libsupc++/eh_throw.cc:118

Comparing the original program with the bolted program, the encoding passed to read_encoded_value is 1 in the original but 131 in the bolted one. Moreover, much of the LSDA information was deleted from the .eh_frame section of the bolted program.
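For readers unfamiliar with the encoding byte, here is a minimal sketch that decodes a DW_EH_PE pointer-encoding value into its parts. The constant values follow the standard DW_EH_PE definitions used for .eh_frame and LSDA data. The function name `describe_encoding` is hypothetical and is not part of BOLT or libgcc.

```python
# DW_EH_PE_* pointer-encoding constants used in .eh_frame / LSDA data.
# Low 4 bits: value format. Bits 4-6: how the value is applied.
# Bit 7: indirect (the decoded value is an address to load the result from).
FORMATS = {
    0x00: "absptr",
    0x01: "uleb128",
    0x02: "udata2",
    0x03: "udata4",
    0x04: "udata8",
    0x09: "sleb128",
    0x0A: "sdata2",
    0x0B: "sdata4",
    0x0C: "sdata8",
}
APPLICATIONS = {
    0x00: "absolute",
    0x10: "pcrel",
    0x20: "textrel",
    0x30: "datarel",
    0x40: "funcrel",
    0x50: "aligned",
}
DW_EH_PE_INDIRECT = 0x80
DW_EH_PE_OMIT = 0xFF


def describe_encoding(byte):
    """Return a readable description of one DW_EH_PE encoding byte."""
    if byte == DW_EH_PE_OMIT:
        return "omit"
    parts = [
        FORMATS.get(byte & 0x0F, "unknown-format"),
        APPLICATIONS.get(byte & 0x70, "unknown-application"),
    ]
    if byte & DW_EH_PE_INDIRECT:
        parts.append("indirect")
    return " | ".join(parts)


if __name__ == "__main__":
    # The two values reported in this issue.
    for value in (1, 131):
        print(f"{value} (0x{value:02x}): {describe_encoding(value)}")
```

Under this decoding, 1 is a plain uleb128 value, while 131 (0x83, shown as '\203' in the backtrace) is udata4 with the indirect bit set. With that bit set, the unwinder loads the result from the decoded address, which would be consistent with a fault inside read_encoded_value_with_base if that address is invalid. This is an inference from the encoding byte alone, not a confirmed diagnosis.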

Any help would be appreciated.

aaupov commented 2 years ago

Hi @Haishan312, can you please attach a BOLT log? Which compiler was the program built with? I remember seeing this function split in gcc's libstdc++.

HShan886 commented 2 years ago

@aaupov thank you for replying.

My program was compiled with gcc-4.9.2. Here is the log: bolt.log

aaupov commented 2 years ago

@Haishan312,

Thanks for the log. Cold fragments are out of the question. @maksfb, @rafaelauler: any suggestions on how to track down or work around possible LSDA corruption?

rafaelauler commented 2 years ago

The toolchain (gcc-4.9.2) is quite old. Upgrading it should probably solve this. But I'm curious as to why BOLT is messing up eh_frame. I would start by running BOLT without any optimization flags and without a profile, with just -relocs=1 -lite=0, to check whether the program fails even when no code is changed. If it does not fail, I would zero in on which function causes the problem after being optimized, and which optimization is responsible. If it does fail, I would run a debugger to find which function is throwing, get its name, and run with -print-only=funcname -print-finalized to dump how BOLT is reading this function, paying particular attention to CFIs. If CFIs are being ingested, are they correct?
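The two triage steps above can be sketched as a small helper that builds the command lines. It uses only the flags mentioned in this thread. Combining the -print-only flags with the baseline flags is an assumption, and the function names and the `throwing_function` placeholder are hypothetical.

```python
import shlex


def baseline_command(binary, output):
    """Step 1: rewrite the whole binary with no profile and no optimization flags."""
    return ["llvm-bolt", binary, "-o", output, "-relocs=1", "-lite=0"]


def inspect_command(binary, output, funcname):
    """Step 2: dump how BOLT reads a single function, including its CFIs.

    Assumption: the print flags are added on top of the baseline flags.
    """
    return baseline_command(binary, output) + [
        f"-print-only={funcname}",
        "-print-finalized",
    ]


if __name__ == "__main__":
    # "test" and "test.elf" are the names used later in this thread;
    # "throwing_function" stands in for the name found with a debugger.
    print(shlex.join(baseline_command("test", "test.elf")))
    print(shlex.join(inspect_command("test", "test.elf", "throwing_function")))
```

If the binary from step 1 already crashes, the problem does not depend on any optimization pass, and step 2 becomes the next thing to look at.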

HShan886 commented 2 years ago

@rafaelauler first of all, thank you. My program runs fine when I add -lite=0 to the recommended optimization options, like this:

llvm-bolt test -o test.elf -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort -split-functions=2 -split-all-cold -split-eh -dyno-stats -lite=0 -v=2

Would -lite=0 reduce the performance of the optimized binary?

rafaelauler commented 2 years ago

Hi @Haishan312, you're very welcome, and thanks for reporting. -lite=0 will actually cause BOLT to take more time and memory to optimize the program, because it rewrites the entire program instead of just the important functions. If -lite=1 produces a faulty program but -lite=0 does not, then this is a lite mode bug. I'm interested in investigating this; let me know if you can reproduce the bug in a binary that you can share.

HShan886 commented 2 years ago

@rafaelauler the binary is very large, so I cannot upload it, sorry. Could you share any suggestions for fixing this bug?