bombela / backward-cpp

A beautiful stack trace pretty printer for C++
MIT License

Hang in signal handler when printing stack trace #265

Closed: nmaludy closed this issue 2 years ago

nmaludy commented 2 years ago

I'm working with backward, and in most cases it prints our stack traces just fine. However, in one of our applications it hangs (for some reason) while resolving the stack trace using libdw. It appears that libdw allocates memory inside the signal handler, which causes a deadlock. Below is the backtrace from gdb.

(gdb) bt
#0  0x00007fa52729e09c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fa527317698 in calloc () from /lib64/libc.so.6
#2  0x00007fa526bbe6ef in file_read_elf () from /lib64/libelf.so.1
#3  0x00007fa526bbf04f in __libelf_read_mmaped_file () from /lib64/libelf.so.1
#4  0x00007fa526bbf2ea in read_file () from /lib64/libelf.so.1
#5  0x00007fa528fb8326 in libdw_open_elf () from /lib64/libdw.so.1
#6  0x00007fa528fa9de2 in __libdwfl_getelf () from /lib64/libdw.so.1
#7  0x00007fa528faa090 in find_symtab () from /lib64/libdw.so.1
#8  0x00007fa528faad62 in dwfl_module_getsymtab () from /lib64/libdw.so.1
#9  0x00007fa528fb1810 in __libdwfl_addrsym () from /lib64/libdw.so.1
#10 0x00007fa528fb3053 in dwfl_module_addrinfo () from /lib64/libdw.so.1
#11 0x00007fa528fb1793 in dwfl_module_addrname () from /lib64/libdw.so.1
#12 0x00007fa529203aa1 in backward::TraceResolverLinuxImpl<backward::trace_resolver_tag::libdw>::resolve (this=this@entry=0x6e6cf98, trace=...)
    at xxx/extern/include/backward.hpp:1822
#13 0x00007fa5292068ad in backward::Printer::print_stacktrace<backward::StackTrace> (this=0x6e6cf80, st=..., os=..., colorize=...)
    at xxx/extern/include/backward.hpp:4034
#14 0x00007fa529201bd0 in backward::Printer::print<backward::StackTrace> (os=..., st=..., this=<optimized out>)
    at xxx/extern/include/backward.hpp:4000

Any ideas on what could cause this? I'm happy to help provide info and help fix the issue.

nmaludy commented 2 years ago

FYI, it looks like this was caused by the signal handler being called from inside another allocator call, which had itself raised a signal (via abort(), per the frames below):

#36 <signal handler called>
#37 0x00007f6fbf3f0a4f in raise () from /lib64/libc.so.6
#38 0x00007f6fbf3c3db5 in abort () from /lib64/libc.so.6
#39 0x00007f6fbf433057 in __libc_message () from /lib64/libc.so.6
#40 0x00007f6fbf43a1bc in malloc_printerr () from /lib64/libc.so.6
#41 0x00007f6fbf43babc in _int_free () from /lib64/libc.so.6
#42 0x00007f6f8071ade5 in osgeo::proj::common::UnitOfMeasure::~UnitOfMeasure() () from /usr/proj82/lib/libproj.so.22
#43 0x00007f6fbf3f31ec in __run_exit_handlers () from /lib64/libc.so.6
#44 0x00007f6fbf3f3320 in exit () from /lib64/libc.so.6
#45 0x00007f6fbf3dccaa in __libc_start_main () from /lib64/libc.so.6
#46 0x000000000040154e in _start ()

Problem appears to be in the shutdown of the proj library. Not sure why yet.

Closing since it's not a problem with backward-cpp

bombela commented 2 years ago

You have either a memory corruption that happened to mutate some malloc bookkeeping data, a wrong pointer passed to free(), or maybe even a double free. The next call into the allocator (_int_free() in your trace) detects this and aborts the program, which most likely leaves the malloc mutex locked. Then, when trying to print the trace, libdw calls malloc (calloc() in your first backtrace), which deadlocks.
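
To make the sequence above concrete, here is a minimal sketch of the same pattern with a toy lock. It does not use glibc internals: `g_arena_lock`, `toy_free_detects_corruption()`, `crash_handler()` and `handler_would_deadlock()` are hypothetical names, `SIGUSR1` stands in for the SIGABRT raised by abort(), and a POSIX system is assumed. The handler only probes the lock instead of blocking on it, so the program reports the condition rather than hanging.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical stand-in for the allocator's internal, non-recursive lock.
static std::atomic_flag g_arena_lock = ATOMIC_FLAG_INIT;

// Set by the handler when it finds the lock already held.
static volatile std::sig_atomic_t g_would_deadlock = 0;

// Stand-in for the crash handler. Printing a trace needs memory
// (libdw -> calloc in the real backtrace), so it needs the same lock.
extern "C" void crash_handler(int) {
    // test_and_set returns the previous state: true means "already locked".
    if (g_arena_lock.test_and_set(std::memory_order_acquire)) {
        // A real blocking lock would wait here forever, because the owner
        // is the interrupted code further down this same stack.
        g_would_deadlock = 1;
    } else {
        g_arena_lock.clear(std::memory_order_release);
    }
}

// Stand-in for free() noticing corrupted heap data and aborting
// while it still holds the lock.
void toy_free_detects_corruption() {
    while (g_arena_lock.test_and_set(std::memory_order_acquire)) {
        // spin until the lock is ours (uncontended in this sketch)
    }
    std::raise(SIGUSR1);  // plays the role of abort() -> raise(SIGABRT)
    g_arena_lock.clear(std::memory_order_release);
}

// Runs the scenario once and reports whether the handler met a held lock.
bool handler_would_deadlock() {
    g_would_deadlock = 0;
    std::signal(SIGUSR1, crash_handler);
    toy_free_detects_corruption();
    std::signal(SIGUSR1, SIG_DFL);
    return g_would_deadlock != 0;
}
```

Calling `handler_would_deadlock()` from `main()` should report that the handler found the lock taken. With a real blocking lock there is no such report: the handler simply waits, which matches frame #0 (`__lll_lock_wait_private`) in the first backtrace.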