Open GoogleCodeExporter opened 9 years ago
I think I'm seeing something similar on ARM with current master. It might be an
issue with stack trace capture.
Could you post backtraces from the process that's stuck?
Also, which hardware architecture are you running on?
Original comment by alkondratenko
on 25 Feb 2014 at 9:26
Sorry for not giving details: it's on NetBSD-6.99.32/x86_64.
It seems to get stuck already in the first death test.
gdb debugallocation_test <pid>
shows:
(gdb) thread apply all bt
Thread 1 (LWP 1):
#0 0x00007f7ff603b48a in _sys___nanosleep50 () from /usr/lib/libc.so.12
#1 0x00007f7ff70070dc in __nanosleep50 (rqtp=0x7f7fffffbc90, rmtp=0x0) at
/archive/foreign/src/lib/libpthread/pthread_cancelstub.c:398
#2 0x00007f7ff782af27 in base::internal::SpinLockDelay (w=<optimized out>,
value=<optimized out>, loop=<optimized out>) at
./src/base/spinlock_posix-inl.h:54
#3 0x00007f7ff782ae46 in SpinLock::SlowLock (this=0x7f7ff7a4f560
<tcmalloc::Static::pageheap_lock_>) at src/base/spinlock.cc:133
#4 0x00007f7ff7823b4d in Lock (this=<optimized out>) at src/base/spinlock.h:71
#5 SpinLockHolder (l=<optimized out>, this=<optimized out>) at
src/base/spinlock.h:136
#6 tcmalloc::ThreadCache::InitModule () at src/thread_cache.cc:314
#7 0x00007f7ff782ec97 in tcmalloc::ThreadCache::GetCache () at
src/thread_cache.h:420
#8 0x00007f7ff78369b6 in do_malloc_no_errno (size=20752) at
src/tcmalloc.cc:1102
#9 do_malloc (size=20752) at src/tcmalloc.cc:1109
#10 Allocate (type=-271733872, size=20704) at src/debugallocation.cc:529
#11 DebugAllocate (type=-271733872, size=20704) at src/debugallocation.cc:1015
#12 do_debug_malloc_or_debug_cpp_alloc (size=20704) at
src/debugallocation.cc:1215
#13 tc_malloc (size=20704) at src/debugallocation.cc:1221
#14 0x00007f7ff6407a7f in start_fde_sort (count=<optimized out>,
accu=<optimized out>) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:399
#15 init_object (ob=0x7f7ff6ef6788) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:743
#16 search_object (ob=0x7f7ff6ef6788, pc=0x7f7ff640472e <_Unwind_Backtrace+48>)
at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:933
#17 0x00007f7ff640885d in _Unwind_Find_registered_FDE (bases=0x7f7fffffc348,
pc=0x7f7ff640472e <_Unwind_Backtrace+48>) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:997
#18 _Unwind_Find_FDE (pc=0x7f7ff640472e <_Unwind_Backtrace+48>,
bases=0x7f7fffffc348) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde-glibc.c:421
#19 0x00007f7ff6403afa in uw_frame_state_for (context=0x7f7fffffc2a0,
fs=0x7f7fffffbf60) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2.c:1130
#20 0x00007f7ff6403fb2 in uw_init_context_1 (context=0x7f7fffffc2a0,
outer_cfa=0x7f7fffffc3d0, outer_ra=0x7f7ff7400fb8 <backtrace+40>) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2.c:1449
#21 0x00007f7ff640472f in _Unwind_Backtrace (trace=0x7f7ff7400ff0 <tracer>,
trace_argument=0x7f7fffffc3d8) at
/archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind.inc:283
#22 0x00007f7ff7400fb8 in backtrace (arr=<optimized out>, len=<optimized out>)
at /archive/foreign/src/lib/libexecinfo/unwind.c:67
#23 0x00007f7ff782b404 in GetStackTrace_generic (result=0x622c10, max_depth=30,
skip_count=3) at src/stacktrace_generic-inl.h:68
#24 0x00007f7ff782b6c6 in GetStackTrace (result=<optimized out>,
max_depth=<optimized out>, skip_count=<optimized out>) at src/stacktrace.cc:228
#25 0x00007f7ff78216b6 in RecordGrowth (growth=1048576) at src/page_heap.cc:498
#26 tcmalloc::PageHeap::GrowHeap (this=0x662c00, n=<optimized out>) at
src/page_heap.cc:524
#27 0x00007f7ff78219ca in tcmalloc::PageHeap::New (this=0x662c00, n=2) at
src/page_heap.cc:155
#28 0x00007f7ff7820585 in tcmalloc::CentralFreeList::Populate
(this=0x7f7ff7a52160 <tcmalloc::Static::central_cache_+7296>) at
src/central_freelist.cc:329
#29 0x00007f7ff782078f in tcmalloc::CentralFreeList::FetchFromOneSpansSafe
(this=0x7f7ff7a52160 <tcmalloc::Static::central_cache_+7296>, N=1,
start=0x7f7fffffc858, end=0x7f7fffffc850)
at src/central_freelist.cc:284
#30 0x00007f7ff782081d in tcmalloc::CentralFreeList::RemoveRange
(this=0x7f7ff7a52160 <tcmalloc::Static::central_cache_+7296>,
start=0x7f7fffffc858, end=0x7f7fffffc850, N=1) at src/central_freelist.cc:264
#31 0x00007f7ff78230eb in tcmalloc::ThreadCache::FetchFromCentralCache
(this=0x6edc98, cl=6, byte_size=80) at src/thread_cache.cc:165
#32 0x00007f7ff78369c1 in do_malloc_no_errno (size=80) at src/tcmalloc.cc:1102
#33 do_malloc (size=80) at src/tcmalloc.cc:1109
#34 Allocate (type=-271733872, size=32) at src/debugallocation.cc:529
#35 DebugAllocate (type=-271733872, size=32) at src/debugallocation.cc:1015
#36 do_debug_malloc_or_debug_cpp_alloc (size=32) at src/debugallocation.cc:1215
#37 tc_malloc (size=32) at src/debugallocation.cc:1221
#38 0x00007f7ff6107052 in atexit_handler_alloc (dso=0x7f7ff6ef6780
<__dso_handle>) at /archive/foreign/src/lib/libc/stdlib/atexit.c:112
#39 __cxa_atexit (func=0x7f7ff6cc503a
<__eh_globals_init::~__eh_globals_init()>, arg=0x7f7ff6f0bef0,
dso=0x7f7ff6ef6780 <__dso_handle>) at
/archive/foreign/src/lib/libc/stdlib/atexit.c:145
#40 0x00007f7ff6c65904 in ?? () from /usr/lib/libstdc++.so.7
#41 0x00007f7ff7ff97e0 in ?? ()
#42 0x00007f7ff6c62c99 in _init () from /usr/lib/libstdc++.so.7
#43 0x0000000000000000 in ?? ()
If I continue and press CTRL-C a short while later, the backtrace looks the
same.
Original comment by tk@giga.or.at
on 25 Feb 2014 at 9:32
Thanks. That's close to what I expected. It's in a deadlock state: it's trying
to grab a lock that's already taken.
Which is slightly weird. AFAIK, for a long time now libunwind has not used
malloc directly; it uses mmap and its own memory management. At least that's
what makes things work on GNU/Linux.
What is your libunwind version, BTW?
Original comment by alkondratenko
on 25 Feb 2014 at 9:35
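The deadlock described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not gperftools code: TinySpinLock, pageheap_lock_sketch and NestedAllocationWouldProceed are hypothetical names. It assumes, as the backtrace suggests, that the page heap lock is already held when the stack trace is recorded (frames #25 to #27) and that the nested tc_malloc then waits on the same lock (frames #3 to #6). A non-recursive spinlock cannot tell that its owner is the thread asking for it again, so the inner acquire never succeeds. The sketch uses TryLock for the inner step so that it reports the problem instead of hanging.

```cpp
#include <atomic>

namespace deadlock_sketch {

// Minimal non-recursive spinlock (hypothetical, not gperftools' SpinLock).
class TinySpinLock {
 public:
  // Returns true if the lock was free and is now held by the caller.
  bool TryLock() { return !held_.test_and_set(std::memory_order_acquire); }
  // Spins until the lock is acquired; never returns if the caller already
  // holds it, because the lock does not track its owner.
  void Lock() { while (!TryLock()) {} }
  void Unlock() { held_.clear(std::memory_order_release); }

 private:
  std::atomic_flag held_ = ATOMIC_FLAG_INIT;
};

// Stand-in for the lock that appears in frame #3 of the backtrace.
TinySpinLock pageheap_lock_sketch;

// Models "grow the heap under the lock, then allocate again from inside the
// stack unwinder". Returns whether the nested acquire could proceed.
bool NestedAllocationWouldProceed() {
  pageheap_lock_sketch.Lock();                       // outer: heap growth
  bool inner_ok = pageheap_lock_sketch.TryLock();    // inner: nested malloc
  if (inner_ok) pageheap_lock_sketch.Unlock();       // not reached in practice
  pageheap_lock_sketch.Unlock();                     // release the outer hold
  return inner_ok;
}

}  // namespace deadlock_sketch
```

Replacing the inner TryLock with Lock would spin forever on a single thread, which matches the SlowLock / SpinLockDelay frames at the top of the backtrace.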
Correction: on GNU/Linux on x86. On ARM, libunwind on GNU/Linux by default
calls fopen, which internally calls malloc and causes a very similar deadlock.
And I believe that's a fixable situation.
Original comment by alkondratenko
on 25 Feb 2014 at 9:37
I'm not using libunwind, I'm using backtrace() from NetBSD's libexecinfo:
# ldd .libs/debugallocation_test
.libs/debugallocation_test:
-ltcmalloc_debug.5 => not found
-lexecinfo.0 => /usr/lib/libexecinfo.so.0
-lelf.0 => /usr/lib/libelf.so.0
-lc.12 => /usr/lib/libc.so.12
-lpthread.1 => /usr/lib/libpthread.so.1
-lstdc++.7 => /usr/lib/libstdc++.so.7
-lgcc_s.1 => /usr/lib/libgcc_s.so.1
-lm.0 => /usr/lib/libm.so.0
Does it need its own workaround, or are changes needed in it?
Original comment by tk@giga.or.at
on 25 Feb 2014 at 9:40
OK. Then it's the same as GNU libc's backtrace(). The code mentions that it
will call malloc, which causes trouble.
I'm thinking about detecting the malloc-called-from-malloc situation using a
thread-local variable. When it's detected, I plan to use a separate, simplistic
"emergency" allocator. I believe it'll work, and it should fix backtrace() on
GNU/Linux as well as, hopefully, on the BSDs.
Right now these are just my plans. I cannot promise you any dates. Feel free to
take on this work yourself.
Alternatively, if all you need is just malloc, then simply build with
--enable-minimal and it will not have any code for stack trace capture.
Original comment by alkondratenko
on 25 Feb 2014 at 9:43
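The plan above (a thread-local flag plus a simplistic "emergency" allocator) could look roughly like the sketch below. This is an illustration only, not the implementation that ended up in gperftools: every name here (reentry_sketch, in_allocator, EmergencyAlloc, Allocate, Free, the 64 KiB arena size) is hypothetical, std::malloc stands in for the normal allocation path, and the capture_stack hook stands in for a backtrace() call that may itself allocate.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace reentry_sketch {

// Fixed arena for the fallback path. Memory handed out here is never reused.
constexpr std::size_t kArenaSize = 64 * 1024;  // arbitrary size for the sketch
alignas(16) unsigned char emergency_arena[kArenaSize];
std::atomic<std::size_t> emergency_used{0};

// True while the current thread is inside Allocate().
thread_local bool in_allocator = false;

// Bump allocator: lock-free, so it cannot deadlock on the main heap lock.
void* EmergencyAlloc(std::size_t size) {
  if (size > kArenaSize) return nullptr;
  std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
  std::size_t start = emergency_used.fetch_add(rounded);
  if (start + rounded > kArenaSize) return nullptr;  // arena exhausted
  return emergency_arena + start;
}

bool IsEmergencyPtr(const void* p) {
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  auto base = reinterpret_cast<std::uintptr_t>(emergency_arena);
  return addr >= base && addr < base + kArenaSize;
}

using Hook = void (*)();

// capture_stack models stack trace capture running inside the allocator.
void* Allocate(std::size_t size, Hook capture_stack = nullptr) {
  if (in_allocator) return EmergencyAlloc(size);  // malloc called from malloc
  in_allocator = true;
  if (capture_stack != nullptr) capture_stack();  // may re-enter Allocate()
  void* p = std::malloc(size);                    // normal path (stand-in)
  in_allocator = false;
  return p;
}

void Free(void* p) {
  if (p == nullptr || IsEmergencyPtr(p)) return;  // arena blocks are not freed
  std::free(p);
}

}  // namespace reentry_sketch
```

A call such as Allocate(80, hook), where hook itself calls Allocate(32), would serve the inner request from the arena and the outer one from the normal path. The flag is per thread, so other threads allocating at the same time are not diverted to the emergency path.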
This sounds like it should work.
I was mostly interested in using it for profiling, so I guess I should be fine
even with the current version.
Thanks for working on gperftools!
Original comment by tk@giga.or.at
on 25 Feb 2014 at 9:50
Original issue reported on code.google.com by
tk@giga.or.at
on 25 Feb 2014 at 8:51