Closed GoogleCodeExporter closed 9 years ago
Interesting find. I wonder if making the lock recursive would be safe?
Original comment by chapp...@gmail.com
on 3 May 2012 at 3:02
I see you mention that you are using glibc-2.14.90 but can you please double
check? In speaking with the gnu libc guys, version 2.15 introduced a bug in
_IO_vfscanf_internal where alloca is avoided altogether and realloc is called
instead. The intent was for realloc to only be called for large buffers. In
2.16 this is fixed such that alloca is still used for small buffers.
Alternatively, perhaps there is an entry in your 'maps' file (/proc/<pid>/maps)
that is longer than what is normally encountered thereby forcing
_IO_vfscanf_internal to fall back on realloc.
Original comment by chapp...@gmail.com
on 15 May 2012 at 5:31
Unless Fedora backported that bug, it's 2.14. I do have fairly long maps,
though -- the filenames are 100 characters long or so. They might have
exceeded 128 bytes when I hit the deadlock.
Original comment by aml...@gmail.com
on 1 Jun 2012 at 9:12
The longest line in /proc/self/maps is 177 bytes, including the trailing
newline.
Original comment by aml...@gmail.com
on 1 Jun 2012 at 9:16
I'm hitting this bug on Ubuntu Precise with glibc 2.15-0ubuntu10.
The deadlocked stack trace looks like:
Thread 4 (Thread 0x7fd1d005f700 (LWP 1018)):
#0 0x00007fd212762277 in sys_futex (t=0x7fd1d005c840, v=2, o=128,
a=0x7fd2129994c0) at ./src/base/linux_syscall_support.h:1789
#1 base::internal::SpinLockDelay (w=0x7fd2129994c0, value=2, loop=<optimized
out>) at ./src/base/spinlock_linux-inl.h:87
#2 0x00007fd2127611a2 in SpinLock::SlowLock (this=0x7fd2129994c0) at
src/base/spinlock.cc:132
#3 0x00007fd21275cf4a in Lock (this=0x7fd2129994c0) at src/base/spinlock.h:75
#4 SpinLockHolder (l=0x7fd2129994c0, this=<synthetic pointer>) at
src/base/spinlock.h:141
#5 RecordAlloc (ptr=0x18e1300, bytes=256, skip_count=0) at
src/heap-profiler.cc:304
#6 0x00007fd21275817a in MallocHook::InvokeNewHookSlow (p=0x18e1300, s=256) at
src/malloc_hook.cc:525
#7 0x00007fd2127656cb in InvokeNewHook (s=256, p=0x18e1300) at
src/malloc_hook-inl.h:161
#8 do_realloc_with_callback (invalid_get_size_fn=<optimized out>,
invalid_free_fn=<optimized out>, new_size=<optimized out>, old_ptr=<optimized
out>) at src/tcmalloc.cc:1221
#9 do_realloc (new_size=<optimized out>, old_ptr=<optimized out>) at
src/tcmalloc.cc:1228
#10 tc_realloc (old_ptr=<optimized out>, new_size=256) at src/tcmalloc.cc:1526
#11 0x00007fd21199c66f in _IO_vfscanf_internal (s=<optimized out>,
format=<optimized out>, argptr=0x7fd1d005cf68, errp=0x0) at vfscanf.c:1413
#12 0x00007fd2119b5360 in _IO_vsscanf (string=0x7fd1d005d1c0 "00400000-0047a000
r-xp 00000000 fc:00 9965050", ' ' <repeats 28 times>,
"/home/nelhage/code/codesearch/codesearch", format=0x7fd21276df30 "%lx-%lx %4s
%lx %x:%x %ld %n", args=0x7fd1d005cf68) at iovsscanf.c:45
#13 0x00007fd2119a0317 in __sscanf (s=<optimized out>, format=<optimized out>)
at sscanf.c:34
#14 0x00007fd2127609ab in ProcMapsIterator::NextExt (this=0x7fd1d005e5c0,
start=0x7fd1d005d190, end=0x7fd1d005d198, flags=0x7fd1d005d1b0,
offset=0x7fd1d005d1a0, inode=0x7fd1d005d1a8, filename=0x7fd1d005d1b8,
file_mapping=0x0, file_pages=0x0, anon_mapping=0x0, anon_pages=0x0, dev=0x0) at
src/base/sysinfo.cc:721
#15 0x00007fd212760cbf in ProcMapsIterator::Next (this=<optimized out>,
start=<optimized out>, end=<optimized out>, flags=<optimized out>,
offset=<optimized out>, inode=<optimized out>, filename=0x7fd1d005d1b8) at
src/base/sysinfo.cc:661
#16 0x00007fd212760eea in tcmalloc::FillProcSelfMaps (buf=0x7fd212c1e033 "",
size=1048556, wrote_all=0x7fd1d005e6cf) at src/base/sysinfo.cc:952
#17 0x00007fd21275b23d in HeapProfileTable::FillOrderedProfile
(this=0x7fd212d1e040, buf=0x7fd212c1e020 "\nMAPPED_LIBRARIES:\n", size=1048575)
at src/heap-profile-table.cc:425
#18 0x00007fd21275ca32 in DoGetHeapProfileLocked (buf=0x7fd212c1e020
"\nMAPPED_LIBRARIES:\n", buflen=1048576) at src/heap-profiler.cc:191
#19 0x00007fd21275cc20 in DumpProfileLocked (reason=<optimized out>) at
src/heap-profiler.cc:248
#20 0x00007fd21275cdbc in MaybeDumpProfileLocked () at src/heap-profiler.cc:289
#21 0x00007fd21275cfc6 in MaybeDumpProfileLocked () at src/heap-profiler.cc:262
#22 RecordAlloc (ptr=0x33de000, bytes=67108488, skip_count=0) at
src/heap-profiler.cc:307
#23 0x00007fd21275817a in MallocHook::InvokeNewHookSlow (p=0x33de000,
s=67108488) at src/malloc_hook.cc:525
#24 0x00007fd212767570 in InvokeNewHook (s=67108488, p=<optimized out>) at
src/malloc_hook-inl.h:161
#25 tc_newarray (size=67108488) at src/tcmalloc.cc:1566
[application code]
The longest line in /proc/pid/maps is 167 characters, with trailing newline.
Original comment by nelh...@nelhage.com
on 2 Jun 2012 at 4:32
For what it's worth, the attached patch is a moderately sketchy workaround that
let me successfully use gperftools against my glibc 2.15. I am not proposing it
for merge, just as a temporary solution for anyone else struggling to make this
work. I *believe* it to be safe, at least as long as the application does not
call GetHeapProfile, but I am not completely sure. It works in my relatively
minimal testing.
Original comment by nelh...@nelhage.com
on 2 Jun 2012 at 5:09
Attachments:
Why does glibc allocate memory in sscanf, anyway? On quick inspection, the
Bionic, FreeBSD, and Apple versions don't. The glibc code is, IMO, some of the
least comprehensible code I've ever seen. I have no idea what's going on.
In any case, Apple's implementation is self-contained. The attached ugly patch
adds Apple's sscanf and uses it.
I haven't checked the licenses for compatibility, and this patch probably does
the wrong thing on non-Linux.
Original comment by aml...@gmail.com
on 26 Jun 2012 at 11:42
Attachments:
There's a bug in that patch: apple_scanf.c needs #include <stdlib.h> to avoid
incorrect code generation. Sometimes I hate C.
Better version attached.
Original comment by aml...@gmail.com
on 27 Jun 2012 at 12:36
Attachments:
This bug is also hitting on Fedora 17, which uses glibc-2.15. Basically this
bug makes tcmalloc's heap-check or heap-profiler useless on Fedora 17 if one
doesn't compile a glibc-2.16 or patch gperftools.
IMHO, I don't think you (gperftools' developers) should rely on glibc's
implementation detail of sscanf not calling to realloc. It will break whenever
the file name is long, as mentioned in the OP's comment, or whenever glibc guys
change their mind.
The approaches that come out of my mind (you guys may have better solutions):
1) Make the lock reentrant, which is a lot easier but introduces performance
penalty;
2) Removes all calls to glibc that will potentially need dynamic memory, and
provides your own alternatives, which introduces a lot of engineering work.
If I were a gperftools developer, I would first try approach 1 for a quick
hack, then approach 2 for long term fix, gperftools should be quite stand alone
and self contained because it is such a low level library.
Original comment by likan...@gmail.com
on 2 Jul 2012 at 11:07
[deleted comment]
i got the same bug as "Comment 5 by nelh...@nelhage.com, Jun 1, 2012" metioned
on Ubuntu Precise. After applying the patch, heap profiler can work but it
seems heap checker also have a deadlock issue. Here is the trace:
#0 0x00007fa4a2360510 in __nanosleep_nocancel () at
../sysdeps/unix/syscall-template.S:82
#1 0x00007fa4a2ab87c9 in base::internal::SpinLockDelay (w=0x7fa4a2cce2a0,
value=2, loop=11037) at ./src/base/spinlock_linux-inl.h:89
#2 0x00007fa4a2ab8a52 in SpinLock::SlowLock (this=0x7fa4a2cce2a0) at
src/base/spinlock.cc:132
#3 0x00007fa4a2a9f94c in Lock (this=0x7fa4a2cce2a0) at src/base/spinlock.h:75
#4 SpinLockHolder (l=0x7fa4a2cce2a0, this=0x7fffbd2a23d0) at
src/base/spinlock.h:141
#5 NewHook (ptr=0x1f48000, size=256) at src/heap-checker.cc:577
#6 0x00007fa4a2aae17a in MallocHook::InvokeNewHookSlow (p=0x1f48000, s=256) at
src/malloc_hook.cc:525
#7 0x00007fa4a2abb6cb in InvokeNewHook (s=256, p=0x1f48000) at
src/malloc_hook-inl.h:161
#8 do_realloc_with_callback (invalid_get_size_fn=<optimized out>,
invalid_free_fn=<optimized out>, new_size=<optimized out>,
old_ptr=<optimized out>) at src/tcmalloc.cc:1221
#9 do_realloc (new_size=<optimized out>, old_ptr=<optimized out>) at
src/tcmalloc.cc:1228
#10 tc_realloc (old_ptr=<optimized out>, new_size=256) at src/tcmalloc.cc:1526
#11 0x00007fa4a1fec66f in _IO_vfscanf_internal (s=<optimized out>,
format=<optimized out>, argptr=0x7fffbd2a2998, errp=0x0)
at vfscanf.c:1413
#12 0x00007fa4a2005360 in _IO_vsscanf (
string=0x7fffbd2a2c70 "00400000-00408000 r-xp 00000000 fc:00 24166", ' ' <repeats 30 times>, "/path/to/program", format=0x7fa4a2ac3f30 "%lx-%lx %4s %lx %x:%x %ld %n", args=0x7fffbd2a2998) at iovsscanf.c:45
#13 0x00007fa4a1ff0317 in __sscanf (s=<optimized out>, format=<optimized out>)
at sscanf.c:34
#14 0x00007fa4a2ab69ab in ProcMapsIterator::NextExt (this=0x7fffbd2a4070,
start=0x7fffbd2a2c38, end=0x7fffbd2a2c40, flags=0x7fffbd2a2c58,
offset=0x7fffbd2a2c48, inode=0x7fffbd2a2c50, filename=0x7fffbd2a2c60, file_mapping=0x0, file_pages=0x0, anon_mapping=0x0,
anon_pages=0x0, dev=0x0) at src/base/sysinfo.cc:721
#15 0x00007fa4a2ab6cbf in ProcMapsIterator::Next (this=<optimized out>,
start=<optimized out>, end=<optimized out>, flags=<optimized out>,
offset=<optimized out>, inode=<optimized out>, filename=0x7fffbd2a2c60) at src/base/sysinfo.cc:661
#16 0x00007fa4a2aa22a9 in HeapLeakChecker::UseProcMapsLocked
(proc_maps_task=HeapLeakChecker::DISABLE_LIBRARY_ALLOCS)
at src/heap-checker.cc:909
#17 0x00007fa4a2aa29f0 in HeapLeakChecker_InternalInitStart () at
src/heap-checker.cc:2032
#18 0x00007fa4a2a9a1ec in google_init_module_init_start () at
src/heap-checker.cc:2122
#19 GoogleInitializer (dtor=0, ctor=<optimized out>, name=0x7fa4a2abfcba
"init_start", this=0x7fa4a2cce2e0) at src/base/googleinit.h:45
#20 __static_initialization_and_destruction_0 (__initialize_p=<optimized out>,
__priority=<optimized out>) at src/heap-checker.cc:2122
#21 _GLOBAL__sub_I_heap_checker.cc(void) () at src/heap-checker.cc:2372
#22 0x00007fa4a34a6306 in call_init (l=<optimized out>, argc=5,
argv=0x7fffbd2a4338, env=0x7fffbd2a4368) at dl-init.c:85
#23 0x00007fa4a34a63df in call_init (env=<optimized out>, argv=<optimized out>,
argc=<optimized out>, l=<optimized out>) at dl-init.c:52
#24 _dl_init (main_map=0x7fa4a36bb2c8, argc=5, argv=0x7fffbd2a4338,
env=0x7fffbd2a4368) at dl-init.c:134
#25 0x00007fa4a34986ea in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
Hope this helps.
Original comment by xgw.s...@gmail.com
on 16 Jul 2012 at 7:37
It would be great to get these deadlock issues resolved; this is creating a bit
of pain in the current Ubuntu development release where we would like to gate
this package on successful execution of its test suite.
Original comment by ringo.p...@gmail.com
on 23 Jul 2012 at 2:17
Moving priority up on this one. I am going to try and at least get something
temporary in place to get people unblocked on this ugly issue. I'll keep this
issue updated on my progress.
Original comment by chapp...@gmail.com
on 28 Oct 2012 at 8:02
Original comment by chapp...@gmail.com
on 22 Dec 2012 at 5:54
Original issue reported on code.google.com by
aml...@gmail.com
on 6 Apr 2012 at 9:27