Closed GoogleCodeExporter closed 9 years ago
I suppose it helps if I tell you what version of glibc I have :)
[root@myhost]# yum list | grep libc
glibc.i686 2.5-65 installed
glibc.x86_64 2.5-65 installed
glibc-common.x86_64 2.5-65 installed
glibc-devel.x86_64 2.5-65 installed
glibc-headers.x86_64 2.5-65 installed
libcap.i386 1.10-26 installed
libcap.x86_64 1.10-26 installed
compat-glibc.i386 1:2.3.4-2.26 rhel-disc
compat-glibc.x86_64 1:2.3.4-2.26 rhel-disc
compat-glibc-headers.x86_64 1:2.3.4-2.26 rhel-disc
compat-libcom_err.i386 1.0-7 rhel-disc
compat-libcom_err.x86_64 1.0-7 rhel-disc
glibc-devel.i386 2.5-65 rhel-x86_64-server-5
glibc-utils.x86_64 2.5-65 rhel-x86_64-server-5
libc-client.i386 2004g-2.2.1 rhel-disc
libc-client.x86_64 2004g-2.2.1 rhel-disc
libc-client-devel.i386 2004g-2.2.1 rhel-disc
libc-client-devel.x86_64 2004g-2.2.1 rhel-disc
libcap-devel.i386 1.10-26 rhel-disc
libcap-devel.x86_64 1.10-26 rhel-disc
libchewing.i386 0.3.0-8.el5 rhel-disc
libchewing.x86_64 0.3.0-8.el5 rhel-disc
libchewing-devel.i386 0.3.0-8.el5 rhel-disc
libchewing-devel.x86_64 0.3.0-8.el5 rhel-disc
libcmpiutil.i386 0.5.1-1.el5 rhel-x86_64-server-vt-5
libcmpiutil.x86_64 0.5.1-1.el5 rhel-x86_64-server-vt-5
libcmpiutil-devel.i386 0.5.1-1.el5 rhel-x86_64-server-vt-5
libcmpiutil-devel.x86_64 0.5.1-1.el5 rhel-x86_64-server-vt-5
libcollection.i386 0.6.0-10.el5 rhel-x86_64-server-5
libcollection.x86_64 0.6.0-10.el5 rhel-x86_64-server-5
libcollection-devel.i386 0.6.0-10.el5 rhel-x86_64-server-5
libcollection-devel.x86_64 0.6.0-10.el5 rhel-x86_64-server-5
libcroco.i386 0.6.1-2.1 rhel-disc
libcroco.x86_64 0.6.1-2.1 rhel-disc
libcroco-devel.i386 0.6.1-2.1 rhel-disc
libcroco-devel.x86_64 0.6.1-2.1 rhel-disc
libcxgb3.i386 1.2.5-2.el5 rhel-disc
libcxgb3.x86_64 1.2.5-2.el5 rhel-disc
libcxgb3-devel.i386 1.0-6.el5_1.1 rhel-x86_64-server-5
libcxgb3-devel.x86_64 1.0-6.el5_1.1 rhel-x86_64-server-5
libcxgb3-static.x86_64 1.2.5-2.el5 rhel-disc
libcxgb4.i386 1.1.1-2.el5 rhel-x86_64-server-5
libcxgb4.x86_64 1.1.1-2.el5 rhel-x86_64-server-5
libcxgb4-static.x86_64 1.1.1-2.el5 rhel-x86_64-server-5
[root@myhost]#
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 2:43
They're all blocked in sys_futex because they're all waiting on the same lock
-- so that part is expected. If you allocate/deallocate a lot, it wouldn't be
100% surprising to occassionally find all-but-one thread waiting in sys_futex,
while the last thread actually makes some progress. But here, I'm guessing all
the threads are stuck in sys_futex.
The problem I see, in the thread backtraces you attached (I take it that's only
a subset?), is this one:
---
#10 0x000000000101ebb7 in allocate (signo=17, info=<value optimized out>,
notused=<value optimized out>)
at /vobs/fw-linux-tools/host/x86_64-pc-linux-gnu/bin/../lib/gcc/x86_64-pc-linux-gnu/4.5.1/../../../../x86_64-pc-linux-gnu/include/c++/4.5.1/ext/new_allocator.h:89
#11 _M_get_node (signo=17, info=<value optimized out>, notused=<value optimized
out>)
at /vobs/fw-linux-tools/host/x86_64-pc-linux-gnu/bin/../lib/gcc/x86_64-pc-linux-gnu/4.5.1/../../../../x86_64-pc-linux-gnu/include/c++/4.5.1/bits/stl_list.h:316
#12 _M_create_node<unsigned long> (signo=17, info=<value optimized out>,
notused=<value optimized out>)
at /vobs/fw-linux-tools/host/x86_64-pc-linux-gnu/bin/../lib/gcc/x86_64-pc-linux-gnu/4.5.1/../../../../x86_64-pc-linux-gnu/include/c++/4.5.1/bits/stl_list.h:478
#13 _M_insert<unsigned long> (signo=17, info=<value optimized out>,
notused=<value optimized out>)
at /vobs/fw-linux-tools/host/x86_64-pc-linux-gnu/bin/../lib/gcc/x86_64-pc-linux-gnu/4.5.1/../../../../x86_64-pc-linux-gnu/include/c++/4.5.1/bits/stl_list.h:1442
#14 push_back (signo=17, info=<value optimized out>, notused=<value optimized
out>)
at /vobs/fw-linux-tools/host/x86_64-pc-linux-gnu/bin/../lib/gcc/x86_64-pc-linux-gnu/4.5.1/../../../../x86_64-pc-linux-gnu/include/c++/4.5.1/bits/stl_list.h:926
#15 sig_handler_kevent (signo=17, info=<value optimized out>, notused=<value
optimized out>) at osiEventEpoll.cpp:1403
#16 <signal handler called>
#17 0x00002b5e020236d0 in sigprocmask () from /lib64/libc.so.6
#18 0x00002b5e02350f24 in put_rs_cache (c=0x5a378940) at dwarf/Gparser.c:525
#19 _ULx86_64_dwarf_find_save_locs (c=0x5a378940) at dwarf/Gparser.c:852
#20 0x00002b5e02351119 in _ULx86_64_dwarf_step (c=0x2) at dwarf/Gstep.c:35
#21 0x00002b5e0235311a in _ULx86_64_step (cursor=0x2) at x86_64/Gstep.c:42
---
This is the libunwind code, doing allocations in a signal handler. That is not
allowed in C (or C++), and would explain the deadlock.
I am almost positive this is an instance of issue 66, but won't mark it as so
until we confirm it.
There are two possible solutions:
1) Try running with a version of preftools configured with
--enable-frame-pointers rather than libunwind (at least, just to confirm that
libunwind is the problem).
2) Try using the latest version of libunwind (1.0.1?) -- it has a different set
of problems than libunwind 0.99, that may not affect you. The latest svn-trunk
INSTALL file talks about this a bit more:
http://google-perftools.googlecode.com/svn/trunk/INSTALL
Original comment by csilv...@gmail.com
on 10 Nov 2011 at 5:45
You are correct that this is a subset. There are 59 threads and I was lazy. I
can provide them all if you feel it will be useful.
In the meantime, I will try your two suggestions and get back to you within a
few hours.
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 6:03
I should add that I looked at issue 66 as well. I tried most of the stuff
suggested in there, but will give the suggestions another try and will itemize
them here as I go.
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 6:06
Steps taken:
make uninstall libunwind
make uninstall google-perftools
configure google-perftools with ./configure --enable-frame-pointers
make install google-perftools
start myapp
The below is 'sanitized' output to keep my corporate overlords happy. If you
need any clarification on paths or names please ask. It is possible I may have
made an error during sanitizing :(
I see that we are apparently dumping to the heap files (and apparently
allocating and freeing LOTS of memory?). I also see a few complaints about 'no
version information available'.
Nov 10 13:22:18 myhost myapp_out: /usr/local/bin/myapp:
/usr/local/lib64/libstdc++.so.6: no version information available (required by
/usr/local/lib/libtcmalloc.so)
Nov 10 13:22:18 myhost myapp_out: /usr/local/bin/myapp:
/usr/local/lib64/libstdc++.so.6: no version information available (required by
/usr/local/lib/libtcmalloc.so)
Nov 10 13:22:22 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0001.heap (95 MB allocated cumulatively, 75 MB currently
in use)
Nov 10 13:22:22 myhost myapp_out: mkdir: /usr/local/lib64/libstdc++.so.6: no
version information available (required by usr/local/lib/libtcmalloc.so)
Nov 10 13:22:22 myhost myapp_out: mkdir: /usr/local/lib64/libstdc++.so.6: no
version information available (required by usr/local/lib/libtcmalloc.so)
Nov 10 13:22:22 myhost myapp_out: Starting tracking the heap
Nov 10 13:22:22 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp_10523.0001.heap (Exiting)
Nov 10 13:22:24 myhost myapp_out: sh: /usr/local/lib64/libstdc++.so.6: no
version information available (required by /usr/local/lib/libtcmalloc.so)
Nov 10 13:22:24 myhost myapp_out: sh: /usr/local/lib64/libstdc++.so.6: no
version information available (required by /usr/local/lib/libtcmalloc.so)
Nov 10 13:22:24 myhost myapp_out: Starting tracking the heap
Nov 10 13:22:24 myhost myapp_out: cp: /usr/local/lib64/libstdc++.so.6: no
version information available (required by /usr/local/lib/libtcmalloc.so)
Nov 10 13:22:24 myhost myapp_out: cp: /usr/local/lib64/libstdc++.so.6: no
version information available (required by /usr/local/lib/libtcmalloc.so)
Nov 10 13:22:24 myhost myapp_out: Starting tracking the heap
Nov 10 13:22:24 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp_10724.0001.heap (Exiting)
Nov 10 13:22:27 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0002.heap (190 MB allocated cumulatively, 84 MB currently
in use)
Nov 10 13:22:32 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0003.heap (286 MB allocated cumulatively, 83 MB currently
in use)
Nov 10 13:22:38 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0004.heap (381 MB allocated cumulatively, 84 MB currently
in use)
Nov 10 13:22:44 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0005.heap (476 MB allocated cumulatively, 83 MB currently
in use)
Nov 10 13:22:50 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0006.heap (572 MB allocated cumulatively, 84 MB currently
in use)
Nov 10 13:22:56 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0007.heap (667 MB allocated cumulatively, 84 MB currently
in use)
Nov 10 13:23:02 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0008.heap (762 MB allocated cumulatively, 84 MB currently
in use)
Nov 10 13:23:08 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0009.heap (858 MB allocated cumulatively, 84 MB currently
in use)
Nov 10 13:23:14 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0010.heap (953 MB allocated cumulatively, 84 MB currently
in use)
...
Nov 10 13:42:44 myhost myapp_out: Dumping heap profile to
/d2/tmp/google/myapp.0211.heap (20122 MB allocated cumulatively, 85 MB
currently in use)
run pprof as follows
[root@myhost ~]$ pprof --dot --base /d2/tmp/google/myapp.0001.heap
/usr/local/bin/myapp /d2/tmp/google/myapp.0011.heap > ~/myapp.dot
Using local file /usr/local/bin/myapp.
Using local file /d2/tmp/google/myapp.heap.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
... <about 200 or so of these>
Dropping nodes with <= 0.0 MB; edges with <= 0.0 abs(MB)
When I look at the resulting dot file in gvedit it claims that there is only
about 9.1MB allocated :(
HOWEVER, it appears that my application is now running properly. I guess that
validates the theory that libunwind is the culprit.
I will now try with the newest version of libunwind after perusing the latest
INSTALL file. I will also do some googling of the Dwarf Error stuff.
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 6:47
My application deadlocks with libunwind 1.0.1
No luck there :(
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 7:10
I'm sorry libunwind is giving you such trouble. Unfortunately, I don't know of
a good solution for that. Did you add in the function calls in main() to
pre-cache some libunwind data, like the newsgroup post suggested? I had hoped
that would be enough, but I'm not totally surprised it isn't.
} BFD: Dwarf Error: mangled line number section.
Weird, I have no idea what is causing this. But for now I'm going with the
theory that this is responsible for the weird dot output...
Original comment by csilv...@gmail.com
on 10 Nov 2011 at 7:31
I suspect the Dwarf Error is the cause of the weird dot output as well.
I am working on building our product now with the changes outlined at
http://groups.google.com/group/google-perftools/msg/2686d9f24ac4365f
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 7:54
I believe I have now gotten our product built with -fno-omit-frame-pointer
Unfortunately I still get the following when running pprof:
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
BFD: Dwarf Error: mangled line number section.
...
Further, the emitted dot file claims a very small number of allocations even
though the heap dumping notifications state more was allocated.
Tomorrow I will try to verify I built everything with -fno-omit-frame-pointer
properly and continue on with some of the other suggestions outlined here.
Original comment by iwor...@gmail.com
on 10 Nov 2011 at 11:14
I have to put this on hold for a few days. I hope to return to it later this
week.
Original comment by iwor...@gmail.com
on 14 Nov 2011 at 7:25
Ping -- I gave you a bit more time than a week. :-) Any more news on this?
Original comment by csilv...@gmail.com
on 25 Jan 2012 at 11:54
hehe. You are correct. I was given more than a week :)
Unfortunately, we haven't made much progress. The project got put on the
backburner a little bit. I expect to pick it up again when we migrate to a
newer version of linux.
For the time being I think we can close this. If we have issues with the new
release we will raise them then.
Original comment by iwor...@gmail.com
on 27 Jan 2012 at 4:09
OK, closing for now. Feel free to reopen if necessary.
Original comment by csilv...@gmail.com
on 31 Jan 2012 at 5:55
Original issue reported on code.google.com by
iwor...@gmail.com
on 10 Nov 2011 at 2:27