Difficult to guess what might be going on. You could try running with the cpu
profiler as well (see the README) and try to figure out where the program is
spending its time.
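Roughly, assuming the binary can be run with libprofiler preloaded, that looks
like this ("myapp" and the profile path below are just placeholders):
$ CPUPROFILE=/tmp/myapp.cpuprof LD_PRELOAD=/usr/local/lib/libprofiler.so ./myapp
$ pprof --text ./myapp /tmp/myapp.cpuprof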
How did you configure perftools to get stacktraces on x86-64?
Original comment by csilv...@gmail.com
on 23 Apr 2011 at 7:24
We first built and installed libunwind-0.99-beta, which provides libunwind.so.XX
and libunwind-x86_64.so.XX. Then we configured and compiled perftools (with the
heap profiler) normally, with no extra parameters to configure. After tcmalloc
built successfully, we built our application with -ltcmalloc, so the application
links against libtcmalloc.so.
We then start our application with the environment variables
export HEAPPROFILE=/tmp/profile and export HEAP_PROFILE_INUSE_INTERVAL=524288000.
It runs very slowly, at about 1/10 of the speed it reaches without HEAPPROFILE
set (i.e. with the heap profiler disabled). Although it runs slowly, both our
program and the heap profiler work correctly.
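In shell terms, the steps were roughly the following (the source directory
names and "myapp" are placeholders; versions and paths as described above):
$ cd libunwind-0.99-beta && ./configure && make && make install
$ cd ../google-perftools && ./configure && make && make install
$ g++ -o myapp myapp.o ... -ltcmalloc
$ export HEAPPROFILE=/tmp/profile
$ export HEAP_PROFILE_INUSE_INTERVAL=524288000
$ ./myapp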
Then we ran ldd on the application and on libtcmalloc.so and found that it
links against libunwind.so.7. We thought the problem might be that it was using
libunwind.so.7 instead of libunwind-x86_64.so.7.
So we edited the tcmalloc Makefile, changed -lunwind to -lunwind-x86_64, and
repeated the steps above, but nothing improved. Running ldd on libtcmalloc.so
gives:
$ ldd libtcmalloc.so
linux-vdso.so.1 => (0x00007fff3bfff000)
libunwind-x86_64.so.7 => /lib/libunwind-x86_64.so.7 (0x00007f3640fe5000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f3640dc9000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f3640abc000)
libm.so.6 => /lib/libm.so.6 (0x00007f3640839000)
libc.so.6 => /lib/libc.so.6 (0x00007f36404e6000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3641481000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f36402ce000)
libunwind.so.7 => /home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7 (0x00007f36400b0000)
We think it is already using the x86_64 libunwind.
Is our configuration of tcmalloc (with the heap profiler) correct for x86_64?
Have we missed some important step?
If it is correct, we will use the CPU profiler to find out why the application
is so slow once the heap profiler is enabled.
Thank you very much!
Original comment by freej...@gmail.com
on 25 Apr 2011 at 3:05
Below are the CPU-profile results for the application.
With the heap profiler:
Total: 16825 samples
9130 54.3% 54.3% 9130 54.3% sem_wait
5879 34.9% 89.2% 5879 34.9% sigprocmask
373 2.2% 91.4% 373 2.2% apply_reg_state
85 0.5% 91.9% 97 0.6% luaS_newlstr
74 0.4% 92.4% 327 1.9% llex
56 0.3% 92.7% 56 0.3% open
52 0.3% 93.0% 54 0.3% save
38 0.2% 93.2% 40 0.2% __strtof_l
38 0.2% 93.5% 38 0.2% memcpy
36 0.2% 93.7% 42 0.2% luaH_getstr (inline)
33 0.2% 93.9% 35 0.2% traversetable (inline)
32 0.2% 94.1% 1049 6.2% luaV_execute
29 0.2% 94.2% 29 0.2% _ULx86_64_step
24 0.1% 94.4% 45 0.3% luaK_code
23 0.1% 94.5% 23 0.1% dwarf_get
23 0.1% 94.7% 23 0.1% read
22 0.1% 94.8% 22 0.1% uc_addr
21 0.1% 94.9% 88 0.5% read_string (inline)
20 0.1% 95.0% 20 0.1% common_init
19 0.1% 95.1% 19 0.1% _ULx86_64_dwarf_find_save_locs
19 0.1% 95.3% 43 0.3% _init
19 0.1% 95.4% 19 0.1% access_mem
17 0.1% 95.5% 17 0.1% _ULx86_64_dwarf_step
17 0.1% 95.6% 17 0.1% _Ux86_64_getcontext
17 0.1% 95.7% 98 0.6% newkey
16 0.1% 95.8% 17 0.1% GetStackTrace
Without the heap profiler:
Total: 1015 samples
84 8.3% 8.3% 93 9.2% luaS_newlstr
65 6.4% 14.7% 65 6.4% save
61 6.0% 20.7% 311 30.6% llex
44 4.3% 25.0% 51 5.0% __strtof_l
40 3.9% 29.0% 40 3.9% open
36 3.5% 32.5% 38 3.7% traversetable (inline)
32 3.2% 35.7% 863 85.0% luaV_execute
31 3.1% 38.7% 32 3.2% luaH_getstr (inline)
29 2.9% 41.6% 29 2.9% memcpy
It seems that the lock the heap profiler takes when hooking malloc and free
consumes a lot of CPU.
We don't understand why this problem did not appear on the 32-bit system we
used before, where the program ran only a little slower.
Please give us some suggestions if you know how to handle this. Thanks a lot!
Original comment by freej...@gmail.com
on 25 Apr 2011 at 3:44
Hmm, why is it calling sigprocmask so much? That seems very suspicious to me.
I guess it must be from ListAllThreads in src/base/linuxthreads.h. Maybe
that's supposed to run a lot? I'm not that familiar with this part of the
codebase. But wait: I thought that was only called from the heap-checker.
Maybe you can run this in gdb and see if you can figure out where those
thousands of calls to sigprocmask and sem_wait are coming from.
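Something along these lines should work (substituting your binary for "myapp"):
$ gdb ./myapp
(gdb) break sigprocmask
(gdb) run
(gdb) bt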
Original comment by csilv...@gmail.com
on 25 Apr 2011 at 5:11
Below is the call stack for sigprocmask:
#0 0x00007fb8a3c3c180 in sigprocmask () from /lib/libc.so.6
#1 0x00007fb8a2f85d0f in _ULx86_64_init () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#2 0x00007fb8a2f862fa in _ULx86_64_init_local () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#3 0x00007fb8a4944c08 in GetStackTrace (result=0x14e0010, max_depth=30,
skip_count=3) at src/stacktrace_libunwind-inl.h:91
#4 0x00007fb8a493a0e8 in tcmalloc::PageHeap::GrowHeap (this=0x7fb8a4b6fa40,
n=<value optimized out>) at src/page_heap.cc:470
#5 0x00007fb8a493a387 in tcmalloc::PageHeap::New (this=0x7fb8a4b6fa40, n=1) at
src/page_heap.cc:96
#6 0x00007fb8a49383cf in tcmalloc::CentralFreeList::Populate
(this=0x7fb8a4b5f900) at src/central_freelist.cc:282
#7 0x00007fb8a493853d in tcmalloc::CentralFreeList::FetchFromSpansSafe
(this=0x7fb8a4b5f900) at src/central_freelist.cc:249
#8 0x00007fb8a49385cb in tcmalloc::CentralFreeList::RemoveRange
(this=0x7fb8a4b5f900, start=0x7fff91efeec8, end=0x7fff91efeec0,
N=1) at src/central_freelist.cc:227
The sem_wait calls all come from our own code.
But I don't know why sem_wait gets so many samples here; it gets only a few
samples without the heap profiler.
Original comment by freej...@gmail.com
on 25 Apr 2011 at 9:18
The complete call stack for sigprocmask:
#0 0x00007fb1e2f65180 in sigprocmask () from /lib/libc.so.6
#1 0x00007fb1e22ae718 in _ULx86_64_dwarf_find_proc_info () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#2 0x00007fb1e22ab1d7 in fetch_proc_info () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#3 0x00007fb1e22abd2a in _ULx86_64_dwarf_find_save_locs () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#4 0x00007fb1e22acb11 in _ULx86_64_dwarf_step () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#5 0x00007fb1e22afe0d in _ULx86_64_step () from
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#6 0x00007fb1e3c6dc1f in GetStackTrace (result=0x1102010, max_depth=30,
skip_count=3) at src/stacktrace_libunwind-inl.h:96
#7 0x00007fb1e3c630e8 in tcmalloc::PageHeap::GrowHeap (this=0x7fb1e3e98a40,
n=<value optimized out>) at src/page_heap.cc:470
#8 0x00007fb1e3c63387 in tcmalloc::PageHeap::New (this=0x7fb1e3e98a40, n=1) at
src/page_heap.cc:96
#9 0x00007fb1e3c613cf in tcmalloc::CentralFreeList::Populate
(this=0x7fb1e3e88900) at src/central_freelist.cc:282
#10 0x00007fb1e3c6153d in tcmalloc::CentralFreeList::FetchFromSpansSafe
(this=0x7fb1e3e88900) at src/central_freelist.cc:249
#11 0x00007fb1e3c615cb in tcmalloc::CentralFreeList::RemoveRange
(this=0x7fb1e3e88900, start=0x7fffa8232468, end=0x7fffa8232460,
N=1) at src/central_freelist.cc:227
#12 0x00007fb1e3c64fb5 in tcmalloc::ThreadCache::FetchFromCentralCache
(this=0x114a000, cl=4, byte_size=48)
at src/thread_cache.cc:150
#13 0x00007fb1e3c7386b in tc_new (size=48) at src/thread_cache.h:330
#14 0x00007fb1e3a321b8 in ProfileHandler::Init () at src/profile-handler.cc:222
#15 0x00007fb1e2b19ea3 in pthread_once () from /lib/libpthread.so.0
#16 0x00007fb1e3a3220e in ProfileHandler::Instance () at
src/profile-handler.cc:227
#17 0x00007fb1e3a324d9 in ?? () from ./libprofiler.so.0
#18 0x00007fb1e3a366d6 in ?? () from ./libprofiler.so.0
Original comment by freej...@gmail.com
on 25 Apr 2011 at 9:27
I thought it might be spending so much CPU time in calls to GetStackTrace at
src/stacktrace_libunwind-inl.h:96, but I don't know why it runs fine on the
32-bit system.
Original comment by freej...@gmail.com
on 25 Apr 2011 at 9:51
Looks like the issue is with libunwind. I'm sure it does something totally
different on i386 and x86_64, which would explain the time difference.
I think libunwind 0.99 is really old; check their website for a new version
which might be better.
Also, consider running with frame pointers on x86_64 rather than using
libunwind (see the README for details).
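If I remember right, that means configuring perftools with
--enable-frame-pointers and keeping frame pointers in your own code, roughly
(the "myapp" link line is just a placeholder):
$ ./configure --enable-frame-pointers && make && make install
$ g++ -fno-omit-frame-pointer -o myapp myapp.o ... -ltcmalloc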
I'm closing this as not-a-bug, since it doesn't seem to be anything on the
perftools side.
Original comment by csilv...@gmail.com
on 25 Apr 2011 at 9:25
Alright, thank you very much.
Original comment by freej...@gmail.com
on 27 Apr 2011 at 2:22
Original issue reported on code.google.com by
freej...@gmail.com
on 23 Apr 2011 at 5:00