davidlee80 / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

Why does my program run much slower on x86-64 with the heap profiler? #327

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. First, my program runs normally on x86-32 with HEAPPROFILE set.
2. Then I run my program on x86-64, also with HEAPPROFILE set, and it runs very slowly.
3. Can this be resolved? What can I do?

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Please provide any additional information below.

Original issue reported on code.google.com by freej...@gmail.com on 23 Apr 2011 at 5:00

GoogleCodeExporter commented 9 years ago
Difficult to guess what might be going on.  You could try running with the cpu 
profiler as well (see the README) and try to figure out where the program is 
spending its time.
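
For example (the binary name and library path here are illustrative; see the README 
for the authoritative instructions):

$ CPUPROFILE=/tmp/cpu.prof LD_PRELOAD=/usr/local/lib/libprofiler.so ./myapp
$ pprof --text ./myapp /tmp/cpu.prof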

How did you configure perftools to get stacktraces on x86-64?

Original comment by csilv...@gmail.com on 23 Apr 2011 at 7:24

GoogleCodeExporter commented 9 years ago
We first built and installed libunwind-0.99-beta, which provides libunwind.so.XX 
and libunwind-x86_64.so.XX. We then configured and compiled the heap profiler 
normally (no extra configure options). After tcmalloc (with the heap profiler) 
built successfully, we built our application with -ltcmalloc, so it links 
against libtcmalloc.so.
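
Roughly, the commands we ran were the following (directory and file names here 
are illustrative):

$ cd libunwind-0.99-beta
$ ./configure && make && make install
$ cd ../google-perftools
$ ./configure && make && make install
$ g++ -o myapp myapp.cc -ltcmalloc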

Then we start our application (with env: export HEAPPROFILE=/tmp/profile and 
export HEAP_PROFILE_INUSE_INTERVAL=524288000). It runs very slowly, at about 
1/10 of the speed it runs without HEAPPROFILE exported (i.e. with the heap 
profiler disabled). Although it runs slowly, both our program and the heap 
profiler work correctly.
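
A dumped profile can then be inspected with pprof, e.g. (the binary name is 
illustrative; the file name follows from the HEAPPROFILE prefix above):

$ pprof --text ./myapp /tmp/profile.0001.heap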

Then we ran ldd on our application and on libtcmalloc.so and found that it 
links libunwind.so.7. We suspected the problem might be that it uses 
libunwind.so.7 rather than libunwind-x86_64.so.7.

So after that, we edited the tcmalloc Makefile, changing -lunwind to 
-lunwind-x86_64, and repeated the steps above. But nothing improved. Running 
ldd on libtcmalloc.so now outputs:
$ ldd libtcmalloc.so
 linux-vdso.so.1 =>  (0x00007fff3bfff000)
 libunwind-x86_64.so.7 => /lib/libunwind-x86_64.so.7 (0x00007f3640fe5000)
 libpthread.so.0 => /lib/libpthread.so.0 (0x00007f3640dc9000)
 libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f3640abc000)
 libm.so.6 => /lib/libm.so.6 (0x00007f3640839000)
 libc.so.6 => /lib/libc.so.6 (0x00007f36404e6000)
 /lib64/ld-linux-x86-64.so.2 (0x00007f3641481000)
 libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f36402ce000)
 libunwind.so.7 => /home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7 (0x00007f36400b0000)
so we believe it is already using the x86_64 libunwind.

Is our tcmalloc (heap profiler) configuration for x86_64 correct?
Did we miss some important step?
If it is correct, we will use the CPU profiler to find out why the program is 
so slow once the heap profiler is started.

Thank you very much!

Original comment by freej...@gmail.com on 25 Apr 2011 at 3:05

GoogleCodeExporter commented 9 years ago
Below are the results of running the application under the CPU profiler.
With the heap profiler:
Total: 16825 samples
    9130  54.3%  54.3%     9130  54.3% sem_wait
    5879  34.9%  89.2%     5879  34.9% sigprocmask
     373   2.2%  91.4%      373   2.2% apply_reg_state
      85   0.5%  91.9%       97   0.6% luaS_newlstr
      74   0.4%  92.4%      327   1.9% llex
      56   0.3%  92.7%       56   0.3% open
      52   0.3%  93.0%       54   0.3% save
      38   0.2%  93.2%       40   0.2% __strtof_l
      38   0.2%  93.5%       38   0.2% memcpy
      36   0.2%  93.7%       42   0.2% luaH_getstr (inline)
      33   0.2%  93.9%       35   0.2% traversetable (inline)
      32   0.2%  94.1%     1049   6.2% luaV_execute
      29   0.2%  94.2%       29   0.2% _ULx86_64_step
      24   0.1%  94.4%       45   0.3% luaK_code
      23   0.1%  94.5%       23   0.1% dwarf_get
      23   0.1%  94.7%       23   0.1% read
      22   0.1%  94.8%       22   0.1% uc_addr
      21   0.1%  94.9%       88   0.5% read_string (inline)
      20   0.1%  95.0%       20   0.1% common_init
      19   0.1%  95.1%       19   0.1% _ULx86_64_dwarf_find_save_locs
      19   0.1%  95.3%       43   0.3% _init
      19   0.1%  95.4%       19   0.1% access_mem
      17   0.1%  95.5%       17   0.1% _ULx86_64_dwarf_step
      17   0.1%  95.6%       17   0.1% _Ux86_64_getcontext
      17   0.1%  95.7%       98   0.6% newkey
      16   0.1%  95.8%       17   0.1% GetStackTrace

Without the heap profiler:
Total: 1015 samples
      84   8.3%   8.3%       93   9.2% luaS_newlstr
      65   6.4%  14.7%       65   6.4% save
      61   6.0%  20.7%      311  30.6% llex
      44   4.3%  25.0%       51   5.0% __strtof_l
      40   3.9%  29.0%       40   3.9% open
      36   3.5%  32.5%       38   3.7% traversetable (inline)
      32   3.2%  35.7%      863  85.0% luaV_execute
      31   3.1%  38.7%       32   3.2% luaH_getstr (inline)
      29   2.9%  41.6%       29   2.9% memcpy

It seems that the lock the heap profiler uses to hook malloc and free consumes 
a lot of CPU.

We don't understand why this problem did not appear on the 32-bit system we 
used before, where the program ran only slightly slower.

Please give us some suggestions if you know how to handle this. Thanks a lot! 

Original comment by freej...@gmail.com on 25 Apr 2011 at 3:44

GoogleCodeExporter commented 9 years ago
Hmm, why is it calling sigprocmask so much?  That seems very suspicious to me.  
I guess it must be from ListAllThreads in src/base/linuxthreads.h.  Maybe 
that's supposed to run a lot?  I'm not that familiar with this part of the 
codebase.  But wait: I thought that was only called from the heap-checker.  
Maybe you can run this in gdb and see if you can figure out where those 
thousands of calls to sigprocmask and sem_wait are coming from.
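
Something along these lines (an untested sketch; binary name is illustrative) 
should show where they come from:

$ gdb ./myapp
(gdb) break sigprocmask
(gdb) run
(gdb) bt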

Original comment by csilv...@gmail.com on 25 Apr 2011 at 5:11

GoogleCodeExporter commented 9 years ago
Below is the call stack for sigprocmask:

#0  0x00007fb8a3c3c180 in sigprocmask () from /lib/libc.so.6
#1  0x00007fb8a2f85d0f in _ULx86_64_init () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#2  0x00007fb8a2f862fa in _ULx86_64_init_local () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#3  0x00007fb8a4944c08 in GetStackTrace (result=0x14e0010, max_depth=30, 
skip_count=3) at src/stacktrace_libunwind-inl.h:91
#4  0x00007fb8a493a0e8 in tcmalloc::PageHeap::GrowHeap (this=0x7fb8a4b6fa40, 
n=<value optimized out>) at src/page_heap.cc:470
#5  0x00007fb8a493a387 in tcmalloc::PageHeap::New (this=0x7fb8a4b6fa40, n=1) at 
src/page_heap.cc:96
#6  0x00007fb8a49383cf in tcmalloc::CentralFreeList::Populate 
(this=0x7fb8a4b5f900) at src/central_freelist.cc:282
#7  0x00007fb8a493853d in tcmalloc::CentralFreeList::FetchFromSpansSafe 
(this=0x7fb8a4b5f900) at src/central_freelist.cc:249
#8  0x00007fb8a49385cb in tcmalloc::CentralFreeList::RemoveRange 
(this=0x7fb8a4b5f900, start=0x7fff91efeec8, end=0x7fff91efeec0, 
    N=1) at src/central_freelist.cc:227

The sem_wait calls all come from our own code, but I don't know why sem_wait 
has so many samples here; it has only a few samples without the heap profiler.

Original comment by freej...@gmail.com on 25 Apr 2011 at 9:18

GoogleCodeExporter commented 9 years ago
Here is the complete call stack for sigprocmask:

#0  0x00007fb1e2f65180 in sigprocmask () from /lib/libc.so.6
#1  0x00007fb1e22ae718 in _ULx86_64_dwarf_find_proc_info () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#2  0x00007fb1e22ab1d7 in fetch_proc_info () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#3  0x00007fb1e22abd2a in _ULx86_64_dwarf_find_save_locs () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#4  0x00007fb1e22acb11 in _ULx86_64_dwarf_step () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#5  0x00007fb1e22afe0d in _ULx86_64_step () from 
/home/juva/libunwind-0.99-beta/src/.libs/libunwind.so.7
#6  0x00007fb1e3c6dc1f in GetStackTrace (result=0x1102010, max_depth=30, 
skip_count=3) at src/stacktrace_libunwind-inl.h:96
#7  0x00007fb1e3c630e8 in tcmalloc::PageHeap::GrowHeap (this=0x7fb1e3e98a40, 
n=<value optimized out>) at src/page_heap.cc:470
#8  0x00007fb1e3c63387 in tcmalloc::PageHeap::New (this=0x7fb1e3e98a40, n=1) at 
src/page_heap.cc:96
#9  0x00007fb1e3c613cf in tcmalloc::CentralFreeList::Populate 
(this=0x7fb1e3e88900) at src/central_freelist.cc:282
#10 0x00007fb1e3c6153d in tcmalloc::CentralFreeList::FetchFromSpansSafe 
(this=0x7fb1e3e88900) at src/central_freelist.cc:249
#11 0x00007fb1e3c615cb in tcmalloc::CentralFreeList::RemoveRange 
(this=0x7fb1e3e88900, start=0x7fffa8232468, end=0x7fffa8232460, 
    N=1) at src/central_freelist.cc:227
#12 0x00007fb1e3c64fb5 in tcmalloc::ThreadCache::FetchFromCentralCache 
(this=0x114a000, cl=4, byte_size=48)
    at src/thread_cache.cc:150
#13 0x00007fb1e3c7386b in tc_new (size=48) at src/thread_cache.h:330
#14 0x00007fb1e3a321b8 in ProfileHandler::Init () at src/profile-handler.cc:222
#15 0x00007fb1e2b19ea3 in pthread_once () from /lib/libpthread.so.0
#16 0x00007fb1e3a3220e in ProfileHandler::Instance () at 
src/profile-handler.cc:227
#17 0x00007fb1e3a324d9 in ?? () from ./libprofiler.so.0
#18 0x00007fb1e3a366d6 in ?? () from ./libprofiler.so.0

Original comment by freej...@gmail.com on 25 Apr 2011 at 9:27

GoogleCodeExporter commented 9 years ago
I think it may be spending so much CPU time in the calls to GetStackTrace at 
src/stacktrace_libunwind-inl.h:96, but I don't know why it runs fine on the 
32-bit system.

Original comment by freej...@gmail.com on 25 Apr 2011 at 9:51

GoogleCodeExporter commented 9 years ago
Looks like the issue is with libunwind.  I'm sure it does something totally 
different on i386 and x86_64, which would explain the time difference.

I think libunwind 0.99 is really old; check their website for a new version 
which might be better.

Also, consider running with frame pointers on x86_64 rather than using 
libunwind (see the README for details).
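
Roughly, that means configuring perftools with frame pointers and compiling 
your own code without omitting them, along these lines (binary and source 
names are illustrative; see the README for the exact details):

$ ./configure --enable-frame-pointers
$ g++ -fno-omit-frame-pointer -o myapp myapp.cc -ltcmalloc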

I'm closing this as not-a-bug since it doesn't seem to be anything on the 
perftools side.

Original comment by csilv...@gmail.com on 25 Apr 2011 at 9:25

GoogleCodeExporter commented 9 years ago
Alright, thank you very much.

Original comment by freej...@gmail.com on 27 Apr 2011 at 2:22