blueszhangsh / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

linking against libprofiler starts the ITIMER_PROF even if CPUPROFILE env var is not set #133

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
$ cat test.c 
#include <stdlib.h>
int main(){ int i; for(i=0; i<10000000; i++); return 0; }

$ gcc -o test test.c -L.libs -lprofiler

$ LD_LIBRARY_PATH=.libs strace -ttT ./test 2>&1 | egrep '(setitimer|PROF)'
15:45:56.989224 rt_sigaction(SIGPROF, {SIG_IGN}, NULL, 8) = 0 <0.000006>
15:45:56.989325 setitimer(ITIMER_PROF, {it_interval={0, 10000}, it_value={0, 10000}}, NULL) = 0 <0.000008>
15:45:57.004651 --- SIGPROF (Profiling timer expired) @ 0 (0) ---
15:45:57.016599 --- SIGPROF (Profiling timer expired) @ 0 (0) ---

Original issue reported on code.google.com by themaste...@gmail.com on 24 May 2009 at 8:47

GoogleCodeExporter commented 9 years ago
Breakpoint 1, ProfileHandler::StartTimer (this=0x4a) at src/profile-handler.cc:387
387 void ProfileHandler::StartTimer() {
Current language:  auto; currently c++
(gdb) where
#0  ProfileHandler::StartTimer (this=0x4a) at src/profile-handler.cc:387
#1  0x00007f10771005a7 in ProfileHandler::RegisterThread (this=0x1582010) at src/profile-handler.cc:281
#2  0x00007f107710150f in ProfileHandlerInitializer (this=0x7f10773099a4) at src/profile-handler.cc:445
#3  0x00007f1077100ed8 in ?? () from .libs/libprofiler.so.0
#4  0x00007f1077100eed in ?? () from .libs/libprofiler.so.0
#5  0x00007f1077105b16 in ?? () from .libs/libprofiler.so.0

Looks like it's because of profile-handler.cc:453: static ProfileHandlerInitializer profile_handler_initializer;

Original comment by themaste...@gmail.com on 24 May 2009 at 8:55

GoogleCodeExporter commented 9 years ago
This does indeed seem to be a regression.  I'll look into it for the next 
release.

Original comment by csilv...@gmail.com on 26 May 2009 at 5:16

GoogleCodeExporter commented 9 years ago
Hmm, on closer look, it's not really a regression: we've had that behavior for a long time. Can you explain a bit why the current behavior is a problem for you? I'm trying to figure out the best way to handle it -- it may not be as simple as it seems to just not turn on the itimer when the envvar isn't set -- so I'd like to understand what trade-offs to make.

Original comment by csilv...@gmail.com on 27 May 2009 at 6:37

GoogleCodeExporter commented 9 years ago
I'd like to link production binaries against libprofiler but prefer that my process is not constantly being interrupted unless I'm profiling. I'm not sure what the overhead of the timer is if there's no signal handler set; perhaps it's negligible? My processes are all single threaded, so for now I'm just going to patch perftools to hardcode TIMERS_SHARED and skip the RegisterThread dance.

Original comment by themaste...@gmail.com on 29 May 2009 at 6:12

GoogleCodeExporter commented 9 years ago
Ah, ok, if that's the only issue, I think you're ok. setitimer adds negligible overhead (as in, you really can't measure it). If you do register a handler, of course, then things slow down quite a bit more. But we're careful not to do that unless the user asks for it.

> My processes are all single threaded so for now I'm just going to patch perftools
> to hardcode TIMERS_SHARED and skip the RegisterThread dance.

You're welcome to, but I encourage you to just use the code as it is, first, and see if it causes any problems. Or you could try it both ways and see if you can tell a difference. It would be interesting to us if you could!

Original comment by csilv...@gmail.com on 29 May 2009 at 7:33

GoogleCodeExporter commented 9 years ago
Hello there,
We have hit a severe performance issue (8x slowdown) because of this behavior. Yes, on a usual machine with just a few threads, the overhead is negligible. However, we observed a 30-50% slowdown on 60-core machines and an 8x slowdown on 240-core machines. On a big many-core server, this is a huge issue.

Just like the OP described, our intent was to always link to libprofiler and invoke ProfilerStart() only when needed to avoid interruptions, but then we hit this issue. Currently, the only workarounds are to set an environment variable (e.g. CPUPROFILE_FREQUENCY=1) or to not link to libprofiler in production binaries, both of which are inconvenient.

To identify and work around this issue, we tried several rounds of profiling and tuning at the kernel layer (the interruption causes heavy contention on Linux's sighand spinlock), but we concluded that the only solution that completely eliminates the slowdown is to avoid setitimer. The root cause is that calling setitimer requires the kernel, even without any handler, to update additional timer information. This adds a lot of overhead because a spinlock is taken every time the kernel updates the timer statistics. We tried to optimize this kernel behavior, but it is expensive even with the optimization.

In general, a shared library should not do anything expensive when it's not used. We would really appreciate it if libprofiler were modified to invoke setitimer only when the client program invokes some method in gperftools.

By the way, we also found a minor bug (maybe by design?) in ProfileHandler::StartTimer(). When CPUPROFILE_FREQUENCY=1, it sets tv_usec to 1000000, which is an invalid input, so no timer is actually set. (Hence this is one workaround to completely avoid the overhead, but it's probably unintended?)

So, this:
void ProfileHandler::StartTimer() {
  ...
  timer.it_interval.tv_sec = 0;
  timer.it_interval.tv_usec = 1000000 / frequency_;
  ...
}

should be something like this:
  if (frequency_ == 1) {
    timer.it_interval.tv_sec = 1;
    timer.it_interval.tv_usec = 0;
  } else {
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 1000000 / frequency_;
  }

Original comment by hideaki.kimura@gmail.com on 13 Jan 2015 at 9:47

GoogleCodeExporter commented 9 years ago
I think disabling the profiling timer at a frequency of 1 was unintended.

IMHO the main issue that prevents us from not setting up CPU time timers is that things differ quite a bit between the timer-sharing modes.

I.e. if timers are shared (the default mode on Linux), it's fine and seemingly simple to start them late. However, when every thread has its own timer, it gets much harder.

If your use-case is that big, I'd advise you to consider other profiling tools such as perf.

Original comment by alkondratenko on 18 Jan 2015 at 3:37

GoogleCodeExporter commented 9 years ago
Thanks for the reply.
Is it possible to defer setitimer at least when the timers are in shared mode? Well, >50 cores will soon be a common environment, even if not right now, so this issue might come up in many server applications.

We do use perf to profile what gperftools can't (like kernel code), but gperftools is a quite handy and flexible tool (kudos to you) for programmatically starting/stopping profiling when the end user does something to the application.

Also, if you are going to change the behavior of CPUPROFILE_FREQUENCY=1, we'd really appreciate an alternate way to completely avoid the overhead without modifying the binary, such as another env variable.

Original comment by hideaki.kimura@gmail.com on 20 Jan 2015 at 6:01

GoogleCodeExporter commented 9 years ago
I don't plan on changing the CPUPROFILE_FREQUENCY=1 behavior at least until we have something to address that timer overhead.

Speaking of shared-mode timers, please be aware that at least on common-core boxes (8) and modern Linux it is easy to see that profiles are quite massively skewed between threads. It looks like the kernel isn't dividing timer tick signals proportionally enough.

I.e. your gperftools profiles might be wrong already. I've devised an (off-by-default) profiling mode that has per-thread CPU time timers. That mode gives me good profiles. But I'm not able to see (at least yet) how to make it the default.

Original comment by alkondratenko on 20 Jan 2015 at 7:25