Breakpoint 1, ProfileHandler::StartTimer (this=0x4a) at src/profile-handler.cc:387
387 void ProfileHandler::StartTimer() {
Current language: auto; currently c++
(gdb) where
#0 ProfileHandler::StartTimer (this=0x4a) at src/profile-handler.cc:387
#1 0x00007f10771005a7 in ProfileHandler::RegisterThread (this=0x1582010) at src/profile-handler.cc:281
#2 0x00007f107710150f in ProfileHandlerInitializer (this=0x7f10773099a4) at src/profile-handler.cc:445
#3 0x00007f1077100ed8 in ?? () from .libs/libprofiler.so.0
#4 0x00007f1077100eed in ?? () from .libs/libprofiler.so.0
#5 0x00007f1077105b16 in ?? () from .libs/libprofiler.so.0
Looks like it's because of profile-handler.cc:453: static ProfileHandlerInitializer profile_handler_initializer;
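For context, here is a minimal standalone sketch, with assumed names (it is not the actual gperftools source), of why a file-scope static like this arms the timer during library load, long before any ProfilerStart() call:

// Minimal sketch with assumed names -- not the actual gperftools source.
// A file-scope static object's constructor runs during shared-library
// initialization, so the profiling itimer is armed as soon as a binary
// linked against libprofiler starts, whether or not profiling was requested.
#include <csignal>
#include <sys/time.h>

struct InitializerSketch {
  InitializerSketch() {
    // Stand-in for RegisterThread() -> StartTimer() -> setitimer().
    // (The real library also installs a SIGPROF handler; here we just
    // ignore the signal so the sketch is safe to run on its own.)
    std::signal(SIGPROF, SIG_IGN);
    struct itimerval timer = {};
    timer.it_interval.tv_usec = 10000;  // ~100 Hz, an assumed rate
    timer.it_value = timer.it_interval;
    setitimer(ITIMER_PROF, &timer, nullptr);
  }
};

static InitializerSketch initializer_sketch;  // constructor runs at load time

int main() {
  // By the time main() starts, the timer above is already ticking.
  return 0;
}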
Original comment by themaste...@gmail.com
on 24 May 2009 at 8:55
This does indeed seem to be a regression. I'll look into it for the next
release.
Original comment by csilv...@gmail.com
on 26 May 2009 at 5:16
Hmm, on closer look, it's not really a regression: we've had that behavior for a long time. Can you explain a bit why the current behavior is a problem for you? I'm trying to figure out the best way to handle it -- it may not be as simple as it seems to just not turn on the itimer when the envvar isn't set -- so I'd like to understand what trade-offs to make.
Original comment by csilv...@gmail.com
on 27 May 2009 at 6:37
I'd like to link production binaries against libprofiler but prefer that my process is not constantly being interrupted unless I'm profiling. I'm not sure what the overhead of the timer is if there's no signal handler set; perhaps it's negligible? My processes are all single-threaded, so for now I'm just going to patch perftools to hardcode TIMERS_SHARED and skip the RegisterThread dance.
Original comment by themaste...@gmail.com
on 29 May 2009 at 6:12
Ah, ok, if that's the only issue, I think you're ok. setitimer adds negligible overhead (as in, you really can't measure it). If you do register a handler, of course, then things slow down quite a bit more. But we're careful not to do that unless the user asks for it.
> My processes are all single threaded so for now I'm just going to patch perftools to hardcode TIMERS_SHARED and skip the RegisterThread dance.
You're welcome to, but I encourage you to just use the code as it is, first, and see if it causes any problems. Or you could try it both ways and see if you can tell a difference. It would be interesting to us if you could!
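One rough way to "try it both ways" is a micro-benchmark along these lines (illustrative only; the loop size and the ~100 Hz rate are assumptions, not gperftools defaults): time the same busy loop with the profiling itimer disarmed and then armed with the signal ignored, which approximates the timer-on-but-no-handler case described above.

// Illustrative micro-benchmark sketch: compare a busy loop with and without
// ITIMER_PROF armed (SIGPROF ignored, so no handler overhead is involved).
#include <chrono>
#include <csignal>
#include <cstdio>
#include <sys/time.h>

static double busy_loop_seconds() {
  auto start = std::chrono::steady_clock::now();
  volatile double x = 0;
  for (long i = 0; i < 200000000L; ++i) x += i * 1e-9;  // arbitrary workload
  return std::chrono::duration<double>(
      std::chrono::steady_clock::now() - start).count();
}

int main() {
  std::printf("timer off: %.3f s\n", busy_loop_seconds());

  std::signal(SIGPROF, SIG_IGN);        // no handler, just ignore the signal
  struct itimerval timer = {};
  timer.it_interval.tv_usec = 10000;    // ~100 Hz, an assumed profiling rate
  timer.it_value = timer.it_interval;
  setitimer(ITIMER_PROF, &timer, nullptr);

  std::printf("timer on:  %.3f s\n", busy_loop_seconds());
  return 0;
}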
Original comment by csilv...@gmail.com
on 29 May 2009 at 7:33
Hello there,
We have hit a severe performance issue (an 8x slowdown) because of this behavior. Yes, on a typical machine with just a few threads the overhead is negligible. However, we observed a 30-50% slowdown on 60-core machines and an 8x slowdown on 240-core machines. On a big many-core server, this is a huge issue.
Just like the OP described, our intent was to always link against libprofiler and invoke ProfilerStart() only when needed to avoid interruptions, but then we hit this issue. Currently, the only workarounds are to set an environment variable (e.g. CPUPROFILE_FREQUENCY=1) or to not link production binaries against libprofiler, both of which are inconvenient.
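For reference, the usage pattern described above looks roughly like the sketch below. ProfilerStart()/ProfilerStop() and <gperftools/profiler.h> are the real public API; the trigger function, output path, and build line are made up for illustration.

// Sketch of "always link libprofiler, profile only on demand".
// Build (assumed): g++ example.cc -lprofiler
#include <gperftools/profiler.h>

// Hypothetical trigger; in a real server this might be an RPC or an admin command.
void set_profiling(bool enable) {
  if (enable) {
    ProfilerStart("/tmp/myserver.prof");  // hypothetical output path
  } else {
    ProfilerStop();
  }
}

int main() {
  // libprofiler is linked in but idle here; ideally no itimer would be armed yet.
  set_profiling(true);
  // ... the work we actually want profiled ...
  set_profiling(false);
  return 0;
}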
To identify and work around this issue, we tried several rounds of profiling and tuning in the kernel layer (the interruption causes heavy contention on Linux's sighand spinlock), but we concluded that the only solution that completely eliminates the slowdown is to avoid setitimer. The root cause is that calling setitimer requires the kernel, even when no handler is installed, to update additional timer information. This adds a lot of overhead because a spinlock is taken every time the kernel updates the timer statistics. We tried to optimize this kernel behavior, but it is expensive even with the optimization.
In general, a shared library should not do anything expensive when it's not used. We would really appreciate it if libprofiler were modified to invoke setitimer only when the client program invokes some method in gperftools.
By the way, we also found a minor bug (or maybe it's by design?) in ProfileHandler::StartTimer(). When CPUPROFILE_FREQUENCY=1, it sets tv_usec to 1000000, which setitimer rejects as invalid (tv_usec must be less than 1000000), so no timer is actually armed. (Well, hence this is one workaround to completely avoid the overhead, but it's probably unintended?)
So, this:
void ProfileHandler::StartTimer() {
  ...
  timer.it_interval.tv_sec = 0;
  timer.it_interval.tv_usec = 1000000 / frequency_;
  ...
}
should be something like this:
if (frequency_ == 1) {
  timer.it_interval.tv_sec = 1;
  timer.it_interval.tv_usec = 0;
} else {
  timer.it_interval.tv_sec = 0;
  timer.it_interval.tv_usec = 1000000 / frequency_;
}
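An equivalent, slightly more general way to express the same idea (just an illustration, not a proposed patch) is to derive both fields from the period, so tv_usec always stays inside the range setitimer accepts:

#include <sys/time.h>

// frequency is assumed to be >= 1, as with frequency_ above.
static struct itimerval interval_for_frequency(int frequency) {
  long period_usec = 1000000L / frequency;
  struct itimerval timer = {};
  timer.it_interval.tv_sec = period_usec / 1000000;   // 1 s when frequency == 1
  timer.it_interval.tv_usec = period_usec % 1000000;  // always < 1000000
  timer.it_value = timer.it_interval;
  return timer;
}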
Original comment by hideaki.kimura@gmail.com
on 13 Jan 2015 at 9:47
I think disabling the profiling timer at a frequency of 1 was unintended.
IMHO the main issue that prevents us from not setting up CPU-time timers is that things differ quite a bit between the timer-sharing modes.
I.e. if timers are shared (the default mode on Linux), it's fine and seemingly simple to start them late. However, when every thread has its own timer, it gets much harder.
If your use case is that big, I'd advise you to consider other profiling tools such as perf.
Original comment by alkondratenko
on 18 Jan 2015 at 3:37
Thanks for the reply.
Is it possible to defer setitimer at least when the timers are in shared mode?
Well, >50-core machines will soon become common, even if they aren't right now, so this issue might come up in many server applications.
We do use perf to profile what gperftools can't (like kernel code), but gperftools is quite a handy and flexible tool (kudos to you) for programmatically starting/stopping profiling when the end user does something to the application.
Also, if you are going to change the behavior of CPUPROFILE_FREQUENCY=1, we'd really appreciate an alternate way to completely avoid the overhead without modifying the binary, such as another env variable.
Original comment by hideaki.kimura@gmail.com
on 20 Jan 2015 at 6:01
I don't plan on changing the CPUPROFILE_FREQUENCY=1 behavior, at least not until we have done something about that timer overhead.
Speaking of shared-mode timers, please be aware that, at least on boxes with a common core count (8) and modern Linux, it is easy to see that profiles are quite massively skewed between threads. It looks like the kernel isn't dividing timer tick signals proportionally enough.
I.e. your gperftools profiles might already be wrong. I've devised an (off by default) profiling mode that has per-thread CPU-time timers. That mode gives me good profiles. But I'm not able to see (at least yet) how to make it the default.
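For anyone curious, here is a rough sketch of the general mechanism behind per-thread CPU-time timers at the POSIX level (an illustration of the idea, not gperftools' actual implementation; may need -lrt on older glibc): each thread arms a timer against its own CPU clock, so SIGPROF ticks track that thread's CPU usage instead of one shared process-wide timer.

// Illustration only -- not the gperftools implementation.
#include <atomic>
#include <cstdio>
#include <signal.h>
#include <time.h>

static std::atomic<long> g_ticks{0};

static void on_sigprof(int) { g_ticks.fetch_add(1, std::memory_order_relaxed); }

// Arm a timer that ticks against the calling thread's own CPU time.
static void arm_per_thread_timer(long interval_usec) {
  struct sigaction sa = {};
  sa.sa_handler = on_sigprof;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGPROF, &sa, nullptr);

  struct sigevent sev = {};
  sev.sigev_notify = SIGEV_SIGNAL;
  sev.sigev_signo = SIGPROF;

  timer_t timer_id;
  timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer_id);  // this thread's CPU clock

  struct itimerspec spec = {};
  spec.it_interval.tv_nsec = interval_usec * 1000;
  spec.it_value = spec.it_interval;
  timer_settime(timer_id, 0, &spec, nullptr);
}

int main() {
  arm_per_thread_timer(10000);  // ~100 Hz of this thread's CPU time (assumed rate)
  volatile double x = 0;
  for (long i = 0; i < 300000000L; ++i) x += i * 1e-9;  // burn roughly a second of CPU
  std::printf("profiling ticks: %ld\n", g_ticks.load());
  return 0;
}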
Original comment by alkondratenko
on 20 Jan 2015 at 7:25
Original issue reported on code.google.com by
themaste...@gmail.com
on 24 May 2009 at 8:47