FWIW, Chromium encountered this issue too. I had to change the default to 0 in
our branch of tcmalloc.
Original comment by willchan@chromium.org
on 2 Jun 2010 at 5:38
If you're not doing any fancy stuff, consider using -ltcmalloc_minimal. It's
smaller, and -- if I coded it right -- doesn't do any sampling in tcmalloc.
I'll raise the idea here of defaulting the sample parameter to 0 even for
non-minimal libtcmalloc.
Original comment by csilv...@gmail.com
on 2 Jun 2010 at 7:00
Also, a 1.5x speedup is very surprising to us. The sampling code has been made
more efficient since tcmalloc 0.8. It's true we still sample by default, but
you may find the overhead in tcmalloc 1.5 is significantly lower. If you're
able to try it out, I'd be interested to hear what you find.
Original comment by csilv...@gmail.com
on 2 Jun 2010 at 8:08
We currently use -ltcmalloc_minimal and still see this issue. We'll try 1.5 and
let you know what we see.
The 1.5x speedup seemed to be due to a lock being held during
DoSampledAllocation which the other threads all spun waiting for.
Original comment by meta...@gmail.com
on 2 Jun 2010 at 9:04
Yes, I can believe the change to no longer sample in tcmalloc_minimal came
after v0.8. Definitely try 1.5.
} The 1.5x speedup seemed to be due to a lock being held during
} DoSampledAllocation which the other threads all spun waiting for.
All allocations require a lock, sampled or not, but it's quite likely the code
has been rewritten since 0.8 so less of the sampling work is done while holding
the lock.
Original comment by csilv...@gmail.com
on 2 Jun 2010 at 9:10
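For readers following the lock discussion, here is a minimal sketch of byte-count sampling, not tcmalloc's actual code. Every name in it (Sampler, ShouldSample, RecordAllocation, g_sample_lock) is hypothetical, and it uses a fixed interval where a real allocator would likely randomize it. It models only the sampling bookkeeping, not the allocator's own locking. It shows why a sample parameter of 0 removes the sampled path entirely, and why a shared lock on that path can become a contention point when several threads hit it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch; none of these names come from tcmalloc.
struct Sampler {
    std::size_t period;      // bytes between samples; 0 disables sampling
    std::size_t bytes_left;  // bytes remaining until the next sample

    explicit Sampler(std::size_t p) : period(p), bytes_left(p) {}

    // Returns true when this allocation should be sampled.
    bool ShouldSample(std::size_t size) {
        if (period == 0) return false;  // sampling switched off
        if (size < bytes_left) {        // common case: just count down
            bytes_left -= size;
            return false;
        }
        bytes_left = period;            // start a new interval
        return true;
    }
};

std::mutex g_sample_lock;                  // shared by every thread
std::vector<std::size_t> g_sampled_sizes;  // stand-in for recorded samples

// Each thread would own its Sampler; only sampled allocations
// touch the shared lock.
void RecordAllocation(Sampler& sampler, std::size_t size) {
    if (!sampler.ShouldSample(size)) return;
    std::lock_guard<std::mutex> hold(g_sample_lock);
    assert(sampler.period != 0);
    g_sampled_sizes.push_back(size);
}
```

With a period of 0, RecordAllocation never reaches the lock. With a non-zero period, the more work done while holding g_sample_lock, the longer other sampling threads wait.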
My findings are as follows:
Firstly, switching from 0.8 to 1.5 has given us something like a 9% speedup
overall, so thanks for that! And as you describe, -ltcmalloc_minimal does
indeed appear to do no sampling.
Using -ltcmalloc, the runtime difference between running with
tcmalloc_sample_parameter set to 0 and to the (apparently doubled since 0.8)
default of 512K now seems to be less than 1% on my single-threaded run. On my
4-threaded run, the degradation is more than 15% -- a significant improvement,
but still pretty bad.
Original comment by meta...@gmail.com
on 3 Jun 2010 at 4:13
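A comparison like the one above could be reproduced with a small timing harness along these lines. This is only a sketch: TimeAllocations and its parameters are hypothetical names, the workload (fixed-size malloc/free pairs) is an assumption and not the reporter's actual program, and the code itself does not depend on tcmalloc.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper: runs `iterations` malloc/free pairs of `size` bytes
// on each of `threads` threads and returns the elapsed wall time in seconds.
double TimeAllocations(int threads, int iterations, std::size_t size) {
    assert(threads > 0 && iterations > 0 && size > 0);
    auto worker = [=] {
        for (int i = 0; i < iterations; ++i) {
            void* p = std::malloc(size);
            if (p != nullptr) {
                // Touch the block so the allocation is not optimized away.
                static_cast<volatile char*>(p)[0] = 1;
                std::free(p);
            }
        }
    };
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Linked with -ltcmalloc, a program calling this with 1 and then 4 threads can be run twice, once with TCMALLOC_SAMPLE_PARAMETER=0 and once with TCMALLOC_SAMPLE_PARAMETER=524288 in the environment (assuming the flag is read from that variable, as the perftools documentation describes), to compare the timings.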
OK, we've reached consensus to make the default 0 for the next release.
Original comment by csilv...@gmail.com
on 3 Jun 2010 at 8:12
This is changed in perftools 1.6, just released.
Original comment by csilv...@gmail.com
on 5 Aug 2010 at 8:52
Original issue reported on code.google.com by
meta...@gmail.com
on 2 Jun 2010 at 5:30