Closed GoogleCodeExporter closed 9 years ago
I also tried linking against -ltcmalloc_minimal. Here are those results:
$ ./alloc_time_tcmalloc_minimal
Time taken per allocation: 183 nsecs
Time taken per de-allocation: 122 nsecs
So the allocation time is better by about 20 nsecs, but still considerably worse
than allocation time without tcmalloc. The de-allocation time is unchanged.
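The benchmark source is not reproduced in this thread. As a rough illustration only, a timing loop of this kind might look like the C++ sketch below. The names `AllocTiming` and `time_allocations`, and the `count` and `size` parameters, are assumptions made for illustration and are not taken from the original program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Average cost of one malloc and one free call, in nanoseconds.
struct AllocTiming {
    double alloc_ns;
    double free_ns;
};

// Hypothetical helper: times `count` malloc calls of `size` bytes each,
// then times freeing all of them. Not the benchmark from the bug report.
AllocTiming time_allocations(std::size_t count, std::size_t size) {
    assert(count > 0 && size > 0);
    using Clock = std::chrono::steady_clock;
    using Nanos = std::chrono::duration<double, std::nano>;

    // Allocate the bookkeeping vector up front so it is not part of the timing.
    std::vector<void*> blocks(count, nullptr);

    const auto t0 = Clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = std::malloc(size);
    }
    const auto t1 = Clock::now();

    // Write one byte to each block (outside the timed regions) so the
    // allocations are observably used and cannot be optimized away.
    for (void* p : blocks) {
        if (p != nullptr) {
            *static_cast<volatile char*>(p) = 1;
        }
    }

    const auto t2 = Clock::now();
    for (void* p : blocks) {
        std::free(p);  // free(nullptr) is a no-op, so failed mallocs are safe
    }
    const auto t3 = Clock::now();

    const double n = static_cast<double>(count);
    return AllocTiming{Nanos(t1 - t0).count() / n, Nanos(t3 - t2).count() / n};
}
```

Calling `time_allocations(1000000, 64)` from `main` and printing the two fields gives numbers in the same form as the output above. Building the same source once with and once without `-ltcmalloc_minimal` allows a like-for-like comparison, provided both builds use the same optimization flags.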
Original comment by mohit.a...@gmail.com
on 14 Nov 2009 at 7:30
Every malloc implementation is going to have situations where it's better or
worse than another. Artificial benchmarks like this might happen upon one such
situation or another, but it's not very meaningful. I much prefer to measure
relative performance in real applications. The main use for benchmarks like this
is to provide a starting point to look at the implementation, to see if there's
a possibility for improvement.
I'm going to close this bug as "do not fix", but it would be great if you wanted
to look into this more deeply, to understand why there are timing differences in
this (simple) case. That might turn up ways to tune tcmalloc, or a bug to fix (a
previous benchmark like this one exposed a bug in our implementation of
realloc).
Also, note that tcmalloc stands for 'thread-caching malloc'. It will perform
best, relative to other mallocs, in threaded applications.
Original comment by csilv...@gmail.com
on 14 Nov 2009 at 7:58
I made this artificial benchmark only for demonstration purposes for this bug
report. The reason I even wrote it is that I was seeing poor results in a
production application. I obviously cannot disclose the code for that in a bug
report.
Original comment by mohit.a...@gmail.com
on 14 Nov 2009 at 8:03
Aha, that's a different story!
Check out
http://groups.google.com/group/google-perftools/browse_thread/thread/87d79c8df8e22b6d/7b5e97c5b92b4997?lnk=gst&q=slow#7b5e97c5b92b4997
Does changing the constants as suggested in that thread speed things up for you?
Original comment by csilv...@gmail.com
on 14 Nov 2009 at 9:35
Absolutely - I see a huge improvement using the suggested changes to common.h in
the link given.
Here are the new times with the freshly built tcmalloc library:
$ ./alloc_time_tcmalloc
Time taken per allocation: 43 nsecs
Time taken per de-allocation: 17 nsecs
This makes tcmalloc faster than glibc - which is what I expected.
Could the next release of google perftools do this automatically, please?
Original comment by mohit.a...@gmail.com
on 14 Nov 2009 at 11:01
Unfortunately, no promises: it improves speed on your machine, but slows it down
on others. We're working on trying to figure out what constants work better in
what situations, but it's tricky. I hope we'll be able to come up with a good
solution for everyone in time for the next release.
Original comment by csilv...@gmail.com
on 14 Nov 2009 at 11:22
Actually - on digging some more, I realized that the performance degradation I
was seeing earlier in tcmalloc was because my google perftools package was built
without the -O2 flag. This is because I'd set CXXFLAGS, CPPFLAGS and CFLAGS
explicitly before running 'configure' - I was expecting that perftools would add
-O2 on top of that, but it didn't.
When I tried the patch, I just did a vanilla build - so got the -O2 by default.
It turns out the patch to common.h does little on my machine. It's really the
-O2 that matters.
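One way to catch this kind of mistake in a benchmark is to have the program report whether it was itself compiled with optimization. The sketch below assumes GCC or Clang, which predefine the `__OPTIMIZE__` macro when optimization is enabled; other compilers may not define it. The function names are hypothetical. Note that this only describes the translation unit it is compiled in: it says nothing about how an already-built tcmalloc library was compiled, so it helps only when the library and the benchmark share the same flags.

```cpp
#include <cstdio>

// True when this translation unit was compiled with optimization enabled.
// Relies on __OPTIMIZE__, a macro predefined by GCC and Clang in optimizing
// builds; on compilers that do not define it this always returns false.
constexpr bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a short label suitable for printing next to timings.
const char* build_mode_label() {
    return built_with_optimization() ? "optimized" : "unoptimized";
}

// Hypothetical helper: prints the build mode to stderr so that benchmark
// numbers from an unoptimized build are easy to spot.
void print_build_mode() {
    std::fprintf(stderr, "benchmark build: %s\n", build_mode_label());
}
```

Calling `print_build_mode()` at the start of a benchmark makes an accidental unoptimized build visible next to its timings. When overriding CXXFLAGS or CFLAGS before running 'configure', the safer habit is to include -O2 in the value explicitly rather than expecting it to be appended.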
Please close this bug as invalid.
Original comment by mohit.a...@gmail.com
on 14 Nov 2009 at 11:56
Good to know -- thanks for looking into this.
Original comment by csilv...@gmail.com
on 15 Nov 2009 at 12:07
Original issue reported on code.google.com by
mohit.a...@gmail.com
on 14 Nov 2009 at 7:20