evilsong / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Lost wakeups in spinlock degrade tcmalloc performance substantially #491

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
In uncontended cases, SpinLock::{Lock,Unlock} have an overhead of 13ns. In 
cases where the lock is contended, SpinLock::Unlock has a bug where it can fail 
to wakeup waiters, causing the overhead of SpinLock::{Lock,Unlock} to blow up 
from 13ns to as high as 256ms (a blow up by a factor of over a million!). In 
cases of heavy lock contention, this can cause applications to sleep for 
minutes at a time without doing anything, appearing to hang. We observe regular 
hangs in practice in Chrome in http://crbug.com/168139

The following fix slows down SpinLock::Unlock by 2ns in the uncontended case, 
but speeds up SpinLock::Lock by up to almost 256ms in the contended case where 
a wakeup ix mixed. If tcmalloc is used many times in an application in a 
multithreaded context (e.g. allocate in one thread, deallocate in another), 
this 256ms delay can add up quickly to several minutes of delay, causing web 
applications to time out and causing user-facing applications like Chrome to 
appear to hang.

At Google, SpinLock::{Lock,Unlock} are used for many things besides tcmalloc, 
so a 2ns slowdown may be unacceptable for some of those applications; however, 
for tcmalloc itself, I think it would be more than worth it to allow a 2ns 
slowdown in order to avoid an unpredictable 256ms penalty for multithreaded 
applications. Therefore I'm proposing that we roll the attached patch into open 
source tcmalloc itself in order to improve its performance.

This performance improvement will also be rolled into Chrome so as to improve 
Chrome's performance and avoid unnecessary delays.

This CL depends on http://code.google.com/p/gperftools/issues/detail?id=490 
which adds support for {Acquire,Release}_AtomicExchange.

Passes make and make check.

Original issue reported on code.google.com by davidjames@google.com on 8 Jan 2013 at 6:56

Attachments:

GoogleCodeExporter commented 9 years ago
As Aliaksey Kandratsenka pointed out, any reason to cast Release_AtomicExchange 
to uint64?

Original comment by zatr...@gmail.com on 10 Mar 2013 at 12:57

GoogleCodeExporter commented 9 years ago
r195 | chappedm@gmail.com | 2013-03-10 16:02:46 -0400 (Sun, 10 Mar 2013) | 7 
lines

issue-491: Significant performance improvement for spin lock contention

This patch fixes issues where spinlocks under contention were failing to
wakeup waiters, sometimes resulting in blow ups from 13ns to as high as 256ms.
Under heavy contention, applications were observed sleeping for minutes at a
time giving the appearance of a hang.

Original comment by chapp...@gmail.com on 10 Mar 2013 at 8:03