easyfmxu / concurrentlinkedhashmap

Automatically exported from code.google.com/p/concurrentlinkedhashmap
Apache License 2.0

Race condition with ConcurrentLinkedHashMap #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
We are seeing a race condition in ConcurrentLinkedHashMap's appendToTail. We are using this library in Cassandra and came across this problem on trunk.

The open ticket in question, with the full stack trace, is at:

https://issues.apache.org/jira/browse/CASSANDRA-405

Stack trace:

pool-1-thread-{63,62,61,59,58,54,53,51,49,47} and ROW-READ-STAGE{8,7,5,4,3,2,1}:
"ROW-READ-STAGE:8" prio=10 tid=0x00007f1b78b52000 nid=0x1945 runnable [0x0000000046532000]
   java.lang.Thread.State: RUNNABLE
at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap$Node.appendToTail(ConcurrentLinkedHashMap.java:536)
at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.putIfAbsent(ConcurrentLinkedHashMap.java:281)
at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.put(ConcurrentLinkedHashMap.java:256)
at org.apache.cassandra.io.SSTableReader.getPosition(SSTableReader.java:241)
at org.apache.cassandra.db.filter.SSTableNamesIterator.<init>(SSTableNamesIterator.java:46)
at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:69)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1445)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.Table.getRow(Table.java:589)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:78)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

ROW-READ-STAGE6 is a little different:
"ROW-READ-STAGE:6" prio=10 tid=0x00007f1b78b4e000 nid=0x1943 runnable [0x0000000046330000]
   java.lang.Thread.State: RUNNABLE
at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap$Node.appendToTail(ConcurrentLinkedHashMap.java:540)
at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.putIfAbsent(ConcurrentLinkedHashMap.java:281)
at com.reardencommerce.kernel.collections.shared.evictable.ConcurrentLinkedHashMap.put(ConcurrentLinkedHashMap.java:256)
at org.apache.cassandra.io.SSTableReader.getPosition(SSTableReader.java:241)
at org.apache.cassandra.db.filter.SSTableNamesIterator.<init>(SSTableNamesIterator.java:46)
at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:69)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1445)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1398)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1379)
at org.apache.cassandra.db.Table.getRow(Table.java:589)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:78)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:44)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Original issue reported on code.google.com by chrisgof...@gmail.com on 1 Sep 2009 at 5:23

GoogleCodeExporter commented 9 years ago
This is a known issue and the unit tests demonstrate the failure. I have not had time to actively work on this project, and a few months back I was having trouble understanding the scenario that caused it. When I have been emailed regarding the production status, as warned on the front page, I have recommended alternative approaches. I'm sorry, but I wasn't aware the Cassandra project was attempting to use it, or else I would have warned you too!

There are a few excellent choices:
1) Use a soft-reference map, such as the one in Google Collections, and allow the GC to manage the memory. Soft references are reclaimed in LRU order.
2) Use a ReentrantLock to guard the list mutator operations. With SecondChanceFifo this provides excellent performance, as reads remain lock-free. With LRU there is always contention on the tail, so the loss won't be noticeable. This is a simple and safe change. It also provides a much smaller critical section than locking a LinkedHashMap, so the necessary performance gain can be realized. I can provide an implementation if desired.
3) Use a standard algorithm! I chose to explore for personal education rather than adopt a known version, because this is not work related and I wanted to learn rather than copy. If it had been work related, I would have adopted the algorithm described in:
http://www.cs.chalmers.se/~dcs/ConcurrentDataStructures/phd_chap7.pdf
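Option 1 above can be sketched with plain JDK types. This is a hypothetical illustration of the idea (values held through soft references so the GC bounds the cache), not the Google Collections implementation; the class and method names are made up for the example:

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of option 1: each value is held via a SoftReference, so the GC
// (which reclaims softly reachable objects in roughly LRU order) bounds
// the cache. Illustrative only; not the Google Collections code.
public final class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The value was reclaimed by the GC; drop the stale entry.
            map.remove(key, ref);
        }
        return value;
    }
}
```

The trade-off Ben alludes to is that eviction timing is entirely up to the garbage collector, which is simple but hard to tune.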

Doug Lea has expressed interest in writing a standard version in a future release of java.util.concurrent, which will likely be based on his ConcurrentLinkedDeque (available for, but not adopted in, JDK6). Personally, I am looking forward to seeing how he attacks the challenge, as it's an interesting problem.

Please let me know if you'd like me to provide any further help. Since I have been asked three times this week, I can try to get back to this project and resolve the race.

You can contact me here, on the group, or through private email (ben.manes-at-gmail.com).

Best,
Ben

Original comment by Ben.Manes@gmail.com on 1 Sep 2009 at 5:55

GoogleCodeExporter commented 9 years ago
Hi Ben, thanks for the quick response. I work with Chris. I have attached a rough approximation of the patch you mentioned in #2. I'm not sure if the locks are at too coarse a level, or whether it is necessary to have the lock in evict, but it seems to work. The unit tests weren't passing for me before, and now they consistently pass.

Cheers,
Sammy

Original comment by temi...@gmail.com on 1 Sep 2009 at 6:27

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Sammy,

This was not what I meant, because the critical section includes all of the map operations. Your change is almost equivalent to wrapping a LinkedHashMap with a lock (or synchronized, if on JDK6). That is the standard approach almost everyone uses, because most often the miss penalty (a db hit) far outweighs the hit penalty (lock contention).
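The standard approach Ben describes, a fully locked LinkedHashMap in access order, looks roughly like the sketch below. The class name and capacity-bounding policy are illustrative assumptions, not code from either project:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// The "standard approach": an access-ordered LinkedHashMap that evicts its
// eldest entry, wrapped so every operation holds one lock. Simple and safe,
// but every read and write contends on the same monitor.
public final class SynchronizedLruCache {
    public static <K, V> Map<K, V> create(final int capacity) {
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once over capacity
            }
        };
        return Collections.synchronizedMap(lru);
    }
}
```

This is the baseline the lock-splitting approach aims to beat: correct, but with a critical section that covers the entire map operation rather than just the list update.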

Instead, only the Node<K, V> operations must use a global lock so that the list is maintained in a valid state. This provides lock splitting between the list and the map, and the ConcurrentHashMap has excellent locking performance (lock-free reads, striped locks for writes). Because the critical section is small, the ReentrantLock will probably spin and not need to promote itself to a full lock under normal usage patterns. Since an LRU always creates contention, this will be similar to a lock-free version in performance comparisons. It will also continue to allow a SecondChanceFifo to use lock-free reads (SC-FIFO is a great choice under most common workloads).
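The lock-splitting idea can be sketched as follows. The class, the deque-based recency list, and the simplified eviction are illustrative assumptions for the example, not the library's actual Node code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of lock splitting: the ConcurrentHashMap keeps its lock-free reads
// and striped writes, while a single ReentrantLock guards only the small
// critical section that mutates the recency list. Illustrative only;
// recency updates on reads are omitted for brevity.
public final class LockSplitLruCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final Deque<K> recencyList = new ArrayDeque<>();
    private final ReentrantLock listLock = new ReentrantLock();
    private final int capacity;

    public LockSplitLruCache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        return map.get(key); // lock-free read path
    }

    public void put(K key, V value) {
        map.put(key, value); // striped locking inside ConcurrentHashMap
        listLock.lock();     // small critical section: only the list mutation
        try {
            recencyList.remove(key);
            recencyList.addLast(key);
            while (recencyList.size() > capacity) {
                map.remove(recencyList.removeFirst()); // evict least recent
            }
        } finally {
            listLock.unlock();
        }
    }
}
```

The point of the design is that the lock is held only across a few pointer updates, so under contention the ReentrantLock's adaptive spinning usually suffices and the cost stays close to a lock-free list.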

I will make the necessary changes tomorrow evening, post a JAR, and explain the code on a wiki page. I will keep the version in the repository as is, though, to allow further experimentation for when I have some time. The fully functional JAR should allow people to get the necessary performance, without that nasty surprise in production!

Best,
Ben

Original comment by Ben.Manes@gmail.com on 1 Sep 2009 at 7:09

GoogleCodeExporter commented 9 years ago
I see clhm-production.jar did get uploaded on the 4th. May I suggest that the right thing to do is to make a tag or branch of the buggy version for experimentation, apply the fix to trunk, and close this issue so it's clear that the issue is resolved?

Original comment by jbel...@gmail.com on 14 Sep 2009 at 6:49

GoogleCodeExporter commented 9 years ago
Ben, you mentioned that this was a known issue and that there are unit tests that point this out. As I'm currently investigating whether or not this bug affects my project, I would really appreciate it if you could point out where and when this race condition occurs. Which unit tests reproduce this problem? Is the problem solved now? I'm a bit curious, since the bug is still open in the tracker.

Thanks for doing a great job.
/Lars Gråmark

Original comment by lars.gra...@gmail.com on 19 Oct 2009 at 3:36

GoogleCodeExporter commented 9 years ago
Lars, the clhm-production.jar Ben posted does fix the bug in our (Cassandra) testing.

Original comment by jbel...@gmail.com on 19 Oct 2009 at 3:37

GoogleCodeExporter commented 9 years ago
Ok, good to know. The question still remains how this can be reproduced. 
Thanks!

/Lars

Original comment by lars.gra...@gmail.com on 20 Oct 2009 at 7:32

GoogleCodeExporter commented 9 years ago
Hi Lars,

Sorry I haven't gotten a chance to get back to this bug; I have just been utterly swamped. However, I will try to become active again and follow jbellis' excellent suggestion regarding branching to clarify things. (Thanks, jbellis!)

I think I have figured out the scenario causing the bug, but I need to determine how best to resolve it. It can be reproduced if you download the code in /trunk and execute the unit tests. The jar contains a special build using code posted on the wiki, which passes the unit tests. Once I have applied jbellis' suggestion, that functional version will be in /branch.

The reason why this bug occurs is very complicated to explain, but in short the state of the linked list becomes corrupt. Because no thread can make progress, it results in a live-lock (all threads spinning). This will not happen in the "production" jar, which uses locks efficiently rather than a pure CAS-based algorithm. The "production" jar passes the load testing and should satisfy your performance needs.
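For intuition about the pure CAS-based approach Ben is describing, a lock-free tail append typically looks like the retry loop below. This is a generic Michael-Scott-style sketch with invented names, not the map's actual Node code; appends alone are manageable, and the corruption arises when removals interleave with loops like this one:

```java
import java.util.concurrent.atomic.AtomicReference;

// Generic sketch of a lock-free tail append: each thread retries a CAS on
// the last node's next pointer until it succeeds, helping to advance a
// stale tail along the way. Illustrative only.
public final class CasList<E> {
    static final class Node<E> {
        final E value;
        final AtomicReference<Node<E>> next = new AtomicReference<>();
        Node(E value) { this.value = value; }
    }

    private final Node<E> head = new Node<>(null);                 // sentinel
    private final AtomicReference<Node<E>> tail = new AtomicReference<>(head);

    public void appendToTail(E value) {
        Node<E> node = new Node<>(value);
        for (;;) {                                                 // spin until linked
            Node<E> last = tail.get();
            Node<E> after = last.next.get();
            if (after != null) {
                tail.compareAndSet(last, after);                   // help advance a stale tail
            } else if (last.next.compareAndSet(null, node)) {
                tail.compareAndSet(last, node);                    // ok if another thread swings it
                return;
            }
        }
    }

    public E first() {
        Node<E> n = head.next.get();
        return (n == null) ? null : n.value;
    }
}
```

If the list's pointers ever become inconsistent (say, through an unsynchronized unlink), every thread keeps spinning in the `for (;;)` loop without making progress, which matches the RUNNABLE-but-stuck threads in the stack traces above.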

Thanks!
Ben

Original comment by Ben.Manes@gmail.com on 20 Oct 2009 at 7:50

GoogleCodeExporter commented 9 years ago
No worries. I ran the unit tests and they do fail as you mentioned. I also tried similar tests to determine if the version tagged as 20080918 had the same problems. It appears that it does NOT have these concurrency problems. Since we're using the 20080918 version and it seems to work fine, I guess we'll stay with that version a little longer.

/Lars

Original comment by lars.gra...@gmail.com on 20 Oct 2009 at 12:44

GoogleCodeExporter commented 9 years ago
Can you please use "clhm-production.jar", available in the download section, instead? Thanks!

Original comment by Ben.Manes@gmail.com on 20 Oct 2009 at 4:45

GoogleCodeExporter commented 9 years ago
Perhaps, but I'm a bit hesitant, since I would then be forced to run a very long test suite if I replace this core component. I'll just stick with the old version now that I know it works fine and is not affected by this bug.

/Lars

Original comment by lars.gra...@gmail.com on 21 Oct 2009 at 7:27

GoogleCodeExporter commented 9 years ago
FYI,

I have a total rewrite that is passing all the tests, uses an LRU policy, and is very fast! With ConcurrentHashMap as the ideal case, I get 0.96x read and 0.87x write performance. It scales very well as more threads are added. Early benchmarks are:

# per read 
ConcurrentLinkedHashMap: 0.85us 
ConcurrentHashMap: 0.82us 
LinkedHashMap: 19.31us 

# per write
ConcurrentLinkedHashMap: 2.49us
ConcurrentHashMap: 2.17us
LinkedHashMap: 22.58us

I plan on closing this bug as "WontFix" once I release this implementation.

Cheers,
Ben

Original comment by Ben.Manes@gmail.com on 27 Mar 2010 at 9:24

GoogleCodeExporter commented 9 years ago
Great, I look forward to seeing it.

Original comment by jbel...@gmail.com on 27 Mar 2010 at 1:00

GoogleCodeExporter commented 9 years ago
Checked in. I'd appreciate a code review. ;)

Original comment by Ben.Manes@gmail.com on 30 Mar 2010 at 5:39

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
What is LIRS, and how do the numbers in comment #12 compare with the code from the [now] old "production" jar?

Original comment by jbel...@gmail.com on 13 Apr 2010 at 3:42

GoogleCodeExporter commented 9 years ago
I haven't benchmarked against the old "production" jar. If in LRU mode, the new version is far better, as there is no lock contention. If in SecondChanceFifo mode, I would expect reads to be equivalent and writes to be better. The production version doesn't do much on reads but set a per-entry volatile flag. However, on writes it may need to flush the FIFO queue if there have been too many reads and it can't find a suitable victim. That may occur on large caches where many entries were read, so the penalty could be quite high. On small caches it would be acceptable. The result is that writes may have a high penalty, and the policy may degrade to a FIFO under certain workloads.

This new version is a true LRU, and the penalty for flushing the buffers is very cheap. It should have equal read performance, cheap writes, a better hit rate, and most importantly no degradation scenarios. It's all-around better, imho. If Cassandra is using the production version for large caches, I would expect more reliable behavior with the new version.

LIRS is a page replacement policy that combines frequency with recency and beats LRU's hit rate. Most alternatives only work well under certain scenarios, but LIRS performs well under a variety of workloads. It can also be implemented in O(1) time like LRU, so it has little overhead, but it is similarly not concurrency-friendly. The design of the new CLHM should resolve the multi-threading issue, as it does for LRU. In my synthetic benchmark of a prototype it equals or beats LRU (by 5%+ with a shoddy workload). It performs about equal to IBM's ARC policy, but doesn't have the patent issues.

See:
- http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.437
- http://people.cs.vt.edu/~butta/docs/sigmetrics05_kernelPrefetch.pdf

Original comment by Ben.Manes@gmail.com on 13 Apr 2010 at 4:03

GoogleCodeExporter commented 9 years ago
Cool, we'll have a look at the new LRU code.

If trunk is at a stable point, can you tag a release so we don't have to track the svn revision we're testing manually? Because that's a pain. :)

Original comment by jbel...@gmail.com on 13 Apr 2010 at 4:16

GoogleCodeExporter commented 9 years ago
Sure, will do. I'm used to Perforce so give me a bit to figure SVN out. ;)

Original comment by Ben.Manes@gmail.com on 13 Apr 2010 at 4:29

GoogleCodeExporter commented 9 years ago
tag/release-1.0-LRU

Have fun! :)

Original comment by Ben.Manes@gmail.com on 13 Apr 2010 at 4:39

GoogleCodeExporter commented 9 years ago
thanks!

Original comment by jbel...@gmail.com on 13 Apr 2010 at 4:44