API to iterate keys in order of hotness

GoogleCodeExporter commented 9 years ago

As described in https://issues.apache.org/jira/browse/CASSANDRA-1966, it would 
be useful for Cassandra to be able to traverse the map in order of hotness.  
(Increasing hotness preferred, but decreasing is also workable.)

On that page, Ben said "these would be more expensive as they would require 
traversing the LRU chain to perform a copy and be a blocking operation, but 
would not affect read/write operations."

I'm a little unclear as to how an operation would be both blocking but not 
affecting reads/writes; can you elaborate?

Original issue reported on code.google.com by jbel...@gmail.com on 12 Jan 2011 at 6:32

GoogleCodeExporter commented 9 years ago

Sorry for the vague remark. I need to rewrite the design page and only have a 
rough draft 
(http://code.google.com/p/concurrentlinkedhashmap/wiki/Design_draft). The older 
design docs which attempted a lock-free style are no longer relevant.

The core idea is that the LRU order can be maintained asynchronously with the 
hashtable so a read/write does not block on an LRU reorder operation. Instead, 
these operations are queued to defer execution and periodically applied. This 
means that concurrent reads/writes can occur and schedule their work, while 
another thread may acquire a lock to perform pending operations. This allows 
non-blocking reads and amortizes the penalty across threads.

Any operation that requires traversing the LRU directly would block as it is 
performed under a dedicated lock. This is currently only #clear() as its rare 
enough that queuing all the removals is wasteful compared to a short delay of 
waiting for a drain to complete.

The ordered traversal methods would also need to block and would be a snapshot 
of the hotness order. This is because a concurrent LRU traversal would behave 
incorrectly (e.g. node is reordered while iterating so repeated elements). This 
would have the negative effect of,
1) Concurrent calls to traversal / clear methods would be blocking (but rare!).
2) An O(n) copy of the LRU chain would be required for traversal methods, which 
might be too expensive for a large cache (10M?) like Cassandra.

If the penalty of (2) is not acceptable, then instead of a traditional external 
iterator an internal iterator could be used instead. That might be useful if 
you knew that you only wanted a subset, e.g. top 10%. This could either by a 
predicate function to determine whether to continue consuming.

What would be the ideal API for Cassandra? An ordered snapshot traversal method 
would be the most natural in Java, but I'm concerned that the large caches may 
make that unacceptable.

Original comment by Ben.Manes@gmail.com on 12 Jan 2011 at 7:02

GoogleCodeExporter commented 9 years ago

For Cassandra's purposes, it's reasonable to assume that we have room on the 
heap for a shallow copy of the cache contents.  Even more so if we only need to 
copy keys rather than values as well.

Original comment by jbel...@gmail.com on 12 Jan 2011 at 7:28

GoogleCodeExporter commented 9 years ago

The memory overhead wasn't my concern, since it would just be a pointer copy. 
The length of the the traversal of the LRU (linked list) would be large, e.g. 
O(10m). That would be a lot of pointer chasing and might take a while to 
traverse.

I should probably benchmark to get a rough idea of how long it takes to loop 
through a LinkedList at various sizes (e.g. 1k, 100k, 1m, 10m). I could also be 
misunderstanding what I recall of Cassandra having large caches, e.g. is it the 
memory footprint and/or the number of entries? If the number of entries isn't 
insanely large then I'm probably making too big a deal about the traversal cost.

Original comment by Ben.Manes@gmail.com on 12 Jan 2011 at 8:27

GoogleCodeExporter commented 9 years ago

As long as it's only blocking clear or other traversals, that should be fine -- 
we do have caches of millions of rows, but we only need to traverse-in-hotness 
order to save out the cache for reloading later, so "single digits of minutes" 
is the most frequent that makes sense to do that.

Original comment by jbel...@gmail.com on 12 Jan 2011 at 8:48

GoogleCodeExporter commented 9 years ago

Great, based on these requirements this should be trivial.

Original comment by Ben.Manes@gmail.com on 12 Jan 2011 at 10:59

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

We have another use case: https://issues.apache.org/jira/browse/CASSANDRA-2175

Running into this pointed out that we might not have room on the heap for a 
full shallow copy.  If we could say "give me a shallow copy of the N warmest 
entries" that would be ideal.

Original comment by jbel...@datastax.com on 16 Feb 2011 at 6:12

GoogleCodeExporter commented 9 years ago

OK, I'll get back to CLHM this weekend and start on the pending items. Sorry 
that its been on the back burner lately.

Original comment by bma...@google.com on 16 Feb 2011 at 7:25

GoogleCodeExporter commented 9 years ago

Now provides ascending and descending snapshot views of the keySet and map, 
with an optional limit to the number of captured. The ascending/descending 
order is based on the eligibility of retention. The method names try to best 
honor the convention in NavigatableMap.

"give me a shallow copy of the N warmest entries"
Map<K, V> snapshot = map.descendingMapWithLimit(N);

Original comment by Ben.Manes@gmail.com on 5 Mar 2011 at 9:13

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Fantastic! Thanks, Ben.

Original comment by jbel...@gmail.com on 7 Mar 2011 at 5:01

GoogleCodeExporter commented 9 years ago

Sorry for the delay. I'm ready to release this week unless there's any final 
feedback.

Please consider performing a canary test. Added optional task:
https://issues.apache.org/jira/browse/CASSANDRA-2661

Original comment by Ben.Manes@gmail.com on 17 May 2011 at 8:33

easyfmxu / concurrentlinkedhashmap

API to iterate keys in order of hotness #21