contain-rs / lru-cache

A cache that holds a limited number of key-value pairs
https://contain-rs.github.io/lru-cache/lru_cache/
Apache License 2.0
82 stars 36 forks source link

Limit the cache based on arbitrary measures #38

Open rphmeier opened 8 years ago

rphmeier commented 8 years ago

Motivation: My use-case for lru-cache involves inspecting items' heap size. Rather than hack it in, I think it would be beneficial to make the cache adaptable to whatever limit needs to be imposed.

You might have a set of heterogeneous items, each using a different amount of allocation on the heap. Maybe a more general Meter trait might help:

trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

The default Meter could be a Count:

struct Count;

impl<T> Meter<T> for Count { 
    fn measure(&self, _: &T) -> usize { 1 }
}

Here's an implementation that would use the HeapSizeOf trait (I'd recommend this one for inclusion under a feature gate as it's generally useful.)

struct HeapSize;

impl<T: HeapSizeOf> Meter<T> for HeapSize {
    fn measure(&self, item: &T) -> usize { item.heap_size_of_children() + ::std::mem::size_of::<T>() }
}

When you add or remove an item, you add or subtract meter.measure(&item) from the maximum respectively. If the metered total exceeds the maximum while adding an item, other items must be removed in a loop until the total falls below the limit.

Caveats:

Alternatively:

Looking forward to your comments and naming suggestions below.

apasel422 commented 8 years ago

This is a neat idea. I need to ponder this further, but I think I'm in favor of going the HashMap route and just documenting that changing an item's measure after it's been inserted is a logic error, potentially also providing a method that can be used to update the measure in a controlled way once it's in the map.

rphmeier commented 8 years ago

@apasel422 The situation is a bit different, though, since the measure can be changed through plain old mutability as opposed to just interior mutability like HashMap keys.

I wonder if Meter should be Meter<K, V> and measurement should be done based on both key and value sizes (String as key type comes to mind)

apasel422 commented 8 years ago

I do like the idea of a CacheHandle (modulo naming?), even though it's possible to mem::forget it. A recent change to BinaryHeap is similar and declared leaking OK, since it doesn't affect memory safety. As for backwards compatibility, we could just have a semver bump, or declare a new method on LruCache<K, V, M> that returns CacheHandle instead of &mut V and leave get_mut only on LruCache<K, V> (assuming it defaults to Count).

I was also thinking it could be Meter<K, V>.

Is there a way to avoid storing a duplicate usize with Count, since it's derivable from the underlying LinkedHashMap's len()?

luser commented 7 years ago

I have a patch for this mostly done, with the exception of the get_mut bits.