contain-rs / lru-cache

A cache that holds a limited number of key-value pairs
https://contain-rs.github.io/lru-cache/lru_cache/
Apache License 2.0
83 stars 36 forks source link

Allow arbitrary measures for cache size limit. Fixes #38. #40

Closed luser closed 7 years ago

luser commented 7 years ago

This commit allows creating LruCaches that are limited in size by something other than a count of entries. It does this by providing a new trait, Meter, where implementations of the trait provide a measure method to do the measurement. The default behavior of measuring by the count of entries is provided by a Count type.

Due to difficulties with mutating entries making it possible to change an item's measure, the get_mut and iter_mut methods are provided only on LruCaches using Count for sizing.

A bit of optimization is provided for the default case--to avoid having to redundantly store the item count the Meter trait allows implementations to provide a type for storing the measurement, and the Count implementation uses (). The CountableMeter trait allows LruCache to work with this or any Meter implementation that uses usize for measurement.

Finally an optional feature is added, heapsize, which uses the heapsize crate to provide HeapSizeOf, a measurement based on the heap memory usage of values stored in the cache.

Thanks to @rphmeier for the design suggestions!

Since this makes get_mut not available on caches that aren't using the default Count metric, you will probably want to get non-mutable get merged as well to mitigate that.

One issue that's not addressed by this change is providing the list of removed items on an insert or set_capacity that requires removing multiple items. This would make my use case simpler, so I may implement that as a follow-up once this PR is finished.

FlashCat commented 7 years ago

Thanks for the pull request, and welcome! The contain-rs team is excited to review your changes, and you should hear from @bluss (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

rphmeier commented 7 years ago

The Measure associated type is a nice touch, I like that the heapsize meter doesn't recalculate the size on each insertion.

Concerns:

Alternatively we could declare

pub trait Meter<K: Hash + Eq, V> {
    type Measure: Ord + Copy;

    /// Accumulate a key-value pair into the current measure.
    fn add(&mut self, key: &K, val: &V);

    /// Remove a key-value pair from the current measure.
    fn remove(&mut self, key: &K, val: &V);

    /// Remove value measure (for when an old value exists)
    fn remove_val(&mut self, val: &V);

    /// Fetch the current measure.
    fn current(&self) -> Self::Measure;
}

And then also a

// maybe doc(hidden)?
/// Meter where the current measure can be derived from the cache.
pub trait DerivableMeter<K: Hash + Eq, V> {
    // ...all same as `Meter` except for this:
    fn current<S: BuildHasher>(&self, cache: &LruCache<K, V, S, Self>) -> Self::Measure;
}

// trivial implementation
impl<K: Hash + Eq, V, M> DerivableMeter<K, V> for M where M: Meter<K, V> { ... }

Then the cache is defined as

pub struct LruCache<K: Hash + Eq, V, S: BuildHasher, M: DerivableMeter<K, V>> {
    map: ...,
    max_measure: M::Measure,
    meter: M,
}

and insert might look like

fn insert(&mut self, key: K, val: V) -> Option<V> {
    self.meter.add(&key, &val);

    let old_val = self.map.insert(key, val);

    if let Some(ref val) = old_val {
        self.meter.remove_val(val);
    }

    while self.meter.current(&*self) > self.max_measure {
        self.remove_lru()
    }

    old_val
}

Count's Measure could still be () but the current method would fetch from the underlying map.

edit: miswrote; this whole idea was designed around using usize for Count::Measure

luser commented 7 years ago

That Travis failure looks like a transient network issue: "Unable to update registry https://github.com/rust-lang/crates.io-index"

luser commented 7 years ago

If insert is only declared for M: CountableMeter then how are you expected to use the cache when it is uncountable? What are some examples of countable meters -- shouldn't you always be able to tell what the sum or difference of two Measures is?

CountableMeter is a bit of a strange name, I worked through several designs before winding up on that. Basically, I wanted Count to not keep an extraneous counter on the cache, so I couldn't rely on having the Measure type implement Add + Sub, or be convertible to usize. I could have put all the methods of CountableMeter on Meter itself, but then users implementing Meter would have to implement them, which seems silly for the general case. I could rename CountableMeter to something else, it really just exists to paper over the quirk of Count not storing a count (which is pretty ironic in hindsight).

The intention was for LruCache to only work with types implementing CountableMeter, so if I left a M: Meter on it somewhere that's a mistake!

I don't really see how your DerivableMeter proposal differs much from my existing CountableMeter implementation. Is there something there I'm missing?

luser commented 7 years ago
max_measure: M::Measure,

FYI, I don't see how that could be made to work with Count having the unit type. It still needs to store that as a usize.

rphmeier commented 7 years ago

@luser yeah, in the suggestion above Count::Measure == usize, but it can fetch it by directly implementing DerivableMeter and calling cache.map.len()

bluss commented 7 years ago

Can I reassign this to you @rphmeier?

rphmeier commented 7 years ago

Sure thing @bluss

luser commented 7 years ago

@rphmeier There was some back and forth there, but are there specific changes you'd like to see to this PR? I'm happy to fix things, I just don't think I understand what you're looking for.

sfackler commented 7 years ago

Ping @rphmeier

Gankra commented 7 years ago

@luser has forked the repo for their purposes, and we don't seem to have the bandwidth for this kind of work anymore.

Closing.