al8n / stretto

Stretto is a Rust implementation for Dgraph's ristretto (https://github.com/dgraph-io/ristretto). A high performance memory-bound Rust cache.
Apache License 2.0
409 stars 28 forks source link

Can key lifetime be shortened to insert/get/destroy functions only? #50

Open peter-scholtens opened 1 year ago

peter-scholtens commented 1 year ago

As the (ri)stretto algorithm does not store/modify the key, it should also be able to only borrow it? The lifetime of this borrow should therefore only resemble or exceed the lifetime of the function calls of insert(), _tryinsert(), get(), _getmut() and remove(), but not the lifetime of the full cache, I assume. To simply demonstrate the idea I modified the synced example version to this code below where the key is a &str:

use std::time::Duration;
use stretto::Cache;

// key should live as long as function call while cache takes ownership of value
fn new_cache<'a>() -> Cache<&'a str, String> {
    let c: Cache<&str, String> = Cache::new(12960, 1e6 as i64).unwrap();
    c
}

fn main() {
    let r = rand::random::<u8>();
    let c = new_cache();

    {
        // create storage key from random number, and set a value with a cost of 1
        let store_key = format!("key1-{}", r);
        // set a value with a cost of 1
        c.insert(&store_key, "value1".to_string(), 1);
        // set a value with a cost of 1 and ttl
        c.insert_with_ttl("key2", "value2".to_string(), 1, Duration::from_secs(3));

        // wait for value to pass through buffers
        c.wait().unwrap();
    }

    // Create a search key
    let key1 = "key1";

//... rest is not modified ...

However, that fails with the following error:

24 |     }
   |     - `store_key` dropped here while still borrowed

The idea of this enhancement is that users of the cache do not need to copy their key str to a String possibly saving CPU processing time by only handling a non-mutable borrow. I assume this can be done by inserting a lifetime specifier in the function calls?

al8n commented 1 year ago

Yes, we can only borrow it. We only need a key to calculate a hash value and then we forget the key.

peter-scholtens commented 1 year ago

I don't understand you reply. With version https://github.com/al8n/stretto/commit/b161a98067a99adfd8a4258852ae3b7aa9bcfdcc and the above example, I observe the problem as mentioned above: store_key is still borrowed by cache c which causes the compiler error. So, the algorithm can do that, but the current Rust code cannot. Shall I investigate a patch for this?

al8n commented 1 year ago

I don't understand you reply. With version b161a98 and the above example, I observe the problem as mentioned above: store_key is still borrowed by cache c which causes the compiler error. So, the algorithm can do that, but the current Rust code cannot. Shall I investigate a patch for this?

Yeah, the current Rust code cannot do that. It needs some small changes. Any changes to solve this are welcome.

peter-scholtens commented 1 year ago

I tried several thing with lifetimes, reference to types etc., but the only working solution I could invent avoids mentioning the key type at the instantiation time of the cache, but sets the type at the calling get() or get_mut() statement. Otherwise it seems you either get a borrowed value does not live long enough or a doesn't have a size known at compile-time error when including the key type in the structure definition. The weird thing is that we now get another degree of freedom to use different key types for the same cache like e.g. combining c.insert::<str>( ... ) and c.insert::<u64>( .. ) in the same code is possible. However, as this results every time in a close to 100% cache miss, it is unlikely this misuse will propagate to real production code.

When studying the code, some some questions rose in my mind:

  1. No single time the function borrow() is called. It looks to me we can easily remove all this Borrow trait conditions of insertion keys K and query key Q?
  2. What's the reason for fn build_key( ... ) -> (u64, u64) and not using fn build_key( ... ) -> (u128)?

Some info I used from stackoverflow and also someone else who noticed that copying should be avoided for a fast response webproxy: "...it has to read it from the NGINX C struct, allocate a Lua string and then copy it...".

Below a working example, where the key is borrowed several times.

#![feature(default_free_fn)]

use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};

// BorrowedKeyCache holds only one value, with a hashed key.
#[derive(Debug)]
struct BorrowedKeyCache<V> {
    stored_hkey: Option<u64>,
    stored_value: V,
}

impl<V: Debug + Default> BorrowedKeyCache<V> {
    /// Instanciate new, empty cache
    pub fn new() -> BorrowedKeyCache<V> {
        return BorrowedKeyCache {
            stored_hkey: None,
            stored_value: <V>::default(),
        };
    }

    /// Insert value V while borrowing key K
    pub fn insert<K>(&mut self, key: &K, value: V) -> bool
    where
        K: Hash + Eq + Debug + ?Sized,
    {
        println!("insert: {:?}, v: {:?}", key, value);
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        self.stored_hkey = Some(hasher.finish());
        self.stored_value = value;
        return true;
    }

    /// Query with key Q
    pub fn get<Q>(&self, key: &Q) -> Option<&V>
    where
        Q: Hash + Eq + Debug + ?Sized,
    {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        if self.stored_hkey == Some(hasher.finish()) {
            println!("found:  {:?}", key);
            return Some(&self.stored_value);
        } else {
            println!("NOT found: {:?}", key);
            return None;
        }
    }

    /// Query and modify with key Q
    pub fn get_mut<Q>(&mut self, key: &Q) -> Option<&mut V>
    where
        Q: Hash + Eq + Debug + ?Sized,
    {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        if self.stored_hkey == Some(hasher.finish()) {
            println!("found:  {:?}", key);
            return Some(&mut self.stored_value);
        } else {
            println!("NOT found: {:?}", key);
            return None;
        }
    }

    /// Remove a specific item, true if succesfull
    pub fn remove<Q>(&mut self, key: &Q) -> bool
    where
        Q: Hash,
    {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let mut destroy = false;
        if self.stored_hkey == Some(hasher.finish()) {
            destroy = true;
        }
        if destroy == true {
            self.stored_hkey = None;
            self.stored_value = <V>::default();
        }
        return destroy;
    }

    /// Fully clear cache
    pub fn clear(&mut self) {
        self.stored_hkey = None;
        self.stored_value = <V>::default();
    }

    /// Reports number of items
    pub fn len(&self) -> usize {
        if self.stored_hkey.is_some() {
            return 1;
        } else {
            return 0;
        };
    }
}

fn main() {
    // Create our web cookie cache:
    let mut c: BorrowedKeyCache<String> = BorrowedKeyCache::new();
    {
        let initial_cookie = String::from("welcome123");
        c.insert::<str>(&initial_cookie.as_str(), "Initial user session".to_string());
        println!("Set cookie in HTTP return header: {}", initial_cookie);
        // Variable 'initial_cookie' can be dropped here.
    }

    // Check if a returning visitor with same key is recognized:
    {
        let returning_visitor_cookie = String::from("welcome123");
        println!(
            "LOG: key {:?} received, content in cache {:?}",
            returning_visitor_cookie,
            c.get(&returning_visitor_cookie)
        );
        // And modify, borrowing the key again,
        *c.get_mut(&returning_visitor_cookie).unwrap() = "Second session".to_string();
    }

    {
        // A user logs out
        let logout_user_cookie = String::from("goodbye890");
        println!("LOG: key {:?} received, logging out", logout_user_cookie);
        c.remove(&logout_user_cookie);
    }

    // All keys dropped, cache still filled.
    println!("Number of items in cache {}", c.len());
    c.clear();
    println!("Reduced to {} items.", c.len());
}
al8n commented 1 year ago
  1. Yes, we can remove all borrow stuff.
  2. If we use u128, then we need some extra calculations, e.g. high 8 bits to store the result of first hash algorithm, low 8 bits to store the result from the second hash algorithm. Most of users will not need the second hash algorithm. FMO, I do not think the extra calculations will influence performance, but I have to say I am a bit lazy and just leave it (64,64).
peter-scholtens commented 1 year ago

Okay so ..

  1. Done, see https://github.com/al8n/stretto/pull/53
  2. For now I agree that this is handy. In the very far future where 128-bit processor become more common and have a built-in hash function you can still modify it and increment the major number of the revision of the package.

This leaves the original question still open: will the code be modified to have a borrowed key at the insertion function, as demonstrated in https://github.com/al8n/stretto/issues/50#issuecomment-1529481913 ? Possibly there exists a better implementation, but I didn't find it. As stretto is advertising to have no keys stored, its users should not be obliged to copy a key and let stretto destroy it (as in the current situation).