al8n / stretto

Stretto is a Rust implementation for Dgraph's ristretto (https://github.com/dgraph-io/ristretto). A high performance memory-bound Rust cache.
Apache License 2.0

Potential race condition with insert-wait-get #57

Open blind-oracle opened 1 year ago

blind-oracle commented 1 year ago

Hi folks, I'm not sure this is a bug, but I can't find another explanation.

We have a cache based on stretto (sync API) in our service with simple semantics.

I have a test that does a simple insert/wait/get sequence to check that a given entry exists in the cache. In our CI/CD (Bazel) this test sometimes fails: get() reports that the key is missing. The problem is that I cannot reproduce this locally; the test has a 100% success rate even if I run it thousands of times.

I am creating the cache with a large enough max_cost and using a TTL of 3600s to make sure the entry won't be evicted.

I'd be grateful for any hint on how to debug this. Maybe I'm doing something wrong, but my code seems consistent with the example in https://github.com/al8n/stretto/blob/main/examples/sync_example.rs

al8n commented 1 year ago

Hi, I failed to reproduce it on my machine. The failure may happen because the current implementation first pushes a new entry to a write buffer and only then adds it to the map (if the write buffer is full, some inserts are dropped outright). Can you share your test code to help me reproduce the problem?
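For readers following along, the buffering behaviour described above can be illustrated with a small std-only model. This is only a sketch of the general idea, not stretto's actual code: `ToyCache`, `Msg`, `spawn_worker` and `demo` are hypothetical names, the write buffer is modelled as a bounded channel of size 1, and the worker is started late on purpose so that the "buffer full" case is deterministic.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;

// Messages travelling through the (hypothetical) write buffer.
enum Msg {
    Insert(String, u64),
    // Barrier used by `wait`: the worker acknowledges once it reaches it.
    Barrier(SyncSender<()>),
}

// Toy model only: a bounded write buffer in front of a shared map.
struct ToyCache {
    tx: SyncSender<Msg>,
    map: Arc<Mutex<HashMap<String, u64>>>,
}

impl ToyCache {
    // Returns the cache and the receiving end of the buffer, so the
    // caller controls when the background worker starts draining it.
    fn new(buffer_size: usize) -> (Self, Receiver<Msg>) {
        let (tx, rx) = sync_channel(buffer_size);
        let map = Arc::new(Mutex::new(HashMap::new()));
        (ToyCache { tx, map }, rx)
    }

    // Background thread that applies buffered writes to the map.
    fn spawn_worker(&self, rx: Receiver<Msg>) -> thread::JoinHandle<()> {
        let map = Arc::clone(&self.map);
        thread::spawn(move || {
            for msg in rx {
                match msg {
                    Msg::Insert(k, v) => {
                        map.lock().unwrap().insert(k, v);
                    }
                    Msg::Barrier(ack) => {
                        let _ = ack.send(());
                    }
                }
            }
        })
    }

    // Non-blocking insert: returns false if the buffer is full,
    // in which case the write is dropped.
    fn insert(&self, key: &str, value: u64) -> bool {
        self.tx.try_send(Msg::Insert(key.to_string(), value)).is_ok()
    }

    // Blocks until every write buffered before this call has been applied.
    fn wait(&self) {
        let (ack_tx, ack_rx) = sync_channel(1);
        self.tx.send(Msg::Barrier(ack_tx)).unwrap();
        ack_rx.recv().unwrap();
    }

    fn get(&self, key: &str) -> Option<u64> {
        self.map.lock().unwrap().get(key).copied()
    }
}

fn demo() -> (bool, bool, Option<u64>, Option<u64>) {
    let (cache, rx) = ToyCache::new(1);
    let first = cache.insert("a", 1); // fits into the buffer
    let second = cache.insert("b", 2); // buffer is full, write is dropped
    let _worker = cache.spawn_worker(rx);
    cache.wait(); // flushes "a"; cannot bring back the dropped "b"
    (first, second, cache.get("a"), cache.get("b"))
}

fn main() {
    let (first, second, a, b) = demo();
    println!("insert a: {first}, insert b: {second}, get a: {a:?}, get b: {b:?}");
}
```

In this model, waiting only guarantees that writes which made it into the buffer are visible afterwards. A write that was dropped at insert time stays missing, so in a failing test it may be worth logging the value returned by the insert call before calling get().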

blind-oracle commented 1 year ago

@al8n Thanks for the effort. My code probably won't help, as I can't reproduce it either when running locally rather than in Bazel. It's hard for me to tell how these environments differ; they should be the same (and we have thousands of other tests that run fine). But there must be some subtle difference that causes this...

Effectively I'm doing the simple thing I described initially: insert the value with some key, then immediately (well, after wait) check whether it's there. This sometimes gives me a cache miss. When the value is inserted the cache is empty and freshly created, so the insert shouldn't be dropped.

Maybe there's some initialization phase after the Cache object is created using CacheBuilder.finalize() (threads being spawned, etc.)? That would not explain why it never fails locally, though.

I've now switched the cache to stretto's async API and will check whether that causes the same issue...