al8n / stretto

Stretto is a Rust implementation for Dgraph's ristretto (https://github.com/dgraph-io/ristretto). A high performance memory-bound Rust cache.
Apache License 2.0
413 stars 28 forks source link

Getting `fail to send msg to channel: cache set buf sender: channel closed` on inserts #13

Closed sebasmagri closed 1 year ago

sebasmagri commented 2 years ago

Under some pressure, we start seeing this and the cache stops adding new entries. Any guidelines on troubleshooting this?

al8n commented 2 years ago

Hi, could you provide some example code for me to reproduce?

sebasmagri commented 2 years ago

@al8n it's hard to reproduce in a development environment also from me, but under real load it starts happening after some minutes. Tracking down the code it seems like the insert buffer channel gets closed at some point and then we start seeing this. I've been trying to reproduce this also using the benchmarking scripts in stretto, but no luck so far.

This is why I'd be interested on some guidance to troubleshoot this.

al8n commented 2 years ago

@sebasmagri which one are you using, Cache or AsyncCache?

sebasmagri commented 2 years ago

AsyncCache. I think it might be related to using wait too often. I had to do that since I started seeing some issues otherwise, but now I've removed it and I still get a very high dropped sets rate.

Il mer 13 apr 2022, 12:59 Al Liu @.***> ha scritto:

@sebasmagri https://github.com/sebasmagri which one are you using, Cache or AsyncCache?

— Reply to this email directly, view it on GitHub https://github.com/al8n/stretto/issues/13#issuecomment-1097913805, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAACXAJLWD6XIZNMAAMQRT3VE2SJBANCNFSM5S2D7HCQ . You are receiving this because you were mentioned.Message ID: @.***>

sebasmagri commented 2 years ago

I've also tried to track if .close is getting called explicitly at some point but didn't find any cases.

sebasmagri commented 2 years ago

For the record, the issue I get when I don't explicitly wait after inserting is that then I find the expiration map mostly already borrowed on update.

sebasmagri commented 2 years ago

Found something interesting using tokio-console, when this condition happens, I can only see the policy task in the tasks list. so could this confirm that AsyncCache::spawn is going down, potentially by receiving a stop event and exiting the loop?

al8n commented 2 years ago

Yes, it could be AsyncProcessor stops. It's weird, because the only way to stop AsyncProcessor is to call close method or the reference counter of AsyncCache goes to 0. May need to add tracing as a new feature for this crate to help find possible bugs.

sebasmagri commented 2 years ago

As I'm putting the cache instance as managed state in a Rocket application, i wonder if the second case could be possible, however in such case also the policy should be dropped right?

Il mer 13 apr 2022, 20:03 Al Liu @.***> ha scritto:

Yes, it could be AsyncProcessor stops. It's weird, because the only way to stop AsyncProcessor is to call close method or the reference counter of AsyncCache goes to 0. May need to add tracing as a new feature for this crate to help find possible bugs.

— Reply to this email directly, view it on GitHub https://github.com/al8n/stretto/issues/13#issuecomment-1098335229, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAACXAJBFEIJS5KKHL4TGDLVE4D5LANCNFSM5S2D7HCQ . You are receiving this because you were mentioned.Message ID: @.***>

al8n commented 2 years ago

Yes, it should be. However, it may also possible the PolicyProcessor be kept. Because in AsyncCache, the inner fields are wrapped by different Arc. It may be a bug.

sebasmagri commented 2 years ago

@al8n let me know if I can help checking somehow. We're looking forward to deploy a component to production using stretto and would like to get these issues sorted out.

al8n commented 2 years ago

Thanks! The plan is to add the tracing stuff for the crate, especially the Processors. So that we can know when the Processor stops. Would you like to do this? Or I can start on this weekend. Once I complete, I will let you know and you can test it again.

sebasmagri commented 2 years ago

@al8n any updates on this front? I had a few days out and would like to catch up with this.

al8n commented 2 years ago

Sorry, there is no update. I am busy with my uni study and my full-time work these days.

phantomhker commented 1 year ago

The ExpirationMap is RwLock<RefCell<HashMap<K,V>>>, when trying to borrow mutable reference failed, the CacheProcessor will return, and finally the channel closed. Maybe you can remove the RefCell, just use RwLock's write method instead. Further more, return statement in CacheProcessor is not reasonable when error occurred. image image @sebasmagri @al8n

al8n commented 1 year ago

Thanks @phantomhker! I am really silly at that moment for adding a RefCell when there is already a RwLock.