Open gloomy-7164 opened 6 months ago
Hi
I encountered this issue a while back:
https://github.com/Snapchat/KeyDB/issues/748
Workaround can be found in the link above to keep the eviction routine in the microseconds range while keeping the CPU usage tamed.
Hi
I encountered this issue a while back:
748
Workaround can be found in the link above to keep the eviction routine in the microseconds range while keeping the CPU usage tamed.
Thanks Keith. I think our issues were different, as in my case there were no expires involved, but only the maxmemory related evictions. And my flamegraph showed very little overhead with rocksdb but mostly in the performEvictions logic itself.
(Note that this flamegraph was obtained with allkeys-lru
not allkeys-random
mentioned in the original issue )
FYI I made a simple change here to re-enable the early exit in evictions, and the blocking evictions were almost gone.
if ( /* g_pserver->m_pstorageFactory == nullptr && COMMENT OUT THE STORAGE CHECK*/ elapsedUs(evictionTimer) > eviction_time_limit_us) {
Keydb was responsive during evictions, max latency dropped from 1000+ms to ~40 ms.
Hi @gloomy-7164
Ah, I just realised your issue is in evict.cpp, while mine was in expire.cpp, apologies for not picking that up earlier! I have not experienced any issues on evictions because I have not stress tested maxmemory yet (that test is coming up, currently stress testing replication). Your patch looks handy, so will also try it when I get to that stage.
Describe the bug
KeyDB hang and was not responsive for a few seconds when it enabled flash and memory utilization grew to reach
maxmemory
.It hang for a few seconds, client read/write were almost blocked and saw 1000+ ms latency. KeyDB flushed multi hundreds of MBs of data at once.
maxmemory_policy = allkeys_random
I was testing keydb 6.3.3, on AWS r7gd.xlarge instance, 1 primary 0 replica.
To reproduce
flash
enabled.redis-cli
commands took seconds to return, client throughput dropped to almost zero.Expected behavior Evictions should happen real-time and incrementally and keydb should not spend much time evicting hundreds of MBs of data in a blocking way.
Additional information
I think the problem is from 2 places in the code:
maxmemory=10G
, keydb will evict 500MB of data all in once which takes 5+ seconds to finish. @JohnSully Any reasons why we want to evict aggressively if storage is enabled? If evictions only involves removing data from memory and not deleting them from rocksdb, I think it should be ok to evict in the same way as non-storage mode?maxmemory-eviction-tenacity
config. However this exit-early mechanism is disabled when storage is enabled, see code. What will be bad if we chose to exit evictions earlier and have storage enabled?