Open mapleFU opened 9 months ago
I'm trying to implement this. During my impl, I found a problem that:
The command 1-3 should be done in one snapshot, otherwise, the case might be:
So, GetMetadata and traversing the set should be done in one rocksdb snapshot. Would you think this is safe and ok? @git-hulk @PragmaTwice
I'm wondering if it'd be better to sacrifice the scope of the candidate set to simplify the implementation like below:
if size > N, random among the first N elements if the cached random point is empty. Otherwise, seek from the cached random point and then random among the next N elements, then cache the random point again or remove the cache random point if it reaches the end.
I thought of using this previously, and this is named "statistics" in this issue. but where should we storing this and how do updating this?
but where should we storing this and how do updating this?
We're now using a LRU to cache the iterator key while scanning the DB/hash/set, maybe we can do this in the same way.
That's a bit complex 🤔, in my current impl I may just need to iter the set with one snapshot...
That's a bit complex 🤔, in my current impl I may just need to iter the set with one snapshot...
Yes, it should be more complex than the current implementation. Then I think the implementation with the same snapshot is good.
Yeah, another point is that support get meta by snapshot can enhance our isolation. It might slightly affect LSM-Tree Garbage collection, but I think it's ok
@git-hulk should we sample at most 60 keys in spop?
@mapleFU For Kvrocks, I feel good to only randomize part of them if there're too many keys.
Why do we still need a randomness guarantee for this?
Search before asking
Motivation
Random sample keys is supported in kvrocks. When running commands like
spop
, scanning all data would be a bottleneck. Currently, the implementation is:Unlike redis, it's complexity is always O(N), N is the element cardinal, which is extremly high. It's a bit hard to get "random" in kvrocks because we're based on rocksdb.
Solution
There're some rules we can apply:
So, instead, maybe we can enhance the implementation of current impl. For example:
The complexity is still O(N), but the performance might be enhanced.
Are you willing to submit a PR?