apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0

Enhance the implementation of Random Sample Keys in kvrocks #2144

Open mapleFU opened 9 months ago

mapleFU commented 9 months ago

Motivation

Random key sampling is supported in Kvrocks. When running commands like spop, scanning all the data can be a bottleneck. Currently, the implementation is:

  1. Get all sampled values
  2. Randomly filter K values
  3. Return them

Unlike Redis, its complexity is always O(N), where N is the element cardinality, which is extremely high. It's a bit hard to get "random" access in Kvrocks because we're built on RocksDB.

Solution

There are some approaches we could apply:

  1. Maintain an equi-depth histogram if random sampling is frequent. This is the best option when random sampling happens often; however, we would pay a maintenance cost on every write. TiKV uses size-based sampling to maintain its split ranges.

So instead, maybe we can enhance the current implementation. For example:

  1. Generate the random indices to sample, and sort them
  2. Use a single iterator pass to fetch only those indices

The complexity is still O(N), but the performance might be enhanced.
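The two steps above could be sketched as follows. This is a minimal Python sketch, not the Kvrocks implementation: `members` stands in for the RocksDB iterator over the set, and all names are hypothetical.

```python
import random

def sample_sorted_indices(members, k):
    """Pick k random members in a single forward pass.

    Instead of materializing all N members and filtering afterwards,
    draw k distinct indices up front, sort them, and advance one
    iterator while keeping only the chosen positions.
    """
    n = len(members)  # in Kvrocks this would come from the set's metadata
    k = min(k, n)
    targets = sorted(random.sample(range(n), k))  # step 1: random indices, sorted
    result = []
    t = 0
    for idx, member in enumerate(members):  # step 2: one pass over the iterator
        if t < len(targets) and idx == targets[t]:
            result.append(member)
            t += 1
        if t == len(targets):
            break  # stop early once the largest sampled index is consumed
    return result
```

The pass is still O(N) in the worst case, but only k values are decoded and the iteration can stop at the largest sampled index.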

Are you willing to submit a PR?

mapleFU commented 8 months ago

I'm trying to implement this. During my implementation, I found a problem:

  1. Get the set size
  2. Generate the random indices to fetch
  3. Traverse the set

Steps 1-3 should be done within one snapshot; otherwise, the size read in step 1 might not match the set traversed in step 3 under concurrent writes.

So, GetMetadata and traversing the set should be done within one RocksDB snapshot. Do you think this is safe and OK? @git-hulk @PragmaTwice
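A toy illustration of why one snapshot matters (plain Python, hypothetical names; the random index is replaced by a deterministic last-index pick so the race always fires): if the size read and the traversal observe different versions of the set, a sampled index can point past the end.

```python
def sample_without_snapshot(get_size, iterate):
    """Broken sampling: the size read and the traversal may observe
    different versions of the set when not taken from one snapshot."""
    n = get_size()               # step 1: read the cardinality
    idx = n - 1                  # step 2: a valid index *for that version*
    members = list(iterate())    # step 3: traverse, possibly a newer version
    if idx >= len(members):
        raise IndexError("sampled index past the end: concurrent delete")
    return members[idx]
```

Pinning both the metadata read and the iterator to one RocksDB snapshot removes this window entirely.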

git-hulk commented 8 months ago

I'm wondering if it'd be better to sacrifice the scope of the candidate set to simplify the implementation, like below:

  1. If size <= N (e.g. 128/256), then iterate all members and randomly pick one of them.
  2. If size > N, sample among the first N elements when the cached random point is empty. Otherwise, seek from the cached random point and sample among the next N elements, then cache the new random point, or remove the cached random point if it reaches the end.

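The proposal above might look like this sketch. All names (`WindowedSampler`, the per-key cursor dict, `members` standing in for the RocksDB iterator) are hypothetical; this only illustrates the windowing and random-point caching, not the actual Kvrocks code.

```python
import random

class WindowedSampler:
    """Sample from a window of at most N members, caching the position
    ("random point") where the last window ended."""

    def __init__(self, window=128):
        self.window = window  # N: the candidate-set size we accept
        self.cursor = {}      # key -> cached random point (last member seen)

    def sample(self, key, members):
        n = len(members)
        if n <= self.window:
            # Small set: iterate everything and pick uniformly.
            return random.choice(members)
        start = 0
        point = self.cursor.get(key)
        if point is not None:
            # Seek just past the cached random point.
            start = members.index(point) + 1
        chunk = members[start:start + self.window]
        end = start + len(chunk) - 1
        if end >= n - 1:
            self.cursor.pop(key, None)   # window reached the end of the set
        else:
            self.cursor[key] = chunk[-1]  # cache the new random point
        return random.choice(chunk)
```

The trade-off git-hulk describes is explicit here: each call only randomizes within one window of N members, in exchange for bounded iteration cost per call.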
mapleFU commented 8 months ago

if size > N, sample among the first N elements when the cached random point is empty. Otherwise, seek from the cached random point and sample among the next N elements, then cache the new random point, or remove the cached random point if it reaches the end.

I thought of using this previously; it's what I called "statistics" in this issue. But where should we store this, and how do we update it?

git-hulk commented 8 months ago

but where should we store this, and how do we update it?

We're now using an LRU to cache the iterator key while scanning the DB/hash/set; maybe we can do this in the same way.
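The cached random point could live in the same kind of structure. A minimal LRU sketch in Python (hypothetical `CursorCache` name; the real Kvrocks cache for scan cursors is a C++ LRU, this only mirrors the idea):

```python
from collections import OrderedDict

class CursorCache:
    """Tiny LRU mapping a user key to the last iterator position,
    in the spirit of the cursor cache used while scanning."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, key, position):
        self.entries.pop(key, None)
        self.entries[key] = position          # most recently used at the end
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # touch on read
        return self.entries[key]
```

Bounding the cache size means a random point can be evicted at any time, which is safe here: losing it only restarts sampling from the first window.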

mapleFU commented 8 months ago

That's a bit complex 🤔; in my current implementation I may just need to iterate the set with one snapshot...

git-hulk commented 8 months ago

That's a bit complex 🤔; in my current implementation I may just need to iterate the set with one snapshot...

Yes, it would be more complex than the current implementation. So I think the implementation with the same snapshot is good.

mapleFU commented 8 months ago

Yeah, another point is that supporting getting the metadata by snapshot can enhance our isolation. It might slightly affect LSM-tree garbage collection, but I think it's OK.

mapleFU commented 8 months ago

Aha: https://github.com/apache/kvrocks/commit/95301c5a673af83ad85e6dfb63da4db41d83afb8#diff-4bae2c37c513c915c0528134abd47edff2a52833806777feb088b6d09990a74e

@git-hulk should we sample at most 60 keys in spop?

git-hulk commented 8 months ago

@mapleFU For Kvrocks, I feel it's fine to only randomize part of them if there are too many keys.

sergerus commented 1 week ago

Why do we still need a randomness guarantee for this?