apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0
3.47k stars 450 forks source link

Using prefix bloom filter in range queries #1977

Open chrisxu333 opened 8 months ago

chrisxu333 commented 8 months ago

Search before asking

Motivation

Prefix seek / prefix bloom filter is proven to be useful when performing range queries of a collection of keys sharing the same prefix. Refer to [https://github.com/facebook/rocksdb/wiki/Prefix-Seek] for more detailed usage. I saw there're some APIs inside storage layer that performs range query logic. They could get benefits from using aforementioned prefix bloom filter.

Solution

I haven't summarized all potential use case of prefix bloom filter, however, FindKeyRangeWithPrefix could definitely benefit from it by configuring the prefix_extractor.

Are you willing to submit a PR?

chrisxu333 commented 8 months ago

@git-hulk could you help to see if this proposal is needed / necessary? Thank you :)

PragmaTwice commented 8 months ago

I believe we are willing to make such improvements if there are benchmarks that can demonstrate performance enhancements.

mapleFU commented 8 months ago

Perhaps, I don't know. IMO, Prefix filter is good for range scan, however, Range in kvrocks is not so widely used...

git-hulk commented 8 months ago

@chrisxu333 I ever tried to add the prefix bloom filter but seems didn't bring any improvements to Kvrocks, see https://github.com/apache/kvrocks/pull/1367. But as @PragmaTwice said, don't hesitate to have a try if you feel it can improve.

chrisxu333 commented 8 months ago

@chrisxu333 I ever tried to add the prefix bloom filter but seems didn't bring any improvements to Kvrocks, see #1367. But as @PragmaTwice said, don't hesitate to have a try if you feel it can improve.

Sounds good I'll give it a try.

chrisxu333 commented 8 months ago

Perhaps, I don't know. IMO, Prefix filter is good for range scan, however, Range in kvrocks is not so widely used...

Is that because of the lacking usage scenario or due to the missing support of such command?

git-hulk commented 8 months ago

What @mapleFU means is we only used range query in a few commands like hgetall/lrange/zrange, etc...

mapleFU commented 8 months ago

https://github.com/tikv/tikv/blob/2472fd4d85c220b74dc889b493e70bc95dcc75c2/src/config/mod.rs#L963

TiKV uses prefix bf to remove the timestamp column. I think if we'd like to use prefix bf, maybe we need a well-defined rule for prefix. Little commands uses Scan in RocksDB, and mosts commands just uses "Put" and "Get" directly.

However, if benchmark shows it can improve our performance, any contribution is wellcome