facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

Feature request: "Multi" prefix extractor support #12824

Open zaidoon1 opened 3 days ago

zaidoon1 commented 3 days ago

Say my key format is <account_id>:<user_id>:<some dynamic value>.

Today, we can create a prefix extractor/bloom filter on <account_id>:<user_id> to help with queries that start with a known <account_id>:<user_id>. However, what we can't do today is also set up a prefix extractor on <account_id> alone. With both, I could use bloom filters for queries that know the account id + user id combination as well as for queries that only know the account id. In DB/SQL terminology, this is like being able to create multiple indexes on the "columns" to optimize both of these queries: "select * from blah where account_id = 123" and "select * from blah where account_id = 345 and user_id = 678".
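To make the request concrete, here is a minimal sketch of what a "multi" prefix extractor would compute for each key. It is not a RocksDB API: the function name MultiPrefixes and the max_levels parameter are made up for illustration, and the ':' delimiter is taken from the key format above. Each returned prefix is what would be added to the bloom filter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of RocksDB: returns every prefix of `key`
// that ends just before one of the first `max_levels` ':' delimiters.
// For the key format <account_id>:<user_id>:<value> and max_levels = 2,
// that is the <account_id> prefix and the <account_id>:<user_id> prefix.
std::vector<std::string> MultiPrefixes(std::string_view key,
                                       std::size_t max_levels = 2) {
  std::vector<std::string> prefixes;
  std::size_t pos = 0;
  while (prefixes.size() < max_levels) {
    pos = key.find(':', pos);
    if (pos == std::string_view::npos) break;  // key has fewer components
    prefixes.emplace_back(key.substr(0, pos));
    ++pos;  // continue searching after this delimiter
  }
  return prefixes;
}
```

With this, a key such as 42:7:abc would contribute both 42 and 42:7 to the filter, so a query that knows only the account id and a query that knows account id + user id could each probe it.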

As far as I know, today we can only have one prefix extractor/bloom filter per column family (cf), so we have the following workarounds, neither of which is ideal:

  1. Create another cf that duplicates the data, so that one cf has an <account_id>:<user_id> prefix extractor and the other has an <account_id> prefix extractor. Depending on the query and what we already know, we look up the kv from the corresponding cf. The issue here is that we need more disk space to store the duplicate data.

  2. Given that <account_id> is common to both prefix extractors (in this use case) and we always have it, use it as the prefix extractor. However, we miss the opportunity to optimize queries that also have <user_id>.
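The trade-off in workaround 2 can be shown with a toy model. This is only a sketch under assumptions: ToyAccountFilter and AccountPrefix are hypothetical names, and an exact std::unordered_set stands in for a real bloom filter (which is probabilistic and can return false positives).

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Extracts the <account_id> component: everything before the first ':'.
// If the key has no ':', the whole key is treated as the account id.
std::string AccountPrefix(std::string_view key) {
  return std::string(key.substr(0, key.find(':')));
}

// Toy stand-in for one file's prefix filter under workaround 2.
// Only the <account_id> prefix of each key is recorded.
class ToyAccountFilter {
 public:
  void Add(std::string_view key) { accounts_.insert(AccountPrefix(key)); }

  // Returns false only when no stored key shares the query's account id.
  bool MayContain(std::string_view seek_key) const {
    return accounts_.count(AccountPrefix(seek_key)) > 0;
  }

 private:
  std::unordered_set<std::string> accounts_;
};
```

After adding the key 1:10:a, a probe for 2:10 is rejected, but a probe for 1:99 is not, even though no key for user 99 exists. That second case is the missed optimization: the filter cannot use the <user_id> part of the query.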

zaidoon1 commented 2 days ago

Looks like something similar was requested here: https://groups.google.com/g/rocksdb/c/bb6Db8Y3xwU

zaidoon1 commented 2 days ago

@ajkr What do you think about a feature like this? It seems very useful/high impact, but I'm not sure what the level of effort is.