
Support key level TTL feature #12848

Open lhsoft opened 1 month ago

lhsoft commented 1 month ago


Expected behavior

Support key-level TTL by adding a method like the following:

DB::Put(key, value, ttl)

Keys whose TTL has expired would then be cleaned up by compaction.


zaidoon1 commented 1 month ago

Does https://github.com/facebook/rocksdb/wiki/Time-to-Live not work for you?

NVM, you want different TTLs for different KVs.
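
For context, a minimal sketch of the DBWithTTL utility that wiki page describes (the database path here is just an example). The TTL is fixed for the whole handle at open time, which is why it can't express per-key TTLs:

```cpp
#include <cassert>

#include "rocksdb/utilities/db_ttl.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::DBWithTTL* db = nullptr;
  // One TTL (in seconds) for the whole database, chosen at open time.
  rocksdb::Status s =
      rocksdb::DBWithTTL::Open(options, "/tmp/ttl_db", &db, /*ttl=*/3600);
  assert(s.ok());

  // Put has no ttl parameter; every key shares the 3600-second lifetime.
  s = db->Put(rocksdb::WriteOptions(), "key", "value");
  assert(s.ok());

  delete db;
  return 0;
}
```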

zaidoon1 commented 1 month ago

Right now, we store the TTL as part of the KV and then, using a custom compaction filter, check whether the TTL has expired and delete the KV.
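
A minimal sketch of that pattern, assuming the application writes an absolute expiry timestamp in the first 8 bytes of every value (the value layout and the class name are illustrative choices, not RocksDB APIs beyond CompactionFilter itself):

```cpp
// Value layout (app-defined): [8-byte expiry unix timestamp][payload].
#include <cstdint>
#include <cstring>
#include <ctime>
#include <string>

#include "rocksdb/compaction_filter.h"
#include "rocksdb/slice.h"

class ExpiryCompactionFilter : public rocksdb::CompactionFilter {
 public:
  // Returning true tells compaction to drop this key-value pair.
  bool Filter(int /*level*/, const rocksdb::Slice& /*key*/,
              const rocksdb::Slice& existing_value, std::string* /*new_value*/,
              bool* /*value_changed*/) const override {
    if (existing_value.size() < sizeof(uint64_t)) {
      return false;  // No expiry header; keep the entry.
    }
    uint64_t expiry_unix_secs;
    std::memcpy(&expiry_unix_secs, existing_value.data(),
                sizeof(expiry_unix_secs));
    const uint64_t now = static_cast<uint64_t>(std::time(nullptr));
    return now >= expiry_unix_secs;  // Expired: delete during compaction.
  }

  const char* Name() const override { return "ExpiryCompactionFilter"; }
};
```

The filter gets installed via options.compaction_filter before opening the DB. Note that entries only disappear when compaction actually visits them, so reads still need to check the expiry themselves.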

lhsoft commented 1 month ago

> Right now, we store the TTL as part of the KV and then, using a custom compaction filter, check whether the TTL has expired and delete the KV.

We also use the same method, but for blobs the compaction filter has to read the value, which causes read amplification. It would be better to encode the TTL in the blob index, which isn't supported.

jowlyzhang commented 1 month ago

Hello, RocksDB recently developed a useful feature that is built on tracking data's write time. It's this option: https://github.com/facebook/rocksdb/blob/6870cc1187c12458b3c5b4d2ba2f4ac22d5b0049/include/rocksdb/advanced_options.h#L875

Although its status is experimental, the feature is production ready. When this option is set, RocksDB keeps track of info that can translate a data entry's sequence number into its write time. Some caveats of this tracking that I can think of now: 1) Data older than the window is all considered indefinitely old, so you should set the window to be bigger than the longest TTL you want. 2) The write-time tracking may not be as precise as you want. You can think of it as tracking only 1000 sequence-to-time pairs, so if you set the window to 1000 days, all data written within the same day gets the same write time.
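
A minimal sketch of enabling this tracking, assuming the option linked above is preserve_internal_time_seconds (the 30-day window is an arbitrary example):

```cpp
// Enable seqno-to-write-time tracking. The window should exceed the longest
// TTL you plan to support; per the granularity caveat above, a 30-day window
// divided across ~1000 tracked pairs resolves write times to roughly
// 2,592,000 s / 1000 ≈ 43 minutes.
#include "rocksdb/options.h"

rocksdb::Options MakeOptionsWithWriteTimeTracking() {
  rocksdb::Options options;
  options.preserve_internal_time_seconds = 30 * 24 * 60 * 60;  // 30 days
  return options;
}
```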

Adding on top of this, if we also passed the data entry's write time to the compaction filter (which would require yet another filter API), would that help implement the per-key TTL you described?
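
Purely as a hypothetical sketch of what that could enable (nothing below exists in RocksDB today; ShouldDropExpiredEntry and the TTL-in-first-8-bytes encoding are made up for illustration): the application would store only the TTL duration with the value, and a write-time-aware filter would compare write_time + ttl against the clock, with no need to read blob values at write-time-lookup.

```cpp
// Hypothetical decision logic for a write-time-aware compaction filter.
// write_time_unix_secs would come from the seqno-to-time mapping discussed
// above, so it is approximate; the TTL duration is assumed to sit in the
// first 8 bytes of the value. None of this is an existing RocksDB API.
#include <cstdint>
#include <cstring>
#include <ctime>

#include "rocksdb/slice.h"

bool ShouldDropExpiredEntry(const rocksdb::Slice& value,
                            uint64_t write_time_unix_secs) {
  if (value.size() < sizeof(uint64_t)) {
    return false;  // No TTL header; keep the entry.
  }
  uint64_t ttl_secs;
  std::memcpy(&ttl_secs, value.data(), sizeof(ttl_secs));
  const uint64_t now = static_cast<uint64_t>(std::time(nullptr));
  return now >= write_time_unix_secs + ttl_secs;  // Drop once TTL elapses.
}
```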