Congyuwang / RocksDict

Python fast on-disk dictionary / RocksDB & SpeeDB Python binding
https://congyuwang.github.io/RocksDict/rocksdict.html
MIT License
176 stars, 8 forks

Addition of key_may_exist() method #52

Closed (wbarnha closed 1 year ago)

wbarnha commented 1 year ago

Thanks for the help with our migration! Because of your changes, I was able to get most of the tests in https://github.com/faust-streaming/faust/pull/478 working smoothly. We almost have all of rocksdict ready to replace python-rocksdb! There is one method remaining to mirror from https://faust-streaming.github.io/python-rocksdb/api/database.html that I particularly need:

key_may_exist(key, fetch=False, verify_checksums=False, fill_cache=True, snapshot=None, read_tier='all')

If the key definitely does not exist in the database, then this method returns False, else True. If the caller wants to obtain value when the key is found in memory, fetch should be set to True. This check is potentially lighter-weight than invoking DB::get(). One way to make this lighter weight is to avoid doing any IOs.

Parameters:

        key (bytes) – Key to check

        fetch (bool) – Obtain also the value if found

For the other params see [rocksdb.DB.get()](https://faust-streaming.github.io/python-rocksdb/api/database.html#rocksdb.DB.get)

Returns:

        (True, None) if key is found but value not in memory

        (True, None) if key is found and fetch=False

        (True, <data>) if key is found and value in memory and fetch=True

        (False, None) if key is not found
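
To make the return table above concrete, here is a minimal toy model of it in plain Python. It is only a sketch: `ToyStore`, its `put` method, and the `in_memory` flag are hypothetical names invented for illustration and are not part of python-rocksdb or rocksdict. The toy is also exact, whereas a real "may exist" check can report True for a key that is absent.

```python
from typing import Dict, Optional, Tuple


class ToyStore:
    """Hypothetical stand-in for a database; NOT the python-rocksdb API."""

    def __init__(self) -> None:
        self._disk: Dict[bytes, bytes] = {}    # every stored key/value pair
        self._memory: Dict[bytes, bytes] = {}  # subset currently held in memory

    def put(self, key: bytes, value: bytes, in_memory: bool = True) -> None:
        # `in_memory` is an illustrative knob to simulate cache residency.
        self._disk[key] = value
        if in_memory:
            self._memory[key] = value
        else:
            self._memory.pop(key, None)

    def key_may_exist(
        self, key: bytes, fetch: bool = False
    ) -> Tuple[bool, Optional[bytes]]:
        if key not in self._disk:
            return (False, None)              # key is not found
        if fetch and key in self._memory:
            return (True, self._memory[key])  # found, value in memory, fetch=True
        return (True, None)                   # found, but fetch=False or not in memory


store = ToyStore()
store.put(b"hot", b"1")                    # value resident in memory
store.put(b"cold", b"2", in_memory=False)  # value only on "disk"

print(store.key_may_exist(b"hot", fetch=True))
print(store.key_may_exist(b"hot"))
print(store.key_may_exist(b"cold", fetch=True))
print(store.key_may_exist(b"missing", fetch=True))
```

Note that `(True, None)` is ambiguous by design: the caller cannot tell "not fetched" from "not in memory" without a follow-up read.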

I'll try to take a stab at adding it myself, but I'm unsure where to begin currently.

Congyuwang commented 1 year ago

Hi

Currently, the in operator (the __contains__ method) first calls key_may_exist and then calls get. That is similar to the requested key_may_exist, but it cannot return the fetched value or control whether to fetch. This looks like a nice API to me. Let me see how to add it.

https://github.com/Congyuwang/RocksDict/blob/3974e432198510f4cbfcd83de8e12b858aea8ff9/src/rdict.rs#L447-L475
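
The two-step membership check described above (a cheap "may exist" test first, then a real get to confirm) can be sketched in plain Python roughly as follows. This is an illustration only, not the linked Rust code: `ToyDict`, the set used as a filter, and the `get_calls` counter are all hypothetical. The set stands in for a probabilistic filter that never gives false negatives but can give false positives (here, for deleted keys).

```python
from typing import Dict, Optional, Set


class ToyDict:
    """Hypothetical model of a two-step membership check; NOT rocksdict itself."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}
        # Stand-in for a filter: it remembers deleted keys, so it can
        # answer "maybe" for keys that are no longer stored.
        self._filter: Set[bytes] = set()
        self.get_calls = 0  # counts how often the expensive path runs

    def __setitem__(self, key: bytes, value: bytes) -> None:
        self._data[key] = value
        self._filter.add(key)

    def __delitem__(self, key: bytes) -> None:
        del self._data[key]  # the filter entry stays: a false positive

    def key_may_exist(self, key: bytes) -> bool:
        return key in self._filter

    def get(self, key: bytes) -> Optional[bytes]:
        self.get_calls += 1
        return self._data.get(key)

    def __contains__(self, key: bytes) -> bool:
        # Step 1: cheap check; False means the key is definitely absent.
        if not self.key_may_exist(key):
            return False
        # Step 2: the filter said "maybe", so confirm with a real lookup.
        return self.get(key) is not None


d = ToyDict()
d[b"a"] = b"1"
print(b"a" in d)       # filter says maybe, get confirms
print(b"zz" in d)      # rejected by the filter, get is never called
del d[b"a"]
print(b"a" in d)       # filter says maybe, get finds nothing
```

The point of the first step is that definite misses never reach `get`; only the "maybe" answers pay for a full lookup.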

Congyuwang commented 1 year ago

rust-rocksdb has not yet implemented returning the value from key_may_exist. To get the value, one needs to pass a mutable pointer to the C API.

https://github.com/rust-rocksdb/rust-rocksdb/blob/6e19f1da84633ba42b69fdfc7e74b72d19f92901/src/db.rs#L1250-L1273

Congyuwang commented 1 year ago

https://github.com/rust-rocksdb/rust-rocksdb/pull/759

Congyuwang commented 1 year ago

Implemented in #55

I think this part of the python-rocksdb documentation is not very accurate:

Returns:

        (True, None) if key is found but value not in memory

        (True, None) if key is found and fetch=False

        (True, <data>) if key is found and value in memory and fetch=True

        (False, None) if key is not found

I think the API works the following way:

And flip it around:

Congyuwang commented 1 year ago

I’ve also added read/write options for put, get, delete, and delete_range.

wbarnha commented 1 year ago

Thank you so much, I'm so excited to use this project with Faust! There are a few remaining differences (like iterkeys, iteritems, seek_to_first), but I've managed to work around those.