Level / abstract-level

Abstract class for a lexicographically sorted key-value database.

clear with limit #38

Closed · ronag closed this issue 1 year ago

ronag commented 2 years ago

I'm working on some bindings and have run into some trouble implementing clear with limit.

Currently I'm able to implement all write operations synchronously by batching the mutations in memory, applying them during reads, and then asynchronously flushing them to disk. This is great and allows me to guarantee synchronous, immediate read-after-write.
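Roughly like this hypothetical sketch (OverlayDB and the store interface are made up for illustration, not my actual binding code):

```js
const DELETED = Symbol('deleted')

// Hypothetical sketch of the write-behind overlay described above.
class OverlayDB {
  constructor (store) {
    this.store = store       // underlying async store (assumed interface)
    this.pending = new Map() // key -> value, or a DELETED tombstone
    this.flushing = false
  }

  // Writes are synchronous: they only touch the in-memory overlay.
  put (key, value) {
    this.pending.set(key, value)
    this._scheduleFlush()
  }

  del (key) {
    this.pending.set(key, DELETED)
    this._scheduleFlush()
  }

  // Reads consult the overlay first, so writes are immediately visible.
  async get (key) {
    if (this.pending.has(key)) {
      const value = this.pending.get(key)
      return value === DELETED ? undefined : value
    }
    return this.store.get(key)
  }

  // Asynchronously flush the batched mutations to disk.
  _scheduleFlush () {
    if (this.flushing) return
    this.flushing = true
    queueMicrotask(async () => {
      // Simplified: assumes no new writes arrive while flushing.
      const ops = [...this.pending].map(([key, value]) =>
        value === DELETED ? { type: 'del', key } : { type: 'put', key, value })
      await this.store.batch(ops)
      this.pending.clear()
      this.flushing = false
    })
  }
}
```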

However, it's not possible to implement this for clear with limit, as the limit parameter forces me to inspect the current state of the db, i.e. it's not possible to lazily apply the operation during reads.

My question here is: what is the use of db.clear with limit? Can we make it optional, so that bindings that don't implement it are still "valid" bindings? I guess this would mostly require some updates to the test suite so that bindings can opt out and still pass the tests.

vweevers commented 2 years ago

> applying them during reads

Can you please elaborate?

My question here is what is the use of db.clear with limit?

It was originally added for symmetry with db.iterator({ limit }). An actual use case is a background job that cleans up data (e.g. to rebuild an index, delete expired entries, etc.) in a big database, where you want to break that work up into shorter tasks. I suppose you could also use limit to delete the first or last N entries, but I haven't seen such a use case.
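Something like this sketch, assuming an abstract-level db (clearInChunks is made up for illustration):

```js
// Delete everything in `range` in bounded chunks, yielding between
// chunks so each task stays short.
async function clearInChunks (db, range, chunkSize = 1000) {
  while (true) {
    await db.clear({ ...range, limit: chunkSize })

    // Stop once the range is empty.
    const remaining = await db.keys({ ...range, limit: 1 }).all()
    if (remaining.length === 0) break

    // Yield to the event loop before the next chunk.
    await new Promise(resolve => setImmediate(resolve))
  }
}
```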

ronag commented 2 years ago

> applying them during reads

So basically, when you query the database you get the "old" results and then modify those according to the not-yet-committed changes. E.g. if you have a range delete in your queue and then do a query, you check every element in the query results against the delete range and skip the ones that match.
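E.g. a sketch of that filtering step (names are illustrative):

```js
// Filter iterator results against queued, not-yet-committed range
// deletes: anything inside a pending clear range is skipped.
function * applyPendingClears (entries, pendingClears) {
  for (const [key, value] of entries) {
    const deleted = pendingClears.some(range =>
      (range.gte === undefined || key >= range.gte) &&
      (range.lt === undefined || key < range.lt))
    if (!deleted) yield [key, value]
  }
}

// With a queued clear({ gte: 'a', lt: 'b' }), entries with keys in
// ['a', 'b') disappear from query results before the clear is flushed.
```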

ronag commented 2 years ago

> An actual use case is a background job that cleans up data (e.g. to rebuild an index, delete expired entries, etc.) in a big database, where you want to break that work up into shorter tasks. I suppose you could also use limit to delete the first or last N entries, but I haven't seen such a use case.

I see. That makes sense. Though I guess you could also achieve that by being smart with your keys, e.g. by including the timestamp as a key prefix, if you want maximum performance.
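Something like this (sketch):

```js
// A fixed-width timestamp prefix makes expired entries a contiguous
// key range, so they can be dropped with one range delete, no limit needed.
function keyFor (timestamp, id) {
  return String(timestamp).padStart(16, '0') + '!' + id
}

// Everything older than `cutoff` is now a single range:
// await db.clear({ lt: String(cutoff).padStart(16, '0') })
```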

ronag commented 2 years ago

RocksDB has an experimental feature along those lines: https://github.com/facebook/rocksdb/wiki/User-defined-Timestamp-%28Experimental%29.

ronag commented 2 years ago

> where you want to break that work up into shorter tasks

Though with a range delete, breaking the work up into shorter tasks isn't necessarily needed, since the operation is cheap (depending on the implementation, of course).

http://rocksdb.org/blog/2018/11/21/delete-range.html

vweevers commented 1 year ago

The solution is simple: if a limit option is present, use a regular iterator; otherwise use RocksDB's DeleteRange.
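Roughly, in pseudocode (simplified; nativeDeleteRange is a stand-in for the actual native call, and limit defaults to -1 meaning unlimited):

```js
async function clear (db, options) {
  if (options.limit !== undefined && options.limit !== -1) {
    // Bounded clear: walk the range with a regular iterator and
    // delete the visited keys in a batch. The limit is respected here.
    const keys = await db.keys(options).all()
    await db.batch(keys.map(key => ({ type: 'del', key })))
  } else {
    // Unbounded clear: hand the whole range to RocksDB's DeleteRange.
    nativeDeleteRange(db, options) // hypothetical native binding call
  }
}
```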