DiceDB / dice

DiceDB is an in-memory real-time database with SQL-based reactivity. It is hyper-optimized for building and scaling truly real-time applications on modern hardware while being a drop-in replacement for Redis.
https://dicedb.io/
Other
4.46k stars 548 forks source link

Run expiraion in a separete periodic Goroutuine #31

Closed arpitbbhayani closed 1 week ago

arpitbbhayani commented 1 year ago

Is your feature request related to a problem? Please describe. The expiration runs in the event loop unnecessarily starving the core execution.

Describe the solution you'd like Let there be a goroutine that periodically wakes up and deletes the expired keys. We need to take care of a close-edge case though. What if a key is about to

Naive solution: lock and execute Con: unnecessarily the entire hashtable is locked, giving us potentially no benefit.

Possible solution: park the expired value in a different hashmap (trash can) for some time, if it is accessed within n seconds, well and good. otherwise, delete it. This two-stage deletion will ensure we are minimizing (not eradicating) the edge case. In the trash can, each element has a separate ticker.

Advantage: no need to lock the hash table (key store)

Additional context This approach is inspired by Garbage Collectors.

rohanverma94 commented 1 year ago

Premise I'm not sure about the current storage engine. And owing to that, some enhancement would do O(k²) operation or Θ(m²) in additional space, I agree that we can't get away from baseline complexities. The idea of having map[string]*object is great for starting out and then gradually evolving the data structure as we realize the cost and premium to be paid for carrying tech debt along the course of the development.

Synopsis Expiration means a bulk delete operation. This is supposed to delete some entries and internally a map with 10,000 entries isn't getting any smaller in size after deleting those entries. The fact being deleting the values from the map will allow the memory referenced by the obj pointers to be collected as well. This is pretty much automatic GC in action. Let's look at GC -

From golang GC documentation

“The GC runs concurrently with mutator threads, is type accurate(also known as precise). allows multiple GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is non-generational and non-compacting. Allocation is done using size segregated per P allocation areas to minimize fragmentation while eliminating locks in the common case.“

The above says "non-compacting" so GC is not going to help my Map. So all those "holes" or "gaps" in our favorite map are not going away. The only solution viable solution seems to copy the entries from the old map to the new map such that any future operation would happily navigate inside maps with predictable complexities.

Proposed Twin Expiry Map approach The "lock n load" operation can be handled atomically in a lock-free way. Though it is not a silver bullet , since you're expected to level up on your resource usage(as you've suggested with 2 maps). I spent months writing lock-free data structures for the network stack where turnaround time is required to be minimal. And this realm of 'lock-free' really doesn't have any advantage unless the hardware is readily available to resume execution without any break, plus contention for resources should be balanced or you're just wasting computation which could be discarded ( waiting for n seconds and reverting the operation). It's hard to design a lock-free construct that is both correct and "fair".

As per my synopsis, there is considerable overhead & bookkeeping to maintain good and dirty pages (maps) whether or not you apply a lock-free approach. So I would propose an overhaul of the current storage engine with TTL, timestamp, and "last accessed at" all implemented in a singular dedicated "PageMap" data structure along with Key-values pair. This PageMap should not be based on Map but slices, as it defers the GC action and gives sufficient time for book-keeping of the data pages, which could save computation. And after GC kicks-in then we'll probably do some "compacting" variation of snapshotting or AOF if any "holes" or "gaps" are left in any auxiliary maps which we would use to maintain the isolation of slices belonging to different keyspaces.

Please let me know your input on this.

MAKE Golang GREAT AGAIN image

VipinRaiP commented 1 month ago

Is this issue still relevant?