akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

ddata durable store may grow forever #5103

Open andyfurnival opened 3 years ago

andyfurnival commented 3 years ago

Not sure if this is best served as an issue or feature request (or I'm thinking about this wrong)

I've been looking at how I can clean up keys from ddata when using the durable store, so that over time I'm not loading data into memory needlessly. I've got 20k ORSetKeys (each with 1-20 elements, so it's not a lot of data), with an LMDB file that consumes 43 MB, but in .NET it uses 118 MB of memory.

When I delete a key, it's not removed from the LMDB file; it looks like we mark the key as deleted, which becomes an immutable state for that key, forever.

Prolonged use of ddata with the durable store will eventually exceed the original map-size unless it is massively over-provisioned (you'll get the error MDB_MAP_FULL: Environment mapsize limit reached) - in some scenarios the store simply grows indefinitely.
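For reference, the map size is configured in the durable-store section of HOCON. A sketch (setting names follow Akka.NET's reference config - verify against the version you're running):

```hocon
akka.cluster.distributed-data.durable {
  keys = ["*"]          # which keys are persisted to the durable store
  lmdb {
    dir = "ddata"
    # Over-provision to delay MDB_MAP_FULL; the file never shrinks.
    map-size = 100 MiB
  }
}
```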

I think ddata with the durable store may still need to support removing deleted entries from the store after a period of time, so that it doesn't grow indefinitely (even by accident). The LightningDB .NET client supports deleting, but I'm not sure why that capability isn't used when LMDB acts as a durable store - see https://github.com/CoreyKaylor/Lightning.NET/blob/dbe13f10ee11a21e5c6986a33102dd3288e548c8/src/LightningDB/LightningTransaction.cs#L212

Looking at the docs (as well as the Scala/Java version of Akka), the spec seems to be the same, so at some point someone decided that deleted keys are acceptable states in ddata but that removing the data from the LMDB store is not - with no real rationale as to why (that I can find).

The documented approach to avoid growth over time (which affects both memory and storage capacity) is choosing the right type of keys (an ORMap, for example). As I see it, through the evolution of any platform we could be burdened with old data we just can't get rid of without taking a nuclear option.

My take on this: if there are a number of keys in a deleted state, what would be the harm in some sort of housekeeping to clean up the underlying LMDB store, so that at some point they don't even get loaded into the system (prune deleted keys from LMDB)? I see no real reason why we wouldn't have that, or whether this is an oversight in the original design of Lightbend's Akka. I've seen projects that use LMDB and perform delete operations, so it can't be a limitation of the technology, or necessarily bad practice in general.

Horusiath commented 3 years ago

The reason behind advocating for ORMap is that it internally has a separate data structure that tracks concurrent updates and removals - it's very lightweight, and its size doesn't depend on user-defined keys or values; it depends instead on the number of members recorded in the cluster's history.

This is not the case for top-level entries (the ones you access via e.g. ORSetKey<T>). Each top-level entry is independent from the others, and each stores an internal sequencer used to recognize which updates have been sent to other replicas and which have not. If deletion were permanent, this sequencer would be reset as well, potentially causing state corruption during data replication, e.g.:

  1. Node A creates ORSet<string>("A") and adds elements "a" and "b" to it: internally they receive sequence numbers A1 and A2. A uses these sequence numbers to tell other nodes about updates they haven't seen yet.
  2. Now imagine that A removes the entire set "A" (with deep deletion, as you suggested). From A's perspective it looks like "A" never existed.
  3. Node A creates ORSet<string>("A") again - possible, since there is no information that this set ever existed.
  4. Node A adds element "c" to the set. Since nothing is known about the previous incarnation of "A", this update receives sequence number A1.
  5. Steps 2-4 happened before A managed to replicate its state to other nodes. Now those nodes ask for new updates: they know they have already seen updates with seqNr A1 and A2, so "c" (with seqNr A1) will never be sent to them - they think they have already seen it.

As a result we have corrupted state in the cluster: node A has "A" = ["c"], while the other nodes have the outdated "A" = ["a","b"].
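The failure mode above can be sketched in a few lines (a toy model of per-origin sequence-number deduplication, not Akka's actual delta-propagation code):

```python
# Toy model: each replica remembers the highest seqNr it has applied per
# origin node and drops anything at or below it as a duplicate.

class Replica:
    def __init__(self):
        self.seen = {}        # origin node -> highest seqNr already applied
        self.elements = set()

    def receive(self, origin, seq_nr, element):
        # Updates at or below the last-seen seqNr are treated as duplicates.
        if seq_nr <= self.seen.get(origin, 0):
            return
        self.seen[origin] = seq_nr
        self.elements.add(element)

b = Replica()

# Step 1: node A adds "a" (A1) and "b" (A2); both reach replica B.
b.receive("A", 1, "a")
b.receive("A", 2, "b")

# Steps 2-4: A deep-deletes the set, recreates it, and adds "c".
# The reset sequencer tags "c" with A1 again, so B drops it as a duplicate.
b.receive("A", 1, "c")

assert "c" not in b.elements        # corrupted: B never learns about "c"
assert b.elements == {"a", "b"}
```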

For this reason, if you have some map-like structure with frequent churn of inserted/removed elements, don't model the elements as top-level entries; instead prepare a dedicated collection (e.g. an ORMap) and add/remove elements within it.
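The distinction can be illustrated with a toy observed-remove set (the idea behind removals inside ORSet/ORMap - a conceptual sketch, not Akka's implementation): a removal only discards add-tags the remover has observed, so elements can be removed and re-added freely without ever deleting a top-level key or resetting a sequencer.

```python
import itertools

_tags = itertools.count()   # globally unique add-tags (toy stand-in for dots)

class ObservedRemoveSet:
    def __init__(self):
        self.entries = {}   # element -> set of unique tags from add operations

    def add(self, element):
        self.entries.setdefault(element, set()).add(next(_tags))

    def remove(self, element):
        # Only locally-observed tags are discarded; a concurrent add on
        # another replica keeps its own fresh tag, so adds win over removes.
        self.entries.pop(element, None)

    def value(self):
        return set(self.entries)

s = ObservedRemoveSet()
s.add("a"); s.add("b")
s.remove("a")
s.add("a")                  # safe: the re-add carries a brand-new tag
assert s.value() == {"a", "b"}
```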

andyfurnival commented 3 years ago

I appreciate that using ORSet for dynamically changing data isn't good practice (ORSet probably needs a warning in the docs about unbounded growth if elements are added/removed frequently), but data remaining around forever puts us in a situation where software and data design must be free of mistakes, and many people could fall into the trap of using the wrong implementation (I speak from experience). Changing requirements can't be catered for upfront: using ORSet for a few values early in a system's life may seem fine, but as requirements evolve we need to handle data differently, yet we retain references to legacy data forever, long after features have changed or been retired.

On the version control of data, and how each node understands what is available in order to maintain consistency: I'm not advocating that such a capability should be freely available via an API, as it would probably do more harm than good. However, a key that will never be updated again, because as far as the system is concerned it is deleted, really has no business remaining in the system and consuming resources. If it's never going to be updated, purging it surely saves processing that data over and over until the end of time (or until someone decides it's time for the nuclear option of deleting the data files and starting again).

Some interesting reading I found on this, while trying to work out how one would perform such a cleanup:
http://composition.al/CMPS290S-2018-09/2018/11/12/implementing-a-garbage-collected-graph-crdt-part-1-of-2.html
http://composition.al/CMPS290S-2018-09/2018/12/08/implementing-a-garbage-collected-graph-crdt-part-2-of-2.html

Horusiath commented 3 years ago

I'm not advocating that this design decision was good either; however, it is how it is. Top-level entries are for fixed sets of keys - think of them like routes in your web service - not for random unbounded strings coming from clients.

Aaronontheweb commented 3 years ago

IMHO, there should be a mechanism for deleting data forever. If the data can be "forgotten" by removing a key when not using durable ddata, why should the durable version be any different?

Regardless of the distributed systems theory, this is the bottom line:

"As I see it, through evolution of any platform, we could be burdened with old data we just can't get rid of, without taking a nuclear option."

It's inevitable that more users will have a need to remove this type of data from their system, after the useful lifetime of the data has passed.

One concept that's been proposed, but not implemented, in the JVM is setting a TTL (time to live) for CRDTs: https://github.com/akka/akka/issues/27683

Any CRDTs that go untouched after the TTL period are purged from memory and disk.

An example: if you're using CRDTs to collect real-time order book data for a given trading day, you don't need to retain that CRDT longer than 1-2 days following the previous trading day as the real-time flows will have moved to a new order book entity. The old entity can be safely disregarded.

I'd probably implement a soft + hard delete system when durable DData is enabled:

TTLs are automatically reset (sliding window timeout) each time an entity is touched via a new subscription, update, or query.

The TTL would need to rely on a hard deletion mechanism, which could also be exposed directly to end-users in the form of an explicit command. We could implement the latter first and introduce a TTL scheme later.
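A minimal sketch of the sliding-window soft-TTL bookkeeping described above (all names hypothetical - `TtlTracker`, `touch`, and `purge_expired` are illustrative, not Akka.NET APIs; the hard delete is left as a comment):

```python
import time

class TtlTracker:
    """Tracks last-access times for durable ddata keys (hypothetical sketch)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.last_touched = {}        # key -> timestamp of last access

    def touch(self, key, now=None):
        # Any subscription, update, or query resets the sliding window.
        self.last_touched[key] = time.monotonic() if now is None else now

    def purge_expired(self, now=None):
        now = time.monotonic() if now is None else now
        expired = [k for k, t in self.last_touched.items() if now - t > self.ttl]
        for k in expired:
            # A real implementation would hard-delete the key here: remove it
            # from the LMDB store and drop it from the in-memory replicator.
            del self.last_touched[k]
        return expired

tracker = TtlTracker(ttl_seconds=60)
tracker.touch("order-book-2021-05-10", now=0)
tracker.touch("order-book-2021-05-11", now=50)
assert tracker.purge_expired(now=100) == ["order-book-2021-05-10"]
```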

Would that work for you @andyfurnival ? Any thoughts on that @Horusiath ?

Aaronontheweb commented 3 years ago

Keeping track of hard delete TTLs would require a separate "system CRDT" to track all of those, btw

Aaronontheweb commented 3 years ago

Also, I'd limit the scope of the deletion behavior to keys - not individual entries inside more complex CRDTs such as ORMaps.

Aaronontheweb commented 3 years ago

"TTLs are automatically reset (sliding window timeout) each time an entity is touched via a new subscription, update, or query."

Also: that would not be a trivial change.

andyfurnival commented 1 year ago

Akka has now implemented this: https://github.com/akka/akka/issues/27683. Looking into the change, they have a single expiry mechanism that uses a timestamp to track frequency of access, configurable with a TTL in HOCON. I think the proposal above - separate hard and soft TTLs - would be more robust than Akka's solution (which could get clunky with lots of different keys), but the reference implementation is helpful.
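For reference, in JVM Akka the feature is driven by per-key-pattern HOCON config (the setting name is taken from Akka's reference config at the time of writing - verify against the Akka version you consult):

```hocon
akka.cluster.distributed-data {
  # Entries whose keys match these patterns are removed from memory (and
  # the durable store) after the given period without any update or access.
  expire-keys-after-inactivity {
    "order-book-*" = 10 minutes
  }
}
```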

Aaronontheweb commented 1 year ago

@andyfurnival that's great to know - I think we could certainly implement something similar. I'm starting to see more users using DData in the wild (i.e. rather than simply using it indirectly via Akka.Cluster.Sharding) so this will need to be addressed in the near future.