ARK-Builders / ark-core

The core of the ARK framework
MIT License
4 stars 3 forks source link

`fs-storage`: Conflicts resolution (Monoid/CRDTs) #19

Open kirillt opened 3 months ago

kirillt commented 3 months ago

Port Monoid structure used during conflicts resolution. It's naive approach but it's simpler than CRDT.

twitu commented 3 months ago

https://github.com/ARK-Builders/arklib-android/blob/main/lib/src/main/java/dev/arkbuilders/arklib/data/storage/Monoid.kt#L3

The kotlin monoid implementation is functionally equivalent to what modern CRDT libraries are offering. While we can go for a custom implementation it is error prone and require extra dev effort for implementing and maintaining. I can look into existing CRDT rust implementations for a feature set that suits us.

twitu commented 3 months ago

Rust has two popular libraries for CRDTs - crdts and automerge. It appears that automerge is the more popular, well-maintained and well-featured library.

Now for the actual structures I looked at the actual structures that implement monoids in the Kotlin library

Class Structure
Tags Set
Properties Set
Score Int

So it appears that for the current structures we don't need to pull in a CRDT implementation. In fact HashSet already implement a union function, in fact Kotlin too uses the standard library implementation for union. So only a special function for integers is needed. CRDTs can be considered later when we need to add structures that need more powerful combine semantics.

kirillt commented 3 months ago

@twitu yes, if we can use a better structure we could throw Monoids away. You are right that HashSet is a monoid itself, I need to refresh my memory about how exactly Monoids been used in Kotlin implementation.

But I remember that there was a problem with Monoid-like structures that we cannot remove data:

  1. User has phone P and laptop L connected via Syncthing, so storages are synced as plain files.
  2. Both P and L run the same app, listening the storages on their disks.
  3. User adds entry X into the app on P, it is persisted, synced to L's disk and loaded into L's instance of the app.
  4. Now, the user deletes X from the app on P, we remove it from P's disk, app on L detects the change but overwrites the storage with older version because union(S+X, S)=S+X for any storage S and entry X.

Can we solve this problem using CRDTs?

kirillt commented 3 months ago

Btw, am I right that automerge has Rust core and JS wrapper? I just see that there are much more JS code in their repo.

twitu commented 2 months ago

Monoids cannot handle the delete case directly. There are ways to use monoids for delete by adding soft delete flags for records. That is a pretty good solution for most use cases. Monoids will be a good solution for the uses cases (HashSet/Int) that we have in the Kotlin app.

Yes automerge has a rust core and JS bindings for WASM. I guess they've strongly positioned for use with client side web apps so it has some JS to make it ergonomic to use on the client side. It's still a pretty well featured rust library we can use if needed.

My concern however is that it uses it's own data structures for Maps and Lists and the likes. So there will be a conversion overhead from the standard library BTreeMap/HashMap/HashSet to automerge's data structures so that different versions can be merged. I would put off pulling in such a complex dependencies until we really need it.

kirillt commented 2 months ago

We'll be able to define a single "sync with disk" function, similarly to how it was in BaseStorage.kt, after the conflicts resolution is implemented.

kirillt commented 2 months ago

Ishan's idea: no need to implement CRDT if we just do "soft-delete" instead of regular deletion. Adding second Monoid instance would help us in syncing deletions between devices.

kirillt commented 1 month ago

Merged:

We also need:

kirillt commented 6 days ago

Links for the research of CRDTs:

kirillt commented 6 days ago

Basing on this article, it seems that we can use the following data types:

It seems that U-set can be used for maps, not only sets, by interpreting a map as a set of entries.

@tareknaser @twitu

twitu commented 18 hours ago

One concern I see with U-set is that it won't preserve the order of insertion property of BTreeMaps. In #63 we implemented something very similar to a last writer wins (LWW) CRDT by merging the BTreeMap values based on the most recent timestamp.

The main question to consider is how we intend to use insertion order?

Perhaps, if we can relax the requirement for maintaining insertion order we can consider using the U-set or maybe the LWW-Element-Set which records the timestamp for each element which can be used as an approximation for insertion order.