Open kirillt opened 3 months ago
The kotlin monoid implementation is functionally equivalent to what modern CRDT libraries are offering. While we can go for a custom implementation it is error prone and require extra dev effort for implementing and maintaining. I can look into existing CRDT rust implementations for a feature set that suits us.
Rust has two popular libraries for CRDTs - crdts and automerge. It appears that automerge is the more popular, well-maintained and well-featured library.
Now for the actual structures I looked at the actual structures that implement monoids in the Kotlin library
Class | Structure |
---|---|
Tags | Set |
Properties | Set |
Score | Int |
So it appears that for the current structures we don't need to pull in a CRDT implementation. In fact HashSet
already implement a union function, in fact Kotlin too uses the standard library implementation for union. So only a special function for integers is needed. CRDTs can be considered later when we need to add structures that need more powerful combine semantics.
@twitu yes, if we can use a better structure we could throw Monoids away. You are right that HashSet
is a monoid itself, I need to refresh my memory about how exactly Monoids been used in Kotlin implementation.
But I remember that there was a problem with Monoid-like structures that we cannot remove data:
Can we solve this problem using CRDTs?
Btw, am I right that automerge has Rust core and JS wrapper? I just see that there are much more JS code in their repo.
Monoids cannot handle the delete case directly. There are ways to use monoids for delete by adding soft delete flags for records. That is a pretty good solution for most use cases. Monoids will be a good solution for the uses cases (HashSet/Int) that we have in the Kotlin app.
Yes automerge has a rust core and JS bindings for WASM. I guess they've strongly positioned for use with client side web apps so it has some JS to make it ergonomic to use on the client side. It's still a pretty well featured rust library we can use if needed.
My concern however is that it uses it's own data structures for Maps and Lists and the likes. So there will be a conversion overhead from the standard library BTreeMap/HashMap/HashSet to automerge's data structures so that different versions can be merged. I would put off pulling in such a complex dependencies until we really need it.
We'll be able to define a single "sync with disk" function, similarly to how it was in BaseStorage.kt, after the conflicts resolution is implemented.
Ishan's idea: no need to implement CRDT if we just do "soft-delete" instead of regular deletion. Adding second Monoid instance would help us in syncing deletions between devices.
Merged:
We also need:
merge_from
)PersistenceStrategy
:
Links for the research of CRDTs:
Basing on this article, it seems that we can use the following data types:
PN-counter
s for scoresU-set
s for tagsU-set
s for propertiesU-set
s for the storage tables themselvesIt seems that U-set
can be used for maps, not only sets, by interpreting a map as a set of entries.
@tareknaser @twitu
One concern I see with U-set
is that it won't preserve the order of insertion property of BTreeMap
s. In #63 we implemented something very similar to a last writer wins (LWW) CRDT by merging the BTreeMap values based on the most recent timestamp.
The main question to consider is how we intend to use insertion order?
Perhaps, if we can relax the requirement for maintaining insertion order we can consider using the U-set
or maybe the LWW-Element-Set
which records the timestamp for each element which can be used as an approximation for insertion order.
Port Monoid structure used during conflicts resolution. It's naive approach but it's simpler than CRDT.