Queries for the sorted lists

Gozala commented 1 month ago

We need to support sorted list. Long terms it sounds like implementing RGA as query rule might be a proper way to go about it. Better yet may be we could go for fugue to minimize interleaving.

None of the above sounds straight forward, so it would be a good to come up with something simple that can be implemented in a day and replace it with something better in the future.

💡 Capturing order separately

General idea is to model element membership via two relations

Membership [list, 'array/element', first]
Order [first, 'array/next', second]

This means we could potentially have [first, 'array/next', conflict] but that is ok client could probably just sort them in a deterministic way deciding which of the conflicting array/next to descend to first.

There are some unknown however.

How do we model empty array. Note that it needs to be unique or all the first elements of all arrays will point to it and that seems undesired.
- Maybe we don't need to model empty array at all ? Although I think query engine will produce no results in such case.
- We could derive reference to itself maybe [list, 'array/element', list] feels sketchy but maybe that's fine ?
What do we do if we have same element in multiple positions in the array ?
- Pointing to the next element does not work in this case because in [a, b, a, c] we will wind up with something like [a, b, c, a, b, c] instead.

💡 Do not worry about concurrency at all

Alternatively we could choose to not worry about concurrency at all and simply capture relations as

[list, 0, first]
[list, 1, second]

With this approach we would be able to collect all elements simply by doing

[?list ?offset ?item]

And sort first by offset and then by hash to arrive at canonical order.

Tradeoff however will be that if I want to insert 2nd element in an array of 100 elements I'll need to update 99 * 2 + 1 updates.

Dealing with empty lists are still going to be a problem that needs solving however.

Gozala commented 1 month ago

One other idea I have been entertaining is to allow non-integer indexes for array elements. That way if say we have an array like

{ 0: a, 1: c }

and we want to insert b between a and c all we need to do is insert it at an average position that is (0 + 1) / 2 which will give us

{ 0: a, 0.5: b, 1: c }

There are couple of problem with this approach however

Our index attributes increasing their fractions at least when we do not append.
- Perhaps instead of incrementing index by 1 we could increment it by much greater number say 1024 that we'll have 10 slots or so before we go down into fractions.
To insert into the head of the list we need 0 < float which we can't have.
- This should be easy enough to address however if we treat 0 index as head of the list and not the index of the first item. That way we'll always have a spot to insert into
Concurrent inserts in same position will pick the same index and result in conflict ☹️
- Maybe we can introduce some bias derived from local replica to avoid this conflicts ?

P.S.: I've tried something along those lines in reflex FRP library to maintain stable IDs for the VDOM lists, although in the end there we decided to just let application to provide an id as opposed to generating one.

Gozala commented 1 month ago

I have being trying to grasp Fugue list crdt explanation and came up with a following idea

Untitled (1)

Min position is ""
Max position is "/" (For simplicity we'll assume any <"/")
We imply that every list has implied left boundary with min position ('/') and right boundary with max position.
- That way every insert is between two positions
When inserting entity with id milk (entity IDs are unique and ugly but keeping it simple here) we
1. If right.startsWith(left) then position = '${right.slice(0, -1)},${entity}/'
  - , < / so new position < right
2. Otherwise `position = '${left}${entity}/'
  - position.length > left && position.startsWith(left) so left < position.
  - left < right so position.slice(0, left.length) < right and there for position < right.

I believe this gives us attribute names that will sort in desired order. Furthermore we replaced a need for globally unique string suffixes that were derived from logical (replica, counter) clock with sufficiently unique entity identifiers.

As I understand it unique prefix was used to avoid conflicting positions between replicas specifically
1. Unique replica identifier avoids collision with other replicas
2. ⚠️ Unique counter I'm not sure. Typically it guarantees unique identifiers per replica, however here at every insertion we append to keys so I can't think of a scenario where collision would have occured so presumably replica id would have been sufficient (I could be missing something)
⚠️ It does imply however, that if two different replicas insert same entity between same pair of items we would see different results from Fugue, specifically in our case only one entity would be added while in case of Fugue two of the same items would.
- ✨ I think our behavior is probably more desired.
☢️ It is however very critical that entity identifiers do not collide
- For what it's worth this is where (replica, id) like thing could be utilized.

Gozala commented 1 month ago

One other idea I have been entertaining is to allow non-integer indexes for array elements. That way if say we have an array like
{ 0: a, 1: c }
and we want to insert b between a and c all we need to do is insert it at an average position that is (0 + 1) / 2 which will give us
{ 0: a, 0.5: b, 1: c }
There are couple of problem with this approach however

Our index attributes increasing their fractions at least when we do not append.

Perhaps instead of incrementing index by 1 we could increment it by much greater number say 1024 that we'll have 10 slots or so before we go down into fractions.

To insert into the head of the list we need 0 < float which we can't have.

This should be easy enough to address however if we treat 0 index as head of the list and not the index of the first item. That way we'll always have a spot to insert into

Concurrent inserts in same position will pick the same index and result in conflict ☹️

Maybe we can introduce some bias derived from local replica to avoid this conflicts ?

P.S.: I've tried something along those lines in reflex FRP library to maintain stable IDs for the VDOM lists, although in the end there we decided to just let application to provide an id as opposed to generating one.

Turns out Figma uses this approach and calls it Fractional Indexing

Gozala commented 1 month ago

Fugue inspired ordering seems very promising:

Entities could simply be associated with a list via derived position.
This means single range scan on KV 🤩 and then sort on the client.

There are problems however

Realistically we can not use string based approach, attributes length will get too huge
- Although we may be able to adapt this into tree based solution that fugue actually uses
Datalog query may filter out some elements of the list in which case derived insertion may come out funky
- By which I mean if we have a, b, c, d and view only shows a, d and we insert x it may end up in any position between a and d not necessarily after a e.g. it could appear as a, b, x, c, d which perhaps is not terrible but could be surprising never the less.

Gozala commented 1 month ago

Another thing on mind is whether it would make more sense to add insert / remove operations to a transactor as opposed to doing all of the CRDT logic in the client. Going that route would certainly make it easier for a client and will provide more flexibility to the backend in terms of implementation details. Furthermore it may be a solution for https://github.com/commontoolsinc/synopsys/issues/22

So in this hypothetical lens we could define relational formula e.g.

{ 
  select: { self: "?", order: "?order", todo: ["?item"] },
  where: [
    ["?", fugue.order,  "?order"],
    ["?", fugue.entries "?item"],
    ["?item", "todo/done", false]
  ],
  update: ({ self, todo, order }) => [
     self,
     "/common/ui",
     todo
        .sort((a, b) => order[a].localeCompare[order[b]])
        .map(todo => ui(Todo, todo))
  ]
}

We could also extend notion of formula to transactions e.g. we could have transactions like

transact(db, [
   // expands to { Perform: ["fugue.insert", list, { item, after }] }
   fugue.insert(list, item, {after}),
   fugue.remove(list, item)
])

commontoolsinc / synopsys

Queries for the sorted lists #20

💡 Capturing order separately

💡 Do not worry about concurrency at all