Interfaces RFC: Lists (PlainList, CrdtList)

mweidner037 commented 3 years ago

Interfaces

https://github.com/composablesys/compoventuals/blob/master/client/src/crdt/list/interfaces.ts

Questions

As with Maps, should events include values? Currently they just include the index; for Insert, you can then get the value at that index by calling get, although this is not true for Delete.
What should delete (& deleteRange) return, in either interface? Currently it returns the deleted values, like JS Array.splice. But this may not always make sense; e.g., in a Yjs-style CrdtList, deleted value Crdts can no longer be used and should be forgotten. It is also inconsistent with Set/Map, which return a boolean.
Should PlainList.set(index, value) be included? We don't want to support mutable entries for e.g. text (it's less efficient), but set is a natural list/array operation. I've left it in for now since non-supporting implementations can implement it as delete(index) followed by insert(index, value).
Similarly, should move be included in PlainList and CrdtList, or in a separate sub-interface? I've included it for now since a PlainList can always choose to implement it using a delete + insert, and all of my planned CrdtList implementations are movable. But I am not sure about those decisions.
length vs size. length is consistent with arrays, but size is consistent with the other collections. (E.g., in Java, List has a size() method, even though Java arrays use length.)
Should we include convenience methods similar to JS Array? JS Array has a bunch (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array). In principle there is little cost to adding them, since we can define AbstractList classes that implement all of the ones that are just aliases for other methods, and then our implementations can just extend AbstractList and not worry about them; but that doesn't always work because sometimes you need a different superclass, and too many methods can make the API look bloated. The methods I am most tempted by are:
- push (insert at the end)
- slice (copy a portion into an array) - note that for copying the whole thing into an array, you can already do [...yourList]
- forEach (for consistency with Set and Map)
- splice (a funky combination of bulk delete/insert that JS people seem to like)
- pop (delete at the end)
Should there be moveRange methods in addition to move? That is useful for e.g. drag-and-drop actions with multiple selections, but I haven't thought about whether it is implementable.

Planned implementations

These are essentially the same as for sets. For each list, though, we can also ask whether it should be movable, not movable, or both (provide both implementations). Movable lists would use Martin's movable list semantics: if a value / valueCrdt is moved multiple times concurrently, it picks only one of the destinations. Generally, making lists movable increases memory usage, even if you don't use move operations, since you have to add an LwwRegister for each value that controls its location.

PlainList

PrimitivePlainList: a standard list Crdt, supporting only insert & delete. This will be the most efficient, appropriate for plain text editing. Not movable.
- We could use the name SequentialPlainList instead of PrimitivePlainList, since it is analogous to the other Sequential collections. However, I worry that will make users afraid to use it, since the other Sequential collections are dangerous; it just turns out that the list version is not.
RegisterPlainList: supports set operations by using a list of registers. So if multiple replicas set a location concurrently, it will pick one, instead of inserting both. Movable by default, since you are already adding a register for each value, so adding another one isn't too much overhead proportionally?
LwwPlainList: RegisterPlainList with LwwRegisters.
GPlainList: same as PrimitivePlainList, but deletes throw an error.

CrdtList

Same possibilities as CrdtSet. In fact, Martin's construction can be interpreted as generically making a movable CrdtList out of any CrdtSet, which is how I've implemented a draft MovableList (https://github.com/composablesys/compoventuals/blob/master/client/src/crdt/list/movable_list.ts).

We could offer non-movable versions if we think users might want them as an optimization (this gets rid of the LwwRegister-per-value-Crdt memory overhead, and may shorten network messages slightly). It seems less important than for PrimitivePlainList, though, since the overhead is proportionally smaller, given that the values are themselves Crdts instead of just single characters.

Riak-style CrdtLists (movable or not) pick up an extra con: every message from a value Crdt must include that value Crdt's sequence identifier, and those identifiers can become long in long lists. This is because another replica receiving the message might have GC'd the target Crdt, and then when it resurrects the Crdt, it needs to know where to put the Crdt in the list. Also, getting this to work for a movable Riak-style CrdtList seems a bit tricky implementation-wise, but I think it should be possible and no less efficient than the non-movable version, except for the usual memory cost.

mweidner037 commented 3 years ago

Implementation idea: append (push)-only version, e.g., for chat messages. Would just be an optimization - you can of course use any CList in this fashion.

mweidner037 commented 3 years ago

Although the list interface doesn't specify them, it would be nice if each implementation had "range" (bulk) versions of the insertion methods (insert, push, unshift, splice). E.g.:

/**
   * Insert values starting at the given index.
   * Afterwards, values[0] will be at startIndex, values[1]
   * will be at startIndex + 1, etc.
   */
  insertRange(startIndex: number, ...values: T[]): this;

The interface leaves these out because they properly should have different signatures on different implementations (...values vs ...args[] vs count).

mweidner037 commented 3 years ago

Draft Move interface; a Set version is todo:

export interface ListMoveEvent extends CrdtEvent {
  // TODO: make clear exactly how to simulate the op
  // locally.  I.e., give equivalent sequential ops
  // (insert, delete).  Might be clearest if the args
  // do end up being the same as the original args on the
  // sending replica.
  /**
   * The index where the moved element started
   * on this replica.  In general, except on the
   * sending replica, this can be different from fromIndex.
   */
  startIndex: number;
  /**
   * The index where the moved element ended (is currently).
   * Might not equal toIndex even on the sender,
   * in case fromIndex < toIndex (then it is toIndex - 1).
   * TODO: during a bulk move, can we guarantee this index
   * is the final destination, or will it only hold until
   * the next entry is moved?
   */
  endIndex: number;
  count: number;
}

export interface MovableCListEventsRecord<T> extends CListEventsRecord<T> {
  /**
   * Emitted when a move operation
   * moves a value or range of values.
   */
  Move: ListMoveEvent;
}

export interface MovableCList<
  T,
  InsertArgs extends any[],
  Events extends CListEventsRecord<T> = CListEventsRecord<T>
> extends CList<T, InsertArgs, Events> {
  /**
   * Move count values starting at fromIndex to toIndex.
   * Concurrent moves of the same value will move the
   * value to a single destination chosen
   * by some arbitration rule.
   *
   * toIndex is evaluated before removing fromIndex
   * and follows the same rules as insertion indices.
   * So the element will end up at toIndex - 1 if fromIndex < toIndex,
   * else toIndex.
   *
   * count defaults to one (move a single element).
   */
  move(fromIndex: number, toIndex: number, count?: number): void;
}

// TODO: analogous SettableCList interface + event, if
// set appears as an operation in more than one implementation?

mweidner037 commented 3 years ago

Done by https://github.com/composablesys/compoventuals/pull/138

composablesys / collabs