Chat-Wane / LSEQTree

A data structure for distributed arrays using the LSeq allocation strategy
http://chat-wane.github.io/LSEQTree
MIT License
55 stars 10 forks source link

Does this library use Tombstones? #15

Open sunny-b opened 6 years ago

sunny-b commented 6 years ago

I realize this project is over 2 years old, but I'm implementing a LSEQ tree style CRDT for a collaborative text editor I'm building for fun. I've been doing research on CRDT's. Through my research, I came across this repo.

I understand that Logoot and LSEQ CRDTs don't use Tombstones. Or at least they don't need to. I also read #11 in which you state this library doesn't use Tombstones. However, while I was playing around in my terminal, I came across what I thought were tombstones.


screen shot 2017-10-27 at 6 47 01 pm

This is a screen grab of the tree before removing the 'A' at position 0.

screen shot 2017-10-27 at 6 47 16 pm

This is a screen grab after removing the 'A'. Notice the null node in it's place.

Isn't that a tombstone? Or is my understanding of tombstones incorrect?

If that is a tombstone, would it be possible to implement an LSEQ tree without using them?

Couldn't you remove the node directly and replace it with the leftmost child, like in a BST? I realize this might cause the structures of the LSEQ tree to differ from user to user depending on when they receive the edits from the other users in the network.

ex.

screen shot 2017-10-27 at 7 03 27 pm

However, if the elements/characters stay in the same order, would it matter if the structure changed?

I would love to hear your insight into the matter.

Chat-Wane commented 6 years ago

Hi, I am very sorry I took this long to reply... I just saw the issue. If it's not to late, here is the reply:

Isn't that a tombstone? Or is my understanding of tombstones incorrect?

Tombstones are hidden elements that cannot be removed safely (without running costly distributed garbage collecting). Here, we see that "null" is a node that stores 2 children. The node is deleted if you remove the children, so it's not a tombstone.

Couldn't you remove the node directly and replace it with the leftmost child, like in a BST? [...] LSEQ tree to differ from user to user depending on when they receive the edits from the other users in the network.

Unfortunately, you can't for the reason you mention afterwards. The tree is an ordered set of unique identifiers. If you move an element, its identifier changes. Then, if another user inserts an element, it generates an identifier that may not put the character at the same index in the sequence for all users.

Instead, we could re-allocate identifiers when they are too large. Unfortunately, that is also not possible. If multiple users concurrently perform such re-allocation, they may duplicate characters.

However, if the elements/characters stay in the same order, would it matter if the structure changed?

You are right, you could have the sequence stored by different structures as long as they keep the same order. An obvious case would be: tree vs its flatten version as an array (good memory usage vs good performance). However, they would use the same identifiers here. I am not sure you can find an efficient transformation that allows you to keep the same order from one structure to another without risks.

sunny-b commented 6 years ago

Thanks for answering! No worries about the timeliness. The reason I asked originally was because I was working on a team that was also building a decentralized collaborative editor. We ended up using a two-dimensional array as the base structure because that's how our code editor was structured. The communication between them was faster that way.

I really appreciate the work you put into CRATE. It was a big inspiration for us and definitely ahead of its time. We ended up implementing the Version Vector with Exceptions as well. I'd never seen that before and thought it was genius.

We even wrote a case study about our project and posted it on Medium if you're interested in reading it: Conclave. Though I'm sure you already know about most of the topics we cover. Thanks again!

Chat-Wane commented 6 years ago

We ended up using a two-dimensional array as the base structure because that's how our code editor was structured.

The structure of the editor I used worked this way too, but it conveniently provided a conversion 1D <-> 2D. Not sure about its efficiency though...

Anyway, I am happy it ended up useful for you ! And I'll read your post, thanks !