braid-org / braidjs

Monorepo for Braid Projects in Javascript
https://braid.org
259 stars 14 forks source link

Question about fissure and ack in sync9 #25

Closed iacore closed 4 days ago

iacore commented 1 week ago

I read the code for sync9 and antimatter. It was never specified that sync9 happen over reliable and ordered transport like TCP.

I have observed that

  1. This seems to be the case in visualization of https://braid.org/sync9
  2. In antimatter.js, websocket is used.

Is there a written explanation of sync9? I haven't found any on https://braid.org/.

toomim commented 1 week ago

Hi! Thanks for your interest! We would love to help make this clearer.

It sounds like you're interested in both Sync9 and Antimatter:

Neither of these algorithms depend upon reliable transport. Sync9 does not care about the network at all. Antimatter assumes unreliable connections between peers, and sends its own acknowledgements to make them reliable.

However, now that you bring it up, I don't think we've tested antimatter on unreliable (e.g. UDP) connections, and it's possible that there are some corner cases where we'd need to augment messages with some ordering information.

If you're curious about the Fissure and Ack system, you're going to want to understand Antimatter. I'd suggest starting with the Three-Wave Ack protocol that I just documented this week. This is one piece of the Antimatter protocol. The next thing to understand is the Fissure/Welcome system. I'd suggest watching the video on the Antimatter page to learn about that, and compare your knowledge with the code below it (search for the "fissure" and "welcome" messages sections), as we are still getting a proper writeup finished.

As you go through it, please post your questions here for us! It will help us know what we need to clarify as we write up the algorithm documentation.

Thank you for your interest! We'd also love to hear what got you interested in the work.

iacore commented 1 week ago

Thanks for the links. I have mistaken the name "antimatter" and "sync9" for each another.

Thank you for your interest! We'd also love to hear what got you interested in the work.

Someone said PZP looks like a good protocol for decentralized social media, so I looked into similar protocols. braid came up as the protocol most willing to explain itself, so despite my inability to understand anything on the website, I looked into it.

In the end, I ended up formalizing the idea behind this CRDT thing. Here's the code. It doesn't compile yet due to type system complication, and I'm solving it myself. I'll let you know when everything is finished.

iacore commented 1 week ago

fissure (netsplit) only exists for connection-oriented protocols, which UDP isn't.

iacore commented 1 week ago

I've read https://braid.org/algorithms/three-wave-acks. If netsplit doesn't happen, why three wave? Why not two?

Unreliable transmission can be solved with resend. If netsplit doesn't happen, that is.

image

Here's the algorithm.

top left: flow and color the graph top right: back propagate bottom left: longer path is ignored

If I understand correctly, if the nodes (vertices in the graph) remember to resend messages, then, eventual consistency is enough.

If my idea works, then floodfill will work too, which is simpler.

toomim commented 1 week ago

If netsplit doesn't happen, why three wave? Why not two?

How would you report that everyone has acknowledged m without the third wave? This has nothing to do with netsplits.

It sounds like you're suggesting an optimization to learn paths in the network and ignore the longer paths. That might work, but it seems it would also depend on network latency, which can change over time even without a full network partition.

fissure (netsplit) only exists for connection-oriented protocols, which UDP isn't.

Yes, I was referring to protocols like QUIC, which have unreliable connections. There are a few terms being thrown around here:

Your first question was about reliable and ordered transport. I was pointing out that antimatter (although not sync9) depend on connections, but should work fine without reliability or ordering, because it implements that on its own where it needs it, although I don't think it's been explicitly fuzz-tested for in our prototype, and thus there might be some tweaks necessary to prove that it works.

iacore commented 1 week ago

How would you report that everyone has acknowledged m without the third wave? This has nothing to do with netsplits.

It doesn't depend on "everyone" to acknowledge any message. A bit hard to explain here.

toomim commented 1 week ago

Ah, then we might be thinking about different problems. What problem are you trying to solve?

The problem Antimatter solves is knowing when it's safe to prune old history in a P2P CRDT or OT. It's only safe to prune away a version oldV once everyone has acknowledged a message newV newer than oldV, because then you know that nobody will base an edit on oldV itself — they will base it only on newV or something newer. So this requires knowing when everyone has acknowledged a newV. If a peer exists that has not acknowledged newV, then it's possible that the peer has also send some messages based on oldV that you have not received yet, and thus it's not safe for you to prune away oldV.

Are you thinking about a different problem?

BTW, as I talk this through, I realize that it might be more accurate to say that Antimatter does depend on some ordered & reliable communication in the connections between peers, even though it implements most of the ordering and reliableness itself.

iacore commented 1 week ago

I am trying to solve the same problem as you described.

I have not solved it yet.

I cannot offer further commentary until then, sorry.

toomim commented 4 days ago

Ok. I'm closing this in the meantime. Please re-open an issue when you are ready.