Reference: https://en.wikipedia.org/wiki/Two-phase_commit_protocol

Current Situation

Our current network interface is limiting us:

-- | Handle to interface with the Hydra network and send messages "off chain".
newtype Network m msg = Network
  { broadcast :: msg -> m ()
  -- ^ Send a `msg` to the whole Hydra network.
  }

It broadcasts a message to the entire Hydra cluster, including the sender node.

This is problematic because the sender assumes that delivery is guaranteed to the rest of the party. But what if some peers were offline at that time?

Let’s call the set of peers that are offline during a message delivery the set of unreliable-parties.

The current implementation causes the set of unreliable-parties to lag behind and prevents them from reaching an agreement with the rest.

Opportunity

There are several protocols focused on achieving consensus in distributed systems using a best-effort Reliable/Atomic Broadcast:

Paxos
Raft
State Machine Replication in Byzantine Fault Tolerance (BFT) Systems (like DLTs)
CRDTs (Conflict-Free Replicated Data Types) with Strong Convergence
Two-Phase Commit

All of them provide networking capabilities to maintain eventual consistency in distributed state machines (like Hydra).

For instance, Raft and Paxos achieve this by leveraging:

an event log,

a leader election process,

and a log replication process.

One of the most important capabilities is allowing the event log to evolve in the presence of unreliable-parties as long as the majority of nodes agree.

Unfortunately, this is a disadvantage for us because our L2 protocol requires all peers to sign the next snapshot to continue operating.

We cannot confirm a snapshot without the signatures of all the peers in the party.

Note that we could arguably adapt the application logic to prevent the state-machine logs from evolving whenever the event log does. This would lead to pushing complexity to the application layer, thus not making any of these an out-of-the-box solution for us.

Luckily, there is still one protocol left which, compared to the others, enforces that all participants in a transaction either commit or abort together: Two-Phase Commit (2PC).

Drawback: While 2PC ensures atomicity across all nodes, it can block in case of failures.

This opportunity seems to be a fit for our use case, but it forces us to be careful about:

what to retry,
how long to wait for confirmation of delivery,
and only processing messages that have been confirmed by everyone.

In a "normal execution" (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases:

Commit-request (or voting phase)
Commit (or completion phase)

Proposal

There seems to be a correlation between the 2PC phases and our L2 protocol, doesn't there?

The idea is basically to capitalize on the fact that our L2 protocol is already a 2PC protocol!

This proposal suggests shifting all responsibilities to the L2 protocol, resulting in a simpler networking stack that solely handles broadcasting messages without guarantees.

How (TBD):

In L2, persist only SnapshotConfirmed events, and no intermediate events.

The latest SnapshotConfirmed signed by all participants serves as the definitive source of truth, as it can be used to close or contest the head.
Remove networking persistence.
Keep in-flight messages in memory, with bounded capacity and time-to-live (TTL) settings.
Have the networking layer handle retry-broadcasting and consumption based on delivered in-memory messages.
Switch the networking protocol to use UDP.
Extend L2 protocol with AbortSn.

Our current network interface is limiting us: [...] It broadcasts a message to the entire Hydra cluster, including the sender node. This is problematic because the sender assumes that delivery is guaranteed to the rest of the party

Indeed, the broadcast is operating under these assumptions and semantics as also outlined in the specification (e.g. see https://hydra.family/head-protocol/unstable/assets/files/hydra-spec-09c867d2d94685906cbc7a74873f9de5.pdf#subsection.6.1).

This proposal suggests shifting all responsibilities to the L2 protocol, resulting in a simpler networking stack that solely handles broadcasting messages without guarantees.

This statement conflicts with:

Have the networking layer handle retry-broadcasting and consumption based on delivered in-memory messages.

So I'm not entirely sure what the proposal means so let me rephrase (using terms of this book):

You are proposing to lower the requirements of broadcast to not be a "reliable broadcast" primitive, but merely of "best-effort broadcast" semantics. Consequently, the off-chain protocol will need to change to accommodate of potential non-delivery in fail-recovery scenarios.

Is this what this is about?

I was wrong.

The protocol is not a 2PC: it has more than 2 phases.

These are the following aspects about the protocol that prevent use from going on this direction:

it is expected that anyone can propose a new ReqTx at any time, but its the leader who gets to decide what ReqTxs has been seen and what not, so the snapshot in flight for the current round will only contain txs seen by the leader.
this makes the protocol to halt as peers will never sign a snapshot containing a tx they did not seen and validated (missing ReqTx).

These points forces us to rethink where to go. A good alternative to when 2PC is not good enough is the SAGA pattern. This involves making the leader responsible for opening and closing a new round with Ack or Abort, in cooperation with the rest. That makes the leader a temporal stateful coordinator, having to deal with retries and timeouts, ensuring all peers cooperate during every phase of the round until it gets completed.

This reminds me of the original Hydra paper, where more communication rounds were involved.

From above I started exploring the following approach:

Overview

The evolution of the HeadState by the worker nodes will be done in Rounds. Rounds are coordinated by the worker node elected as Round Leader. Other nodes act as Followers, and they cooperate with the leader. Followers vote on proposals and the Round Leader collects them. When a proposal is accepted by all worker nodes, the Round Leader continues with the next phase in the Round. Followers can always reject proposals. Round Leaders will retry and keep timeouts to guarantee complete gathering of votes.

Protocol Rounds

Closed
Opened
InFlight

Node Roles

Round Leader
Round Follower

Replicated State

Pending State (current opened round)
Confirmed State (last closed round)

Network Reliability

behaves differently depending on the node's role

round leader

collects votes: in-memory

keeps track of the phase of the round

on failure it restarts the round from step 1

manages retry and timeouts mechanisms using udp

round follower

round votes are persisted

votes are sent back with udp oh every phase step confirmation

Worflow:

Closed -> Opened (preparation/voting phase)

leader: broadcast StartRound

leader: retry unicast StartRound on timeout for a missing peer
leader: collects "votes" for each round "proposal"
peers: broadcast locally stored ReqTx{tx} (proposal)

peers: keep round proposals persistent
peers: on ReqTx proposal received => unicast leader ValidTx{tx_id} | InvalidTx{tx_id} (votes)

peers: keep round votes persistent peers: prune previous round votes persisted ???

Opened -> InFlight (log replication phase)

leader: on ReqTx proposal accepted by ALL => broadcast ReqSn{list-tx_id} (proposal)

leader: retry unicast ReqSn on timeout for a missing peer
leader: collects "votes" for the round SN "proposal"
peers: on ReqSn proposal received => unicast leader AckSn{sig} | AbortSn{sig} (votes)

peers: keep round votes persistent

InFlight -> Closed (leader election)

leader: on ReqSn proposal accepted by ALL => broadcast CloseRound{multi-sig, next-leader-id} (proposal)

leader: retry unicast CloseRound on timeout for a missing peer
peers: on CloseRound proposal received => unicast leader Ok | Invalid (votes)

peers: keep round votes persistent
peer: is unblock and can start accepting and store new ReqTx proposals locally

[note]: in case of disagreement the head must be manually unstuck or closed.

same as with a peer never coming back online.

cardano-scaling / hydra

Spike: Use Two-phase commit as reliable protocol #1597