cardano-scaling / hydra

Implementation of the Hydra Head protocol
https://hydra.family/head-protocol/
Apache License 2.0
276 stars 84 forks source link

Spike: Use Two-phase commit as reliable protocol #1597

Closed ffakenz closed 2 weeks ago

ffakenz commented 2 weeks ago

Reference: https://en.wikipedia.org/wiki/Two-phase_commit_protocol

Current Situation

Our current network interface is limiting us:

-- | Handle to interface with the Hydra network and send messages "off chain".
newtype Network m msg = Network
  { broadcast :: msg -> m ()
  -- ^ Send a `msg` to the whole Hydra network.
  }

It broadcasts a message to the entire Hydra cluster, including the sender node.

This is problematic because the sender assumes that delivery is guaranteed to the rest of the party. But what if some peers were offline at that time?

Let’s call the set of peers that are offline during a message delivery the set of unreliable-parties.

The current implementation causes the set of unreliable-parties to lag behind and prevents them from reaching an agreement with the rest.

Opportunity

There are several protocols focused on achieving consensus in distributed systems using a best-effort Reliable/Atomic Broadcast:

All of them provide networking capabilities to maintain eventual consistency in distributed state machines (like Hydra).

For instance, Raft and Paxos achieve this by leveraging:

  • an event log,
  • a leader election process,
  • and a log replication process.

One of the most important capabilities is allowing the event log to evolve in the presence of unreliable-parties as long as the majority of nodes agree.

Unfortunately, this is a disadvantage for us because our L2 protocol requires all peers to sign the next snapshot to continue operating.

We cannot confirm a snapshot without the signatures of all the peers in the party.

Note that we could arguably adapt the application logic to prevent the state-machine logs from evolving whenever the event log does. This would lead to pushing complexity to the application layer, thus not making any of these an out-of-the-box solution for us.

Luckily, there is still one protocol left which, compared to the others, enforces that all participants in a transaction either commit or abort together: Two-Phase Commit (2PC).

Drawback: While 2PC ensures atomicity across all nodes, it can block in case of failures.

This opportunity seems to be a fit for our use case, but it forces us to be careful about:

In a "normal execution" (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases:

  1. Commit-request (or voting phase)
  2. Commit (or completion phase)

Proposal

There seems to be a correlation between the 2PC phases and our L2 protocol, doesn't there?

The idea is basically to capitalize on the fact that our L2 protocol is already a 2PC protocol!

This proposal suggests shifting all responsibilities to the L2 protocol, resulting in a simpler networking stack that solely handles broadcasting messages without guarantees.

How (TBD):

ch1bo commented 2 weeks ago

Our current network interface is limiting us: [...] It broadcasts a message to the entire Hydra cluster, including the sender node. This is problematic because the sender assumes that delivery is guaranteed to the rest of the party

Indeed, the broadcast is operating under these assumptions and semantics as also outlined in the specification (e.g. see https://hydra.family/head-protocol/unstable/assets/files/hydra-spec-09c867d2d94685906cbc7a74873f9de5.pdf#subsection.6.1).

This proposal suggests shifting all responsibilities to the L2 protocol, resulting in a simpler networking stack that solely handles broadcasting messages without guarantees.

This statement conflicts with:

Have the networking layer handle retry-broadcasting and consumption based on delivered in-memory messages.

So I'm not entirely sure what the proposal means so let me rephrase (using terms of this book):

You are proposing to lower the requirements of broadcast to not be a "reliable broadcast" primitive, but merely of "best-effort broadcast" semantics. Consequently, the off-chain protocol will need to change to accommodate of potential non-delivery in fail-recovery scenarios.

Is this what this is about?

ffakenz commented 2 weeks ago

I was wrong.

The protocol is not a 2PC: it has more than 2 phases.

These are the following aspects about the protocol that prevent use from going on this direction:

These points forces us to rethink where to go. A good alternative to when 2PC is not good enough is the SAGA pattern. This involves making the leader responsible for opening and closing a new round with Ack or Abort, in cooperation with the rest. That makes the leader a temporal stateful coordinator, having to deal with retries and timeouts, ensuring all peers cooperate during every phase of the round until it gets completed.

This reminds me of the original Hydra paper, where more communication rounds were involved.

ffakenz commented 2 weeks ago

From above I started exploring the following approach:

Overview

The evolution of the HeadState by the worker nodes will be done in Rounds. Rounds are coordinated by the worker node elected as Round Leader. Other nodes act as Followers, and they cooperate with the leader. Followers vote on proposals and the Round Leader collects them. When a proposal is accepted by all worker nodes, the Round Leader continues with the next phase in the Round. Followers can always reject proposals. Round Leaders will retry and keep timeouts to guarantee complete gathering of votes.

Protocol Rounds

Node Roles

Replicated State

Network Reliability

behaves differently depending on the node's role

  • round leader
  • collects votes: in-memory
  • keeps track of the phase of the round
  • on failure it restarts the round from step 1
  • manages retry and timeouts mechanisms using udp
  • round follower
  • round votes are persisted
  • votes are sent back with udp oh every phase step confirmation

Worflow:

Closed -> Opened (preparation/voting phase)

Opened -> InFlight (log replication phase)

InFlight -> Closed (leader election)

[note]: in case of disagreement the head must be manually unstuck or closed.

same as with a peer never coming back online.