Open amiller opened 6 years ago
> Since the protocol's security relies on eventual delivery of every message, the only way to emulate this in an unreliable network is to store and resend messages indefinitely until a read receipt is obtained.
> However, the need to provide unbounded buffers means that the software may end up consuming all the machine's available disk space or RAM. If messages are dropped when the buffers fill up, then the network model no longer fulfills the guarantees assumed by the security analysis, and hence the security proof no longer holds.
Hi, in a complicated network we may have many nodes crash (or restart, or power-fail) at the same time, and the network may randomly drop messages.
In this case, does "store and resend indefinitely until a read receipt is obtained" mean we need to write (and sync) every outgoing message to a DB?
And if the receiver replies with a receipt and then crashes, the sender will not resend the message, so we also need to write (and sync) every incoming message to the DB?
outgoing messages:
VAL
ECHO
READY
BVAL
AUX
CONF
TERM
incoming messages:
incoming VAL
incoming ECHO
incoming READY
incoming BVAL
incoming AUX
incoming CONF
incoming TERM
So for one round we need at least 14 writes (and 14 syncs)? That is a huge burden for real projects.
What if we don't store and resend every message, and instead let the receiver query for missing messages? In that case we only need to store the RBC input data, the BA input data, and the coin data. When a node restarts and finds it is missing messages, it sends a query request to the other nodes, and they resend the missing messages. This only needs 2 or 3 writes. But I don't know whether this breaks the security guarantee...
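A rough sketch of what I mean (everything below is hypothetical, not code from this repo): only the RBC input, the BA input, and the coin data get a synced write, and on restart the node asks the other nodes to resend whatever it is missing.

```python
# Hypothetical sketch: persist only the few values a node cannot re-derive
# (RBC input, BA input, coin data) with a synced write, and recover every
# other protocol message by querying the other nodes after a restart.

import json
import os


class MinimalPersistence:
    def __init__(self, path):
        self.path = path

    def persist(self, key, value):
        """Durably record one value, e.g. 'rbc_input', 'ba_input', 'coin'.
        One write plus one fsync per call."""
        state = self._load()
        state[key] = value
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def recover(self, send_to_all):
        """After a crash/restart, reload the persisted inputs and ask the
        other nodes to resend any protocol messages we may have lost."""
        state = self._load()
        send_to_all(("QUERY_MISSING", state.get("round", 0)))
        return state
```

With something like this, each round only pays for the 2 or 3 synced writes of its own inputs, at the cost of one extra query round-trip after a restart.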
Currently, HoneyBadgerBFT requires unbounded communication buffers, for both outgoing messages and incoming messages. The problem is that in an asynchronous network, honest nodes may lag arbitrarily far behind, so the backlog of messages to deliver may grow without bound. Since the protocol's security relies on eventual delivery of every message, the only way to emulate this in an unreliable network is to store and resend messages indefinitely until a read receipt is obtained.
This problem is also discussed in issue #12. This issue presents a different approach, based on two ideas: 1) guaranteeing that each block requires a bounded number of messages, and 2) buffering messages for at most a constant number of active blocks. After every block is finalized, nodes produce a threshold signature on that block, which serves as a checkpoint. Then we do not need to guarantee eventual delivery for every message. Instead, messages pertaining to old blocks can be canceled, and nodes that have fallen behind can catch up using the checkpoints.
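As a rough illustration of idea 2 (the window constant and all names below are made up for this sketch, not taken from any implementation), per-block message buffers could be kept only for a constant window of active blocks and discarded as soon as a checkpoint covering them arrives:

```python
# Sketch: buffer messages only for a constant window of active blocks.
# Anything outside the window is dropped; nodes that fall behind recover
# from checkpoints rather than from the dropped messages.

from collections import defaultdict

WINDOW = 2  # illustrative constant number of active blocks to buffer


class BoundedBuffers:
    def __init__(self):
        self.current_block = 0
        self.buffers = defaultdict(list)  # block number -> buffered messages

    def on_message(self, block, msg):
        if self.current_block <= block < self.current_block + WINDOW:
            self.buffers[block].append(msg)
        # else: drop the message; eventual delivery is no longer promised,
        # checkpoints take over that role

    def on_checkpoint(self, block):
        """A threshold-signed checkpoint for `block` lets us cancel all
        buffered traffic for that block and everything before it."""
        if block >= self.current_block:
            for old in [b for b in self.buffers if b <= block]:
                del self.buffers[old]
            self.current_block = block + 1
```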
Discussion below:
HoneyBadgerBFT is written and formally analyzed in the Reliable Asynchronous Communications model. This means that every message sent from one honest party to another is guaranteed to be eventually delivered. This is considered the weakest network model used in distributed consensus protocols, as there is no guarantee about how much time it takes for a message to be delivered. However, even this network model is idealized, and it is difficult to fulfill the model in practice.
Reliable asynchronous I/O abstraction:
In reality, networks are unreliable, and messages may be dropped. In practice, a distributed system relies on an I/O abstraction that emulates the asynchronous model by a) robustly re-forming connections after a network interruption, e.g. whenever a TCP socket is broken, b) sending a receipt for every message received, and c) resending messages until the receipt is received. An unfortunate consequence of this approach is that outgoing messages may need to be queued for an arbitrary amount of time in unbounded storage buffers. Furthermore, the protocol is written in a coroutine style, where messages are implicitly filtered and matched: e.g. "Wait to receive ECHO from S. Then, wait to receive READY from S." Messages that are received before any process is waiting for them are implicitly expected to be buffered. If a node falls behind, e.g. because of a network partition, then an arbitrary number of incoming messages from future rounds may need to be buffered.
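For concreteness, here is a minimal sketch of the resend-until-receipt emulation just described (the transport interface, message IDs, and timer are illustrative assumptions, not an existing API). The `pending` dictionary is exactly the unbounded outgoing buffer at issue:

```python
# Sketch: tag each outgoing message, resend it on a timer until the peer
# acknowledges it, and tolerate broken connections. `pending` grows without
# bound if a peer is slow or partitioned away.

import itertools
import threading


class ReliableSender:
    def __init__(self, transport, resend_interval=1.0):
        self.transport = transport          # assumed to expose .send(peer, data)
        self.resend_interval = resend_interval
        self.pending = {}                   # msg_id -> (peer, payload)
        self.ids = itertools.count()

    def send(self, peer, payload):
        msg_id = next(self.ids)
        self.pending[msg_id] = (peer, payload)
        self._transmit(msg_id)

    def _transmit(self, msg_id):
        if msg_id not in self.pending:      # receipt already arrived
            return
        peer, payload = self.pending[msg_id]
        try:
            self.transport.send(peer, ("MSG", msg_id, payload))
        except OSError:
            pass                            # connection broken; retry below
        timer = threading.Timer(self.resend_interval, self._transmit, [msg_id])
        timer.daemon = True
        timer.start()

    def on_receipt(self, msg_id):
        """Called when the peer's read receipt arrives; stop resending."""
        self.pending.pop(msg_id, None)
```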
Unreliable Asynchronous I/O abstraction:
Reliable Asynchronous I/O from Unreliable I/O:
This unbounded-buffers approach to emulating reliable communication is implemented in the HoneyBadgerBFT-Python prototype, as well as the initial HoneyBadgerBFT-Go implementation. However, the need to provide unbounded buffers means that the software may end up consuming all the machine's available disk space or RAM. If messages are dropped when the buffers fill up, then the network model no longer fulfills the guarantees assumed by the security analysis, and hence the security proof no longer holds. In principle, more resources could be added dynamically (e.g., hot-swap in additional hard drives or make use of a network file system on a LAN).
This issue describes how the HoneyBadgerBFT protocol can be extended to require only bounded storage in the unreliable network model, without adopting any additional network assumptions.
The proposed approach is based on the following two observations:
Observation 1. Within a single instance of the HBBFT-Block protocol, a bounded number of messages may be sent by any honest party.
This observation holds immediately for the N instances of the `RBC` protocol, and for the threshold decryption in HBBFT-Block. The `VAL` messages need to be bounded. It is less clear that the instances of `ABA` can be bounded; as written, the `ABA` protocol proceeds in rounds, each round making use of a common `COIN`, until a termination condition is reached, which does not occur with any a priori bound. However, the running time analysis of the protocol suggests that even in the worst case, an instance of `ABA` requires more than `k` coins with probability `O(2^-k)`. Thus it suffices to establish a bound, say `k = 120`.
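As a small sketch of what the `k = 120` cap amounts to (the callback and names below are hypothetical): if each additional coin is needed with probability at most 1/2, then an instance needs more than `k` coins with probability at most `2^-k`, so stopping after 120 coin rounds fails only with probability about `2^-120`.

```python
# Sketch: cap an ABA instance at K_MAX coin rounds. The probability of ever
# hitting the cap is at most 2**-K_MAX, which is negligible for K_MAX = 120.

K_MAX = 120


def failure_probability(k):
    """Upper bound on the probability that an ABA instance needs more than k coins."""
    return 2.0 ** -k


def run_aba_bounded(aba_round, k_max=K_MAX):
    """Drive one ABA instance for at most k_max coin rounds.
    `aba_round(r)` is a hypothetical callback returning the decided bit or None."""
    for r in range(k_max):
        decided = aba_round(r)
        if decided is not None:
            return decided
    # Reached with probability at most 2**-k_max (about 1e-36 for k_max = 120).
    raise RuntimeError("ABA exceeded the a priori coin bound")
```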
Observation 2. Although the entire blockchain of committed transactions (`T` is the total number of transactions) grows unboundedly, the size of the current "state" `S` at any given time is typically much smaller and may be considered bounded.
For example, `|S|` may be bounded by the number of active accounts, whereas `|T|` is the number of transactions made by all the accounts. Hence if a node falls many blocks behind (or even if it crashes and restarts), then it can "catch up" to the current block by downloading just the current state `S` rather than the entire log of transactions. This is known as SPV-syncing in Bitcoin, and fast-sync in Ethereum.
The proposed approach combines these two observations to ensure that messages are only buffered for at most a constant number of active blocks.
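A sketch of the catch-up path implied by Observation 2 (all helpers below are assumptions, not an existing API): verify the threshold signature on a CHECKPOINT, download the state `S` committed to by its merkle root, and install it instead of replaying the full transaction log.

```python
# Sketch of checkpoint-based catch-up: a lagging node checks the threshold
# signature, fetches the current state S from any peer, and verifies it
# against the state merkle root in the checkpoint.

from collections import namedtuple

Checkpoint = namedtuple("Checkpoint", ["block", "tx_root", "state_root", "signature"])


def catch_up(checkpoint, local_block, verify_threshold_sig, download_state, merkle_root):
    """Return the downloaded state if the checkpoint is valid and ahead of us."""
    if checkpoint.block <= local_block:
        return None                          # not behind, nothing to do
    message = (checkpoint.block, checkpoint.tx_root, checkpoint.state_root)
    if not verify_threshold_sig(message, checkpoint.signature):
        return None                          # invalid checkpoint, ignore it
    state = download_state(checkpoint.state_root)       # fetch S from any peer
    assert merkle_root(state) == checkpoint.state_root  # S matches the commitment
    return state
```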
After finalizing each HBBFT-Block, nodes produce a `t+1` threshold signature on a merkle tree over all the transactions, as well as a merkle tree over the current state file `S`. This signature serves as a CHECKPOINT, a succinct piece of evidence that the current block has concluded. CHECKPOINTs are used for three different purposes:
I. Prevent DoS from malicious nodes. Messages for future rounds `B' > B` are ignored until a CHECKPOINT message for round `B` is received. This enables a node to discard DoS messages sent by an attacker, which would otherwise appear to be plausible messages for future rounds.
II. Allow lagging (or restarted) nodes to recover. When a node receives a valid CHECKPOINT for a later block than its current one (for block `B' > B`), it determines that it has fallen behind. It deallocates any buffered incoming or outgoing messages in block `B`, and catches up to block `B'` using the CHECKPOINT.
III. Allow nodes to garbage-collect old outgoing messages. Because nodes that have fallen behind can catch up via a CHECKPOINT, it is not necessary to buffer outgoing protocol messages pertaining to earlier blocks. Outgoing messages buffered in the I/O abstraction can be canceled/withdrawn, and replaced with CHECKPOINT messages for the current round.
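Putting the three uses together, a hypothetical handler (none of these method or attribute names exist in the codebase) might look roughly like this:

```python
# Sketch combining the three uses of CHECKPOINTs: I. drop far-future
# messages, II. recover when a newer checkpoint shows we are behind, and
# III. withdraw outgoing traffic for superseded blocks.

class CheckpointHandler:
    def __init__(self, node):
        # `node` is an assumed object exposing current_block, buffers,
        # an outgoing_queue, and a catch_up() routine.
        self.node = node
        self.checkpointed_block = 0   # highest block with a verified CHECKPOINT

    def on_protocol_message(self, block, msg):
        # I. DoS protection: a message far beyond the latest CHECKPOINT
        # cannot yet be legitimate, so discard it instead of buffering it.
        if block > self.checkpointed_block + 1:
            return
        self.node.buffers[block].append(msg)

    def on_checkpoint(self, checkpoint):
        if checkpoint.block <= self.checkpointed_block:
            return
        self.checkpointed_block = checkpoint.block
        if checkpoint.block > self.node.current_block:
            # II. We have fallen behind: drop buffered traffic for the old
            # block and recover from the CHECKPOINT instead.
            self.node.buffers.clear()
            self.node.catch_up(checkpoint)
            self.node.current_block = checkpoint.block
        # III. Outgoing messages for blocks at or before the CHECKPOINT are
        # no longer needed; replace them with the CHECKPOINT itself.
        self.node.outgoing_queue.cancel_up_to(checkpoint.block)
        self.node.outgoing_queue.broadcast(("CHECKPOINT", checkpoint))
```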