Electric-Coin-Company / tfl-book

A Trailing Finality Layer book for a proposed Zcash protocol change.
MIT License
5 stars 2 forks source link

Does Snap-and-Chat design split the minimum cost of attack for specific known attacks? #42

Open nathan-at-least opened 1 year ago

nathan-at-least commented 1 year ago

Question

In Ebb-and-Flow there are two subprotocols providing availability and finality respectively which are then coupled.

When starting a "known attack case analysis" (such as in #39) and also from conversations at Zcon4, we are concerned that some attacks have a lower min cost of attack estimate than their equivalents in pure PoW or PoS, because the cost to attack is the minimum of either component.

a. Is this concern true in general for Ebb-and-Flow? b. If not, is it specific to TFL?

One confusion or complication is that if the two subprotocols in Ebb-and-Flow both rely on the same pool of stake for security cost estimates, then the two subprotocols are not mere black boxes, but have an important underlying coupling. (My understanding of modern Ethereum Ghasper is that it is similar to Ebb-and-Flow with shared staking pool for security.)

daira commented 1 year ago

As I understand it, in ebb-and-flow the finalized chain will not roll back as long as the security requirements of $\Pi\mathrm{bft}$ are met. (The modification of $\Pi\mathrm{bft}$ to attach chain snapshots to the BFT blocks cannot break this property, because it's only imposing an additional constraint on when honest nodes will notarise a block, and so the modification can only harm liveness.)

I don't know how TFL differs. Where should I look for a description of TFL? (Remember I didn't attend Zcon4.)

nathan-at-least commented 1 year ago

From §I.D (p.3):

Theorem (Informal). Consider a network environment where: 1) Communication is asynchronous until a global stabilization time GST after which communication becomes synchronous, and 2) honest nodes sleep and wake up until a global awake time GAT after which all nodes are awake. Adversary nodes are always awake.

Then 1) (P1 - Finality): The finalized ledger LOGfin is guaranteed to be safe at all times, and live after max{GST, GAT}, provided that fewer than 33% of all the nodes are adversarial.

Here it is using the "proportion of honest nodes" assumption, rather than "estimated minimum cost of known attacks" (EMCKA).

Why EMCKA

I'd like for our analysis for Zcash TFL (or other protocols) to strive for something beyond this assumption for assessing security. Given this assumption, the question becomes "how hard is it for an attacker to compromise >33% of the BFT nodes?" See #48

The real-world answer is extremely complex. For example, if an attacker finds an 0-day on the a host platform which over a third of BFT nodes run on, there's one vector.

The "estimated minimum cost of known attacks" is just one simplification of this issue which assumes "an attacker only compromises BFT nodes by operating a large number of nodes, then participating in the PoS bonding protocol to include them into the roster (with "normal" protocol behavior), then after that setup is complete, they execute an attack on the BFT protocol.

It's not a foolproof metric nor any kind of guarantee but provides some estimate of the cost of one known attack (similar to the estimated cost of a 51% PoW attack via the route of acquiring legitimate hash power of sufficient degree for a setup phase).

Related Tickets / Analyses

nathan-at-least commented 1 year ago

Provisional answer to this ticket:

Yes, Ebb-and-Flow splits the cost of attack if the two subprotocols use independent "security resources".

For the specific case of a PoW $\Pi{lc}$ combined with $\Pi{bft}$ there's an informal assertion shown in #50 because safety or dynamic availability assumptions are of the form:

[a security property] holds if [something about PoS nodes/resources] AND [something abot PoW nodes/resources].

Therefore, violating either branch of the "AND" clause violates the security guarantee.

So finally: iff the cost of compromising either side of the branch is independent, then the security property can be invalidated by cheaper of the two.

If this analysis is correct, how should we proceed? (We should track how to proceed as a separate decision-making ticket.)

I can think of a few options:

nathan-at-least commented 1 year ago

Assigning @daira to review the above assessment. If ze concurs, we can close this ticket as resolved and open a new ticket for "how to proceed".

daira commented 1 year ago

@nathan-at-least wrote:

For the specific case of a PoW combined with

there's an informal assertion shown in https://github.com/Electric-Coin-Company/tfl-book/issues/50 because safety or dynamic availability assumptions are of the form:

[a security property] holds if [something about PoS nodes/resources] AND [something about PoW nodes/resources].

Therefore, violating either branch of the "AND" clause violates the security guarantee.

So finally: iff the cost of compromising either side of the branch is independent, then the security property can be invalidated by cheaper of the two.

Safety

Safety (rollback-freeness) of $\mathsf{LOG} {\mathrm{fin}}$ depends only on the safety and assumptions of the BFT protocol, not $\Pi {\mathrm{lc}}$. You might be confused by Figure 7, but the text to the right of the figure confirms this:

We see from Figure 7 that the safety of $\Pi {\mathrm{bft}}$ (box 1) implies the safety of $\mathsf{LOG} {\mathrm{fin}}$ (box 5).

Note that the input to the BFT protocol is only "confirmed" snapshots, i.e. chains truncated by the last $\sigma$ blocks where $\sigma$ is the $\Pi {\mathrm{lc}}$ confirmation depth. A failure of $\Pi {\mathrm{bft}}$ can cause a rollback in $\mathsf{LOG} {\mathrm{fin}}$ and $\mathsf{LOG} {\mathrm{da}}$ past the finalization point, because a different order of snapshots may be agreed on. However, a node that is online through such a rollback can detect it and just refuse to roll back, e.g. crashing instead. (A node that is newly synced can't tell which chain to choose in this situation without reliable out-of-band information.)

On any node $i$, $\mathsf{LOG} {\mathrm{fin},i}$ is always a prefix of $\mathsf{LOG} {\mathrm{da},i}$ by construction. Safety of $\mathsf{LOG} {\mathrm{da}}$ depends on both $\Pi {\mathrm{lc}}$ and $\Pi {\mathrm{bft}}$. In particular, it is an unavoidable consequence of the prefix property that if $\mathsf{LOG} {\mathrm{fin},i}$ changes, other than just by moving the finalization point forward within $\mathsf{LOG} {\mathrm{da},i}$, then so does $\mathsf{LOG} {\mathrm{da},i}$. This could invalidate transactions after the finalization point that depended on outputs from the old $\mathsf{LOG} {\mathrm{fin},i}$, or that are double-spends of outputs in the new $\mathsf{LOG} {\mathrm{fin},i}$.

Naively (and I got this wrong in the first version of this comment), we might think that knowing that $\mathsf{LOG} {\mathrm{fin}}$ does not roll back past the finalization point might allow us to conclude something about the safety of $\mathsf{LOG} {\mathrm{da}}$ relative to $\Pi {\mathrm{lc}}$. However, we can't, at least not for the version of snap-and-chat described in the paper. Although an honest BFT node will only vote for a confirmed snapshot, the paper does not specify any check on the snapshots that it receives in the $\mathsf{LOG} {\mathrm{bft}}$ chain. So, if the BFT protocol is subverted then, as described, strictly speaking nothing can be said about the safety of $\mathsf{LOG} {\mathrm{fin}}$, and therefore very little can be said about the safety of $\mathsf{LOG} {\mathrm{da}}$. In particular, an adversary could roll $\mathsf{LOG} {\mathrm{fin}}$ back to exactly the finalization point (avoiding detection by the mechanism in the previous paragraph), and then insert transactions that disrupt $\mathsf{LOG} {\mathrm{da}}$.

We might, however, be able to do better than this by performing additional checks on the snapshots in $\mathsf{LOG} {\mathrm{bft}}$, since they contain information about how much work was done to create the chain — not just sequences of transactions. So I cannot see an argument that depending completely on $\Pi {\mathrm{bft}}$ is inherent to the ebb-and-flow security model.

Aside: The paper also has a bug that affects ability to spend from some finalized outputs, which doesn't show up in the security analysis. That appears to be fixable.

In terms of "split" resources (e.g. issuance) between $\Pi {\mathrm{lc}}$ and $\Pi {\mathrm{bft}}$ reducing the security of each of them, that's a potential issue but I don't think we have enough information to answer it in the abstract. It really depends on what the split is, and on the specific protocols involved.

Liveness

Liveness of $\mathsf{LOG} {\mathrm{fin}}$ and $\mathsf{LOG} {\mathrm{da}}$ is complicated. As the paper says, in both cases it is dependent on liveness of both $\Pi {\mathrm{lc}}$ and $\Pi {\mathrm{bft}}$, but also snap-and-chat as described makes some potentially unrealistic assumptions in its network communication model that could be a problem in practice, and we can do better. I'll address this in the note I'm writing.

As I've argued elsewhere, having the $\mathsf{LOG}_ {\mathrm{da}}$ chain become unavailable for new spending transactions in a long finalization stall is not a bug.

daira commented 1 year ago

I wrote:

Note that the input to the BFT protocol is only "confirmed" snapshots, i.e. chains truncated by the last blocks where is the confirmation depth.

More precisely, in the snap-and-chat paper we have this (middle-left of page 3):

This motivates the last ingredient of our construction: in the $\Pi{\mathrm{bft}}$ sub-protocol, each honest node boycotts the finalization of snapshots that are not confirmed in $\Pi{\mathrm{lc}}$ in its view.

That is, an honest node does not vote for snapshots that are unconfirmed in its view.

Recall that a confirmed snapshot of $\Pi{\mathrm{lc}}$ is defined as a snapshot that is at least $\sigma$ blocks deep in a valid chain of $\Pi{\mathrm{lc}}$. (Importantly, the chain must be valid to $\sigma$ blocks ahead of the snapshot, not just have sufficient work. This is because it is the depth of the snapshot in a chain that the rest of the $\Pi_{\mathrm{lc}}$ network could build on that matters.)

The reason for the "honest" and "in its view" caveats is that the verifier of a signature by node $i$ on a proposal cannot directly check that the proposal actually was confirmed in the view of node $i$, because the verifier cannot know what chains node $i$ has seen. Nevertheless, there is an efficient way to strengthen this condition. If a proposal includes $\sigma$ block headers of $\Pi_{\mathrm{lc}}$ after the snapshot that is being proposed, then anyone who sees the proposal can directly check whether it has sufficient proof-of-work "on top" of it, and indirectly check (if they have the blocks in the associated longer chain) that it is confirmed by the above definition. Note that validator signatures are on the proposal including these headers, and so the verifier of a validator signature can now be sure that the validator had seen sufficient information that it should have been able to check that the snapshot is confirmed.

Also, the act of proposing a snapshot that will be viewed as valid makes the chain available to $\sigma$ blocks ahead, and so this prevents an adversary from mining on a private chain that it withholds while still proposing it as a snapshot. (It is not necessary for $\Pi_{\mathrm{lc}}$ nodes to directly look at proposal messages, because the proposal is only viewed as valid if/when the block headers correspond to available blocks.)

Note that honest validators already need to have seen the chain $\sigma$ blocks ahead of a proposal and be full nodes in $\Pi_{\mathrm{lc}}$, in order to check whether its snapshot is confirmed. They should not block on obtaining the full chain for an arbitrary proposal (because that could allow a denial-of-service on validators); they should just not vote for it if they don't have enough information to check its validity. If they do and the chain exists (i.e. there are blocks corresponding to the header hashes) but is not valid, then it is provable that they did not act honestly, and their stake can be slashed.

More generally, snap-and-chat as described in the paper keeps the interaction between subprotocols to the minimum that is necessary to prove their security claims. This is quite reasonable for a paper, because it makes snap-and-chat applicable to a wider range of subprotocols, and avoids distracting reviewers with things they may object to or misunderstand. While it's intuitively appealing not to entangle the two subprotocols "too much", I think that the trade-offs for a deployable protocol favour having participants in each of $\Pi{\mathrm{bft}}$ and $\Pi{\mathrm{lc}}$ verify as much as they can of the other subprotocol, because this actually provides substantively stronger security. In particular, it constrains even dishonest nodes to only casting a vote-that-will-be-seen-as-valid for a chain that is confirmed at the time of the vote. This means that we know that 2/3rds of validators (or of stake, if validation is by PoS) voted for a confirmed chain in a notarized proposal (using the terminology from Streamlet), without making any assumption on whether they are honest or not.

(These votes still might not be honest votes in the BFT protocol, because they might not satisfy other voting conditions in that protocol. Ideally, those conditions would also be objectively verifiable in a similar way, but they may not be. For example, in Streamlet the condition that "the proposed block extends from (one of) the longest notarized chain(s) that the voter has seen" is not objectively verifiable, because the voter cannot prove a negative that it did not see a longer BFT chain.)

nathan-at-least commented 1 year ago

I propose we change the scope of this ticket to Crosslink only, and we postpone it until the security proofs for safety and liveness of Crosslink are complete.

daira commented 11 months ago

I need to re-review the comments on this ticket in the context of Crosslink.