Closed daira closed 1 year ago
This is all helpful feedback. I think the next improvement is to clarify some categories of rules and use consistent terminology for them. I will review the existing spec to hunt for this terminology. Before I do, here's my mental model:
Do we call 1 "fork choice rule" and 2 "consensus rules"? (Hunting through the spec…)
For TFL I believe we can keep the same categories, but category 1 has an important change, because it must provide the guarantee that "final blocks may not be rolled back". This implies that the finality gadget can halt with an error. Let me try to specify those clauses in more detail.
Let me try to frame the change with "pseudo types". In pure PoW the choice rule has a pseudotype like this:
fn select_pow(a: [Block], b: [Block]) -> [Block]
It selects between two choices of (contextually-valid) block sequences without fail.
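To make the "without fail" point concrete, here is a minimal runnable sketch of the pure PoW pseudotype. The `Block` type, its `work` field, and the tie-break are assumptions for illustration only, not anything taken from the spec:

```rust
/// Hypothetical block type: only the field this sketch needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    /// Assumed stand-in for the proof-of-work contributed by this block.
    pub work: u64,
}

/// Total work of a (contextually valid) block sequence.
pub fn total_work(chain: &[Block]) -> u64 {
    chain.iter().map(|b| b.work).sum()
}

/// Pure PoW choice: always returns one of its two inputs, never an error.
/// Ties keep `a`, which is an arbitrary choice made for this sketch.
pub fn select_pow<'a>(a: &'a [Block], b: &'a [Block]) -> &'a [Block] {
    if total_work(b) > total_work(a) {
        b
    } else {
        a
    }
}
```

The return type carries the key property: there is no error case, so any two valid sequences can always be compared.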
But for TFL (or Ebb-and-Flow or Ebb-and-Flow-with-Bounded-Gap), the pseudo type now looks like this (pseudo-rust):
struct BlockChain {
    prefix: [Block],
    suffix: [Block],
}

fn select_tfl(a: BlockChain, b: BlockChain) -> Result<BlockChain, HALT>
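The two changes to the pseudo-type signature are that each chain is now split into a final prefix and a dynamic suffix (both of contextually valid blocks), and that selection can now fail with HALT instead of always returning a chain.
With this new type signature in mind, the fork choice rule clauses are something like this: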
If the common prefix of a.prefix and b.prefix is neither a.prefix nor b.prefix, then there are conflicting final blocks at the same height, so exit with HALT.
The "which also" clause in the last bullet should prevent rollbacks of final blocks and, given that constraint, select the longest-pow suffix. This handles two cases:
I believe all the other categories of rules (2-4) are unaffected by TFL: contextless validation is the same, and contextual validation can treat the prefix and suffix as a single contiguous sequence of (previously contextually validated) blocks so long as they are the result of a previous select_tfl operation. (Is this true?)
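One possible reading of the select_tfl clauses discussed above, as a runnable sketch. The clause list is only partly preserved in this thread, so everything past the HALT check is an assumption: the `id` and `work` fields, the `Halt` type, and the rule that a candidate which would roll back a known final block loses regardless of its work.

```rust
// Hypothetical types for illustration; `id` and `work` are assumed fields.
#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub id: u64,
    pub work: u64,
}

#[derive(Clone, Debug, PartialEq)]
pub struct BlockChain {
    pub prefix: Vec<Block>, // final blocks
    pub suffix: Vec<Block>, // dynamic blocks, not yet final
}

/// Returned when conflicting final blocks are observed.
#[derive(Debug, PartialEq)]
pub struct Halt;

fn total_work(blocks: &[Block]) -> u64 {
    blocks.iter().map(|b| b.work).sum()
}

pub fn select_tfl(a: BlockChain, b: BlockChain) -> Result<BlockChain, Halt> {
    // One final prefix must extend the other; otherwise final blocks conflict.
    let a_extends_b = a.prefix.starts_with(&b.prefix);
    let b_extends_a = b.prefix.starts_with(&a.prefix);
    if !a_extends_b && !b_extends_a {
        return Err(Halt);
    }
    // `keeper` holds the longer (or equal) final prefix.
    let (keeper, other) = if a_extends_b { (a, b) } else { (b, a) };

    // Assumed rule: `other` is only eligible if its full chain contains every
    // final block in `keeper.prefix`, so selecting it rolls back nothing final.
    let other_blocks: Vec<Block> = other
        .prefix
        .iter()
        .chain(other.suffix.iter())
        .cloned()
        .collect();
    if !other_blocks.starts_with(&keeper.prefix) {
        return Ok(keeper);
    }

    // Both candidates agree on all known final blocks: compare suffix work
    // measured from the same finalization point. Ties keep `keeper`.
    let other_suffix = other_blocks[keeper.prefix.len()..].to_vec();
    if total_work(&other_suffix) > total_work(&keeper.suffix) {
        Ok(BlockChain { prefix: keeper.prefix, suffix: other_suffix })
    } else {
        Ok(keeper)
    }
}
```

Under this reading, a candidate with more total work still loses if it omits a block the other candidate already treats as final. That is the behavioral difference from select_pow.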
Ok, so here's some of my confusion:
What is the need/purpose of the target suffix length parameter? In the slides I call this K and assert that the TFL gadget "should not" finalize if it doesn't see a suffix of at least K+1.
Lower K is better for UX by reducing finality delay.
My intuition is if K = 0 this introduces some risks or dangers to mining.
In any case, the slides currently call this a consensus rule, but it is not, nor is it a fork choice rule.
So how do we think about this in terms of security? If a suffix less than length K carries some security risk, and all finalizers follow the rule not to finalize until they locally see at least K, that may be fine, but suppose many finalizers violate this rule. There's no way to detect or enforce this, AFAICT. If the finalizers as a group have any incentive to introduce whatever risks a short suffix brings, that could be bad for overall system security. Time to re-read Ebb-and-Flow to see if it discusses this.
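A small sketch of the K norm as a purely local check may help show why it cannot be enforced. The function names are hypothetical, and `finalizable_count` assumes that finalizing always leaves at least K blocks in the suffix, which is an interpretation rather than something the slides state:

```rust
/// Local norm: a finalizer "should not" finalize unless the suffix it
/// currently sees has at least K+1 blocks. Written as `suffix_len > k`,
/// which is equivalent and avoids overflow in `k + 1`.
pub fn may_finalize(suffix_len: usize, k: usize) -> bool {
    suffix_len > k
}

/// Assumed interpretation: how many of the oldest suffix blocks could be
/// finalized while still leaving K blocks between the new finalization
/// point and the tip.
pub fn finalizable_count(suffix_len: usize, k: usize) -> usize {
    suffix_len.saturating_sub(k)
}
```

Both inputs come from the finalizer's own view of the chain. Other nodes only see the resulting finalization, not the suffix length the finalizer saw when it voted, which is why this behaves like a norm rather than a verifiable rule.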
In the slides I will use "fork choice rule" for category 1 and "chain validity rules" to encompass 4 (which implies 2 & 3), and just avoid "consensus rules" except when I mean an even bigger category that includes all of that and maybe other stuff. ;-)
Ok, I'm running out of time for the presentation tweaks, but I want to summarize changes I made from this feedback:
I stopped using "consensus rules" (except maybe in very general contexts). For the summary of all rules I simply say "new rules". I use three category names: "chain validation", "fork choice", and "default/norm".
Here's a slide of new rules (excluding PoS accounting) and how it categorizes new rules:
So it uses two categories "chain validation" and "fork choice". The heuristic in Ⓑ isn't categorized and I'm not sure how to categorize this.
It's a "default / norm" that isn't verifiable, but may be helpful to security / efficiency. This category of "norms / defaults" can be very useful as long as there aren't incentive flaws or other flaws that can overwhelm / degrade the norm.
@daira's feedback on why we introduce PoS accounting rules in Main Node helped me realize and clarify the scope of the Finality Gadget:
the Finality Gadget can only influence fork choice behavior, not txn/block validity.
This seems like an important constraint that helps in reasoning about the design and security implications.
@daira separately wrote up an argument to advocate for bounding the suffix length. I mention this as one of the interactive workshop topics (with attribution for the idea). This is a set of topics I intend to poll the audience on to see what we want to drill in on as a group.
The presentation is complete and incorporated this feedback as well as possible given the schedule.
Edit by @nathan-at-least: This comment refers to Zcon4 TFL Workshop slides; (mutable doc link; may be out of sync w/ comments)
Suggested Improvements
On slide 7:
What does it mean to have a consensus rule referring to the finalization point? That isn't a property of a block that is being tested for consensus compatibility. Consensus has to be tested for blocks that are not yet final. When the block becomes final, it is already in the chain so by definition satisfies consensus.
You obviously can, and for any trailing finality protocol must, have a rule that says where the finalization point is, but that isn't and can't be a consensus rule.
Similarly,
If you want a name for a single category of rules that would cover these, I suggest calling them "chain evolution rules". (My first thought was to call them "finalization rules", but that doesn't quite work, since they constrain how a full validator's view of the chain tip changes, as well as which blocks it sees as finalized.)
It may also be worth pointing out what the existing chain evolution rules are:
There's a trivial bug that this doesn't hold for the genesis block.
More significantly, this is a security property, not a chain evolution rule. Chain evolution rules can only be based on what a given full validator sees. Different full validators will therefore see different final blocks. This is normal in the case where those blocks are all on the same linear chain, i.e. it is just that the validators are updating their view at different times. But an adversary with a large enough stake can break the security assumptions of the finalization layer.
If a full validator detects that there are two potential candidates for the next finalized block (because the security assumptions have been broken), the chain evolution rule needs to explicitly say what happens. In Ethereum this is called "finality reversion". As discussed here, an Ethereum client following the fork-choice rule will only roll back a finalized block with manual intervention:
Since this is an assertion, the client will crash in this case and need manual intervention to put it back on the correct chain. This is similar to the behaviour of zcashd and zebra on an attempted 100-block rollback. The point is that "Every final block immediately succeeds a final block with at most 1 successor." is not a sufficient specification of what a validator should do if this case is detected.
As I argued in this post, there are good reasons to explicitly model "finalization overrides". The accounting for the conceptual and implementation complexity costs of allowing finalized blocks to be rolled back is then shared between finalization overrides and finality reversion. In fact you can potentially simplify things by saying that a detected finality reversion always requires a finalization override.
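The simplification in the last sentence can be sketched as a chain evolution rule over one validator's local view. All names here (`FinalView`, `EvolutionError`, the `override_authorized` flag) are hypothetical, and hashes are reduced to `u64` for brevity. The sketch returns an error where a real client might assert and crash:

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum EvolutionError {
    /// A different block is already final at this height in the local view.
    FinalityReversion { height: u64 },
}

/// One validator's local view of final blocks: height -> block hash.
#[derive(Default)]
pub struct FinalView {
    finals: BTreeMap<u64, u64>,
}

impl FinalView {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn final_hash(&self, height: u64) -> Option<u64> {
        self.finals.get(&height).copied()
    }

    /// Accept a newly reported final block. A conflict with an existing final
    /// block is a detected finality reversion: it is rejected unless an
    /// explicit finalization override has been authorized out of band.
    pub fn accept(
        &mut self,
        height: u64,
        hash: u64,
        override_authorized: bool,
    ) -> Result<(), EvolutionError> {
        match self.finals.get(&height) {
            Some(&existing) if existing == hash => Ok(()), // already known
            Some(_) if !override_authorized => {
                Err(EvolutionError::FinalityReversion { height })
            }
            _ => {
                // Drop this height and everything above it (a no-op when the
                // height is new), then record the block.
                let _rolled_back = self.finals.split_off(&height);
                self.finals.insert(height, hash);
                Ok(())
            }
        }
    }
}
```

With this shape, rolling back a final block has exactly one code path, the override, so the complexity cost of finality reversion and finalization overrides is paid once.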
There's another reason (which is a showstopper for putting those rules in the Finality Oracle). The PoS accounting rules may need to be implemented in consensus rules. The Finality Oracle can't affect consensus because it doesn't have enough information: new PoW blocks are proposed to it, but those blocks must already satisfy consensus.
"With this design approach, in addition to our new chain evolution rules for implementing trailing finality (Ⓐ, Ⓑ, and Ⓒ), we also have new or changed consensus rules necessary for PoS accounting."