Electric-Coin-Company / tfl-book

A Trailing Finality Layer book for a proposed Zcash protocol change.
MIT License

Feedback on draft Zcon4 presentation #40

Closed: daira closed this issue 1 year ago

daira commented 1 year ago

Edit by @nathan-at-least: This comment refers to Zcon4 TFL Workshop slides; (mutable doc link; may be out of sync w/ comments)

Suggested Improvements

On slide 7:

New Consensus Rule Ⓐ: Do not finalize the next block unless the suffix is ≥ K blocks.

What does it mean to have a consensus rule referring to the finalization point? That isn't a property of a block that is being tested for consensus compatibility. Consensus has to be tested for blocks that are not yet final. When the block becomes final, it is already in the chain so by definition satisfies consensus.

You obviously can, and for any trailing finality protocol must, have a rule that says where the finalization point is, but that isn't and can't be a consensus rule.

Similarly,

If you want a name for a single category of rules that would cover these, I suggest calling them "chain evolution rules". (My first thought was to call them "finalization rules", but that doesn't quite work, since they constrain how a full validator's view of the chain tip changes, as well as which blocks it sees as finalized.)
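To make the distinction concrete, here is a minimal Rust sketch of rule Ⓐ written as a chain evolution rule rather than a consensus rule. It assumes a hypothetical `LocalView` holding one validator's finalized height and tip height. It also reads "the suffix is ≥ K blocks" as "at least K blocks remain above the new finalization point", which matches the "suffix of at least K+1" wording used later in this thread. The function only moves that validator's own finalization point; it never judges a block valid or invalid.

```rust
/// Hypothetical snapshot of one full validator's view of its best chain.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LocalView {
    finalized_height: u64, // last block this validator treats as final
    tip_height: u64,       // tip of its current best chain
}

impl LocalView {
    /// Number of not-yet-final blocks above the finalization point.
    fn suffix_len(&self) -> u64 {
        self.tip_height.saturating_sub(self.finalized_height)
    }
}

/// Rule Ⓐ as a chain evolution rule: decide whether *this* validator may
/// finalize the next block. It takes no block to validate and returns no
/// validity verdict, so it cannot be a consensus rule.
fn advance_finality(view: LocalView, k: u64) -> LocalView {
    // Finalizing one more block shrinks the suffix by one, so the suffix
    // must currently be at least k + 1 blocks for k to remain afterwards.
    if view.suffix_len() > k {
        LocalView { finalized_height: view.finalized_height + 1, ..view }
    } else {
        view
    }
}

fn main() {
    let view = LocalView { finalized_height: 100, tip_height: 103 };
    assert_eq!(advance_finality(view, 3), view); // suffix of 3 is too short for K = 3
    assert_eq!(advance_finality(view, 2).finalized_height, 101);
}
```

Two validators with different tips can reach different finalization points at the same moment under this rule, which is why it has to be phrased per validator rather than as a property of blocks.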

It may also be worth pointing out what the existing chain evolution rules are:

Every final block immediately succeeds a final block with at most 1 successor.

There's a trivial bug: this doesn't hold for the genesis block.

More significantly, this is a security property, not a chain evolution rule. Chain evolution rules can only be based on what a given full validator sees. Different full validators will therefore see different final blocks. This is normal in the case where those blocks are all on the same linear chain, i.e. it is just that the validators are updating their view at different times. But an adversary with a large enough stake can break the security assumptions of the finalization layer.

If a full validator detects that there are two potential candidates for the next finalized block (because the security assumptions have been broken), the chain evolution rule needs to explicitly say what happens. In Ethereum this is called "finality reversion". As discussed here, an Ethereum client following the fork-choice rule will only roll back a finalized block with manual intervention:

    # Check block is a descendant of the finalized block
    assert (
        get_ancestor(store, signing_root(block), store.blocks[store.finalized_checkpoint.root].slot) ==
        store.finalized_checkpoint.root
    )

Since this is an assertion, the client will crash in this case and need manual intervention to put it back on the correct chain. This is similar to the behaviour of zcashd and zebra on an attempted 100-block rollback. The point is that "Every final block immediately succeeds a final block with at most 1 successor." is not a sufficient specification of what a validator should do if this case is detected.

As I argued in this post, there are good reasons to explicitly model "finalization overrides". The accounting for the conceptual and implementation complexity costs of allowing finalized blocks to be rolled back is then shared between finalization overrides, and finality reversion. In fact you can potentially simplify things by saying that a detected finality reversion always requires a finalization override.
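A minimal Rust sketch of the alternative to the assert-and-crash behaviour quoted above: a detected finality reversion is returned as an explicit error, and the only way to move a validator's finalized chain off its current history is a separate finalization-override path. `BlockHash`, `FinalityError`, `update_finalized`, and `apply_finalization_override` are hypothetical names, and how an override is authorized is deliberately left out.

```rust
/// Hypothetical block identifier.
type BlockHash = [u8; 32];

#[derive(Debug, PartialEq)]
enum FinalityError {
    /// The candidate diverges from our finalized chain after `common_len` blocks.
    FinalityReversion { common_len: usize },
}

/// Normal processing: adopt a newly finalized chain only if it is consistent
/// with ours. Being behind (a prefix of ours) is fine; diverging is an error
/// that is reported rather than asserted, so the node does not just crash.
fn update_finalized(current: &mut Vec<BlockHash>, candidate: &[BlockHash]) -> Result<(), FinalityError> {
    let common_len = current.iter().zip(candidate).take_while(|(a, b)| a == b).count();
    if common_len < current.len().min(candidate.len()) {
        return Err(FinalityError::FinalityReversion { common_len });
    }
    if candidate.len() > current.len() {
        *current = candidate.to_vec();
    }
    Ok(())
}

/// Finalization override: the only path allowed to roll back final blocks.
/// How an override is authorized (for example, by operator action) is out
/// of scope for this sketch.
fn apply_finalization_override(current: &mut Vec<BlockHash>, chosen: &[BlockHash]) {
    *current = chosen.to_vec();
}

fn main() {
    let h = |n: u8| -> BlockHash { [n; 32] };
    let mut finalized = vec![h(0), h(1)];
    assert_eq!(update_finalized(&mut finalized, &[h(0), h(1), h(2)]), Ok(()));
    let fork = [h(0), h(9), h(10)];
    assert_eq!(
        update_finalized(&mut finalized, &fork),
        Err(FinalityError::FinalityReversion { common_len: 1 })
    );
    apply_finalization_override(&mut finalized, &fork);
    assert_eq!(finalized, fork.to_vec());
}
```

A candidate that is merely behind (a prefix of the current finalized chain) is not treated as a reversion; only a divergence is. That mirrors the normal case above where validators simply update their views at different times.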

Then why place PoS accounting rules (roster, voting weights, reward/slashing, delegation, etc…) in Main Node rather than the new separate PoS Finality Oracle?

Answer: tight coupling with other ledger / transaction rules

  • PoS accounting rules are naturally transaction scoped as are most ledger rules.
  • Tracking this in the Finality Widget would require complex and stateful messaging between it and Full Node.

There's another reason (which is a showstopper for putting those rules in the Finality Oracle). The PoS accounting rules may need to be implemented in consensus rules. The Finality Oracle can't affect consensus because it doesn't have enough information: new PoW blocks are proposed to it, but those blocks must already satisfy consensus.

With this design approach, our previously simple new consensus rules for implementing trailing finality (Ⓐ, Ⓑ, and Ⓒ) are now extended with new rules necessary for PoS accounting.

"With this design approach, in addition to our new chain evolution rules for implementing trailing finality (Ⓐ, Ⓑ, and Ⓒ), we also have new or changed consensus rules necessary for PoS accounting."

nathan-at-least commented 1 year ago

This is all helpful feedback. I think the next improvement is to clarify some categories of rules and use consistent terminology for it. I will review the existing spec to hunt for this terminology. Before I do, here's my mental model:

Current Categories

  1. Given two valid chains, select one as the best consensus candidate. (a.k.a. the "fork choice rule")
  2. Context-free transaction- and block-scoped rules (examples: "is the signature valid", "do the transparent values add up", "is the PoW solution valid")
  3. contextual transaction-scoped rules: "is this transaction valid in the context of a selected valid chain and some dependency transactions that are unmined?"
  4. contextual block-scoped rules: "is this block valid given a validated prefix?"

Do we call 1 "fork choice rule" and 2 "consensus rules"? (Hunting through the spec…)

Proposed TFL Categories

For TFL I believe we can keep the same categories, but category 1 has an important change, because it must provide the guarantee that "final blocks may not be rolled back". This implies that the finality gadget can halt with an error. Let me try to specify those clauses in more detail.

Let me try to frame the change with "pseudo-types". In pure PoW, the fork choice rule has a pseudo-type like this:

    fn select_pow(a: Vec<Block>, b: Vec<Block>) -> Vec<Block>

It selects between two choices of (contextually valid) block sequences without fail.
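For reference, a runnable toy version of `select_pow`. It approximates "best chain" as the chain with the most cumulative work (not the most blocks) and breaks ties by keeping `a`; the `Block` type with a bare `work` field is a hypothetical simplification.

```rust
/// Hypothetical minimal block: only the work its PoW solution represents.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    work: u128,
}

/// Cumulative work of a chain.
fn total_work(chain: &[Block]) -> u128 {
    chain.iter().map(|b| b.work).sum()
}

/// Pure-PoW fork choice: always returns one of its inputs (it cannot fail),
/// preferring more cumulative work and keeping `a` on a tie.
fn select_pow(a: Vec<Block>, b: Vec<Block>) -> Vec<Block> {
    if total_work(&b) > total_work(&a) {
        b
    } else {
        a
    }
}

fn main() {
    let a = vec![Block { work: 10 }, Block { work: 10 }, Block { work: 10 }]; // 3 blocks, work 30
    let b = vec![Block { work: 20 }, Block { work: 15 }]; // 2 blocks, work 35
    assert_eq!(select_pow(a, b.clone()), b); // fewer blocks, but more work
}
```

The return type has no failure case, which is the property that changes once finality is introduced.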

But for TFL (or Ebb-and-Flow or Ebb-and-Flow-with-Bounded-Gap), the pseudo type now looks like this (pseudo-rust):

    struct BlockChain {
        prefix: Vec<Block>, // finalized blocks
        suffix: Vec<Block>, // not-yet-final blocks
    }

    fn select_tfl(a: BlockChain, b: BlockChain) -> Result<BlockChain, HALT>

The two changes to the pseudo-type signature:

With this new type signature in mind, the fork choice rule clauses are something like this:

The "which also" clause in the last bullet should prevent rollbacks of final blocks and, given that constraint, select the longest-PoW suffix. This handles two cases:

I believe all the other categories of rules (2-4) are unaffected by TFL: contextless validation is the same, and contextual validation can treat the prefix and suffix as a single contiguous sequence of (previously contextually validated) blocks so long as they are the result of a previous select_tfl operation. (Is this true?)
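A toy Rust sketch of `select_tfl`, to sanity-check the reasoning above. It interprets the "which also" clause as: the result must contain every block that either input treats as final; if the two finalized prefixes conflict, it halts; otherwise it picks the eligible chain with the most work after the finalized prefix. `Block::id` (standing in for a block hash), the unit struct `Halt`, and the `blocks()` helper (the single contiguous prefix-plus-suffix sequence that contextual validation would see) are assumptions for illustration, not part of any spec.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Block {
    id: u64, // stand-in for a block hash
    work: u128,
}

#[derive(Debug, Clone, PartialEq)]
struct BlockChain {
    prefix: Vec<Block>, // finalized
    suffix: Vec<Block>, // not yet finalized
}

#[derive(Debug, PartialEq)]
struct Halt;

impl BlockChain {
    /// Prefix and suffix as one contiguous sequence of blocks.
    fn blocks(&self) -> impl Iterator<Item = &Block> + '_ {
        self.prefix.iter().chain(self.suffix.iter())
    }
}

fn is_prefix(p: &[Block], of: &[Block]) -> bool {
    p.len() <= of.len() && p == &of[..p.len()]
}

fn select_tfl(a: BlockChain, b: BlockChain) -> Result<BlockChain, Halt> {
    // The finalized prefixes must agree; if neither extends the other,
    // finality has been violated and there is no safe automatic choice.
    let final_prefix = if is_prefix(&a.prefix, &b.prefix) {
        b.prefix.clone()
    } else if is_prefix(&b.prefix, &a.prefix) {
        a.prefix.clone()
    } else {
        return Err(Halt);
    };
    let n = final_prefix.len();
    let mut best: Option<(u128, BlockChain)> = None;
    for c in [a, b] {
        // Only chains containing every finalized block are eligible.
        let all: Vec<Block> = c.blocks().cloned().collect();
        if !is_prefix(&final_prefix, &all) {
            continue;
        }
        // Compare the work in the part of the chain that is not yet final.
        let suffix = all[n..].to_vec();
        let work: u128 = suffix.iter().map(|blk| blk.work).sum();
        if best.as_ref().map_or(true, |(w, _)| work > *w) {
            best = Some((work, BlockChain { prefix: final_prefix.clone(), suffix }));
        }
    }
    best.map(|(_, chain)| chain).ok_or(Halt)
}

fn main() {
    let blk = |id: u64, work: u128| Block { id, work };
    // `b` has finalized block 1; `a` has not yet, but agrees with it.
    let a = BlockChain { prefix: vec![blk(0, 1)], suffix: vec![blk(1, 5), blk(2, 5)] };
    let b = BlockChain { prefix: vec![blk(0, 1), blk(1, 5)], suffix: vec![blk(3, 1)] };
    let chosen = select_tfl(a, b.clone()).unwrap();
    assert_eq!(chosen.prefix.len(), 2); // finality never moves backwards
    assert_eq!(chosen.suffix, vec![blk(2, 5)]); // heavier suffix wins
    // Conflicting finalized prefixes: halt instead of choosing.
    let c = BlockChain { prefix: vec![blk(0, 1), blk(9, 1)], suffix: vec![] };
    assert_eq!(select_tfl(b, c), Err(Halt));
}
```

A higher-work chain that omits a finalized block is simply ineligible here, which is the "no rollback of final blocks" guarantee; the `Halt` path is the new failure case that `select_pow` never had.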

nathan-at-least commented 1 year ago

Ok, so here's some of my confusion:

What is the need/purpose of the target suffix length parameter? In the slides I call this K and assert that the TFL gadget "should not" finalize if it doesn't see a suffix of at least K+1.

Lower K is better for UX by reducing finality delay.
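As a rough worked example of that trade-off: assuming a block is finalized only once at least K blocks sit on top of it, and using Zcash's 75-second target block spacing (in effect since the Blossom upgrade), the PoW-side wait alone is about K × 75 s, so K = 10 gives 750 s (12.5 minutes). This ignores block-time variance and finalizer messaging latency; `min_finality_delay_secs` is a hypothetical helper.

```rust
/// Zcash's target block spacing since the Blossom network upgrade, in seconds.
const TARGET_SPACING_SECS: u64 = 75;

/// Rough PoW-side lower bound on finality delay, assuming a block can be
/// finalized only once at least `k` blocks are mined on top of it.
/// Ignores block-time variance and finalizer messaging latency.
fn min_finality_delay_secs(k: u64) -> u64 {
    k * TARGET_SPACING_SECS
}

fn main() {
    for k in [0, 1, 10, 100] {
        println!("K = {k:>3}: ~{} s", min_finality_delay_secs(k));
    }
}
```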

My intuition is that K = 0 introduces some risks to mining.

In any case, the slides currently call this a consensus rule, but it is not, nor is it a fork choice rule.

So how do we think about this in terms of security? If a suffix shorter than K carries some security risk, and all finalizers follow the rule not to finalize until they locally see at least K blocks, that may be fine. But suppose many finalizers violate this rule: there's no way to detect or enforce it, AFAICT. If the finalizers as a group have any incentive to introduce whatever risks a short suffix brings, that could be bad for overall system security. Time to re-read Ebb-and-Flow to see if it discusses this.

nathan-at-least commented 1 year ago

In the slides I will use "fork choice rule" for category 1 and "chain validity rules" to encompass 4 (which implies 2 & 3), and just avoid "consensus rules" except when I mean an even bigger category that includes all of that and maybe other stuff. ;-)

nathan-at-least commented 1 year ago

Ok, I'm running out of time for the presentation tweaks, but I want to summarize changes I made from this feedback:

Rule Categorization

I stopped using "consensus rules" (except maybe in very general contexts). For the summary of all rules I simply say "new rules". I use three category names: "chain validation", "fork choice", and "default/norm".

Here's a slide listing the new rules (excluding PoS accounting) and how it categorizes them:

[Slide image: new rules and their categories]

So it uses two categories: "chain validation" and "fork choice". The heuristic in Ⓑ isn't categorized, and I'm not sure how to categorize it.

It's a "default / norm" that isn't verifiable, but may help security or efficiency. This category of "norms / defaults" can be very useful as long as there are no incentive flaws or other flaws that could overwhelm or degrade the norm.

Clarification of Finality Gadget impact

@daira's feedback on why we introduce PoS accounting rules in Main Node helped me realize and clarify the scope of the Finality Gadget:

the Finality Gadget can only influence fork choice behavior, not txn/block validity.

This seems like an important constraint that helps in reasoning about the design and security implications.
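A toy Rust sketch of that constraint as an interface boundary: the hypothetical `FinalityGadget` trait exposes only the latest finalized block, the validity check takes no gadget at all, and the gadget's output is used only to filter candidates in fork choice. The block fields, the `work > 0` placeholder for PoW validation, and `FixedGadget` (a test double) are all assumptions.

```rust
/// Hypothetical minimal block.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    id: u64,             // stand-in for the block hash
    parent: Option<u64>, // None only for genesis
    work: u128,
}

/// The only thing the node asks of the finality gadget.
trait FinalityGadget {
    fn finalized_tip(&self) -> Option<u64>;
}

/// Test double that reports a fixed finalized block.
struct FixedGadget(Option<u64>);

impl FinalityGadget for FixedGadget {
    fn finalized_tip(&self) -> Option<u64> {
        self.0
    }
}

/// Chain validation: no gadget parameter, so the gadget cannot change validity.
fn is_valid_extension(chain: &[Block], block: &Block) -> bool {
    let links = match chain.last() {
        Some(tip) => block.parent == Some(tip.id),
        None => block.parent.is_none(),
    };
    links && block.work > 0 // `work > 0` is a placeholder for the real PoW check
}

/// Fork choice over already-validated chains: the gadget only removes
/// candidates that omit the finalized block; most total work then wins
/// (on ties, `max_by_key` returns the last candidate).
fn choose<'a>(candidates: &'a [Vec<Block>], gadget: &dyn FinalityGadget) -> Option<&'a Vec<Block>> {
    let finalized = gadget.finalized_tip();
    candidates
        .iter()
        .filter(|chain| finalized.map_or(true, |f| chain.iter().any(|b| b.id == f)))
        .max_by_key(|chain| chain.iter().map(|b| b.work).sum::<u128>())
}

fn main() {
    let genesis = Block { id: 0, parent: None, work: 1 };
    let a1 = Block { id: 1, parent: Some(0), work: 5 };
    let b1 = Block { id: 2, parent: Some(0), work: 9 };
    assert!(is_valid_extension(&[genesis.clone()], &a1));
    let candidates = vec![vec![genesis.clone(), a1], vec![genesis, b1]];
    // Without finality the heavier chain (ending in block 2) wins...
    assert_eq!(choose(&candidates, &FixedGadget(None)).unwrap()[1].id, 2);
    // ...but once block 1 is final, only the chain containing it is eligible.
    assert_eq!(choose(&candidates, &FixedGadget(Some(1))).unwrap()[1].id, 1);
}
```

Because `is_valid_extension` never sees the gadget, anything that must affect validity (such as PoS accounting) has to live in the node's own chain validation rules.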

Bounding the max suffix length

@daira separately wrote up an argument advocating for bounding the suffix length. I mention this as one of the interactive workshop topics (with attribution for the idea). These are topics I intend to poll the audience on, to see which ones we want to drill into as a group.

nathan-at-least commented 1 year ago

The presentation is complete and incorporated this feedback as well as possible given the schedule.