Edit by @nathan-at-least: This comment refers to Zcon4 TFL Workshop slides; (mutable doc link; may be out of sync w/ comments)

Suggested Improvements

On slide 7:

New Consensus Rule Ⓐ: Do not finalize the next block unless the suffix is ≥ K blocks.

What does it mean to have a consensus rule referring to the finalization point? That isn't a property of a block that is being tested for consensus compatibility. Consensus has to be tested for blocks that are not yet final. When the block becomes final, it is already in the chain so by definition satisfies consensus.

You obviously can, and for any trailing finality protocol must, have a rule that says where the finalization point is, but that isn't and can't be a consensus rule.

Similarly,

"No final block may be rolled back." can't be a consensus rule, because consensus rules are not about the dynamic state of the chain tip. If a chain satisfies consensus then so do all chains that are prefixes of it. "No final block may be rolled back" is a modification of the meta-rule specifying how the "best valid block chain" evolves (see section 3.3 of the spec, and notice that it isn't referred to as a consensus rule there).
"Every final block immediately succeeds a final block with at most 1 successor." can't be a consensus rule for the same reason as the rule on slide 7.

If you want a name for a single category of rules that would cover these, I suggest calling them "chain evolution rules". (My first thought was to call them "finalization rules", but that doesn't quite work, since they constrain how a full validator's view of the chain tip changes, as well as which blocks it sees as finalized.)
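To make the distinction concrete, here is a minimal sketch (not taken from the spec or from any node implementation). It assumes a toy `Block` type and toy checks; `block_ok`, `chain_valid` and `may_switch` are hypothetical names. A chain validity rule is a predicate on a single chain and is prefix-closed by construction, while a chain evolution rule is a predicate on a transition between two views.

```rust
/// Toy block with only the fields this sketch needs (hypothetical).
struct Block {
    height: u64,
    parent: u64,
    hash: u64,
}

/// Toy contextual check of one block against its predecessor.
fn block_ok(prev: Option<&Block>, b: &Block) -> bool {
    match prev {
        None => b.height == 0,
        Some(p) => b.height == p.height + 1 && b.parent == p.hash,
    }
}

/// Consensus-style rule: a property of one chain. Each block is checked only
/// against earlier blocks, so every prefix of a valid chain is also valid.
fn chain_valid(chain: &[Block]) -> bool {
    let mut prev = None;
    for b in chain {
        if !block_ok(prev, b) {
            return false;
        }
        prev = Some(b);
    }
    true
}

/// Chain evolution rule: a property of a transition from `old` to `new`.
/// "No final block may be rolled back" needs both views, plus the number of
/// blocks this validator currently treats as final.
fn may_switch(finalized_len: usize, old: &[Block], new: &[Block]) -> bool {
    match (old.get(..finalized_len), new.get(..finalized_len)) {
        (Some(o), Some(n)) => o.iter().zip(n).all(|(x, y)| x.hash == y.hash),
        _ => false,
    }
}
```

Note that `chain_valid` cannot express the rollback rule at all: it never sees the previous view.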

It may also be worth pointing out what the existing chain evolution rules are:

"In order to choose the best valid block chain in its view of the overall block tree, a node sums the work, as defined in § 7.7.5 ‘Definition of Work’, of all blocks in each valid block chain, and considers the valid block chain with greatest total work to be best."
"A full validator MAY impose a limit on the number of blocks it will “roll back” when switching from one best valid block chain to another that is not a descendent. For zcashd and zebra this limit is 100 blocks."
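As a rough sketch of how those two existing rules combine (this is not zcashd or zebra code): the `Block` type, the `Evolution` enum and the function names below are hypothetical. Keeping the current chain on equal work, and treating a rollback of exactly 100 blocks as allowed, are assumptions of the sketch rather than statements about either implementation.

```rust
/// Toy block with only the fields this sketch needs (hypothetical).
#[derive(Clone, Debug, PartialEq)]
struct Block {
    hash: u64,
    work: u128,
}

/// Rollback limit quoted above for zcashd and zebra.
const MAX_ROLLBACK: usize = 100;

/// Total work of a chain: the sum of the work of all its blocks.
fn total_work(chain: &[Block]) -> u128 {
    chain.iter().map(|b| b.work).sum()
}

/// Number of blocks of `current` discarded when switching to `candidate`.
fn rollback_depth(current: &[Block], candidate: &[Block]) -> usize {
    let common = current
        .iter()
        .zip(candidate.iter())
        .take_while(|(a, b)| a == b)
        .count();
    current.len() - common
}

#[derive(Debug, PartialEq)]
enum Evolution {
    Keep,
    Switch,
    RollbackTooDeep,
}

/// Apply both rules to the validator's current best chain and a candidate.
fn evolve(current: &[Block], candidate: &[Block]) -> Evolution {
    if total_work(candidate) <= total_work(current) {
        return Evolution::Keep; // assumption: ties keep the current chain
    }
    if rollback_depth(current, candidate) > MAX_ROLLBACK {
        return Evolution::RollbackTooDeep; // needs manual intervention
    }
    Evolution::Switch
}
```

Both rules take the validator's current view as an input, which is what makes them chain evolution rules rather than properties of a single chain.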

Every final block immediately succeeds a final block with at most 1 successor.

There's a trivial bug that this doesn't hold for the genesis block.

More significantly, this is a security property, not a chain evolution rule. Chain evolution rules can only be based on what a given full validator sees. Different full validators will therefore see different final blocks. This is normal in the case where those blocks are all on the same linear chain, i.e. it is just that the validators are updating their view at different times. But an adversary with a large enough stake can break the security assumptions of the finalization layer.

If a full validator detects that there are two potential candidates for the next finalized block (because the security assumptions have been broken), the chain evolution rule needs to explicitly say what happens. In Ethereum this is called "finality reversion". As discussed here, an Ethereum client following the fork-choice rule will only roll back a finalized block with manual intervention:

    # Check block is a descendant of the finalized block
    assert (
        get_ancestor(store, signing_root(block), store.blocks[store.finalized_checkpoint.root].slot) ==
        store.finalized_checkpoint.root
    )

Since this is an assertion, the client will crash in this case and need manual intervention to put it back on the correct chain. This is similar to the behaviour of zcashd and zebra on an attempted 100-block rollback. The point is that "Every final block immediately succeeds a final block with at most 1 successor." is not a sufficient specification of what a validator should do if this case is detected.

As I argued in this post, there are good reasons to explicitly model "finalization overrides". The accounting for the conceptual and implementation complexity costs of allowing finalized blocks to be rolled back is then shared between finalization overrides, and finality reversion. In fact you can potentially simplify things by saying that a detected finality reversion always requires a finalization override.
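One possible shape for that simplification, as a sketch only: instead of an assertion, the check returns an explicit outcome, and a detected reversion is accepted only if an operator has supplied an override. `FinalizationOverride`, `Action` and `decide` are hypothetical names, and representing a chain as a slice of block hashes indexed by height is an assumption made for brevity.

```rust
/// Hypothetical operator-supplied override: "treat the block with this hash
/// at this height as final instead".
struct FinalizationOverride {
    height: usize,
    hash: u64,
}

#[derive(Debug, PartialEq)]
enum Action {
    /// The candidate chain contains the finalized block.
    Accept,
    /// Finality reversion detected, but an override authorizes the candidate.
    AcceptWithOverride,
    /// Finality reversion detected and no applicable override: stop loudly.
    Halt,
}

/// `candidate[h]` is the hash of the candidate chain's block at height `h`.
fn decide(
    finalized_height: usize,
    finalized_hash: u64,
    candidate: &[u64],
    manual_override: Option<&FinalizationOverride>,
) -> Action {
    if candidate.get(finalized_height) == Some(&finalized_hash) {
        return Action::Accept;
    }
    match manual_override {
        Some(o) if candidate.get(o.height) == Some(&o.hash) => Action::AcceptWithOverride,
        _ => Action::Halt,
    }
}
```

With no override present, this behaves like the assertion above (the client stops), but what the validator does in this case is stated by the rule instead of being a side effect of a crash.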

Then why place PoS accounting rules (roster, voting weights, reward/slashing, delegation, etc…) in Main Node rather than the new separate PoS Finality Oracle?

Answer: tight coupling with other ledger / transaction rules

PoS accounting rules are naturally transaction scoped as are most ledger rules.

Tracking this in the Finality Widget would require complex and stateful messaging between it and Full Node.

There's another reason (which is a showstopper for putting those rules in the Finality Oracle). The PoS accounting rules may need to be implemented in consensus rules. The Finality Oracle can't affect consensus because it doesn't have enough information: new PoW blocks are proposed to it, but those blocks must already satisfy consensus.

With this design approach, our previously simple new consensus rules for implementing trailing finality (Ⓐ, Ⓑ, and Ⓒ) are now extended with new rules necessary for PoS accounting.

"With this design approach, in addition to our new chain evolution rules for implementing trailing finality (Ⓐ, Ⓑ, and Ⓒ), we also have new or changed consensus rules necessary for PoS accounting."

This is all helpful feedback. I think the next improvement is to clarify some categories of rules and use consistent terminology for it. I will review the existing spec to hunt for this terminology. Before I do, here's my mental model:

Current Categories

1. Given two valid chains, select one as the best consensus candidate. (Aka "fork choice rule")
2. context-free transaction- and block-scoped rules (examples: "is the signature valid", "do the transparent values add up", "is the PoW solution valid")
3. contextual transaction-scoped rules: "is this transaction valid in the context of a selected valid chain and some dependency transactions that are unmined?"
4. contextual block-scoped rules: "is this block valid given a validated prefix?"

Do we call 1 "fork choice rule" and 2 "consensus rules"? (Hunting through the spec…)

Proposed TFL Categories

For TFL I believe we can keep the same categories, but category 1 has an important change, because it must provide the guarantee that "final blocks may not be rolled back". This implies that the finality gadget can halt with an error. Let me try to specify those clauses in more detail.

Let me try to frame the change with "pseudo types". In pure PoW the choice rule has a pseudotype like this:

    fn select_pow(a: [Block], b: [Block]) -> [Block]

It selects between two choices of (contextually-valid) block sequences without fail.

But for TFL (or Ebb-and-Flow or Ebb-and-Flow-with-Bounded-Gap), the pseudo type now looks like this (pseudo-rust):

    struct BlockChain {
      prefix: [Block],
      suffix: [Block],
    }

    fn select_tfl(a: BlockChain, b: BlockChain) -> Result<BlockChain, HALT>

The two changes to pseudo-type signature:

the context has both a final prefix and a dynamic suffix (both of contextually valid blocks)
this can result in a successful choice OR a loud error where the finality gadget halts.

With this new type signature in mind, the fork choice rule clauses are something like this:

if the common sub-prefix of a.prefix and b.prefix is neither a.prefix nor b.prefix, then there are conflicting final blocks at the same height, so exit with HALT.
otherwise, set the new prefix to the longer of the two prefixes. (If they are equal length, they are equal prefixes.)
select the longest dynamic suffix, using the pre-existing (NU5) PoW fork-choice rules, which also extends the newly selected prefix.

The "which also" clause in the last bullet should prevent rollbacks of final blocks, and given that constraint, selects the longest-pow suffix. This handles two cases:

both suffixes extend the most recent final block, so select between them as per vanilla PoW fork choice.
one suffix would roll back a final block because it has a different block at that height, therefore the other suffix must be selected.
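Here is a runnable sketch of those clauses, under some assumptions: `Block` is a toy type, "longest" is read as greatest total work (the NU5 rule quoted earlier in this thread), ties keep `a`, and `work` and `full` are hypothetical helper names. It is an illustration of the pseudo-type, not a proposed implementation.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Block {
    hash: u64,
    work: u128,
}

#[derive(Clone, Debug, PartialEq)]
struct BlockChain {
    prefix: Vec<Block>, // final blocks
    suffix: Vec<Block>, // not-yet-final blocks
}

#[derive(Debug, PartialEq)]
struct Halt;

fn work(blocks: &[Block]) -> u128 {
    blocks.iter().map(|b| b.work).sum()
}

/// Prefix and suffix as one contiguous sequence.
fn full(c: &BlockChain) -> Vec<Block> {
    c.prefix.iter().chain(c.suffix.iter()).cloned().collect()
}

fn select_tfl(a: &BlockChain, b: &BlockChain) -> Result<BlockChain, Halt> {
    // Clause 1: neither final prefix extends the other, so final blocks conflict.
    if !a.prefix.starts_with(&b.prefix) && !b.prefix.starts_with(&a.prefix) {
        return Err(Halt);
    }
    // Clause 2: the new final prefix is the longer of the two.
    let prefix = if a.prefix.len() >= b.prefix.len() { &a.prefix } else { &b.prefix };
    // Clause 3: among candidates that extend the new prefix, take the most work.
    let mut best: Option<Vec<Block>> = None;
    for candidate in [a, b] {
        let chain = full(candidate);
        if !chain.starts_with(prefix) {
            continue; // selecting this candidate would roll back a final block
        }
        let better = match &best {
            None => true,
            Some(current) => work(&chain) > work(current),
        };
        if better {
            best = Some(chain);
        }
    }
    // The candidate that supplied the longer prefix always extends it.
    let chain = best.expect("at least one candidate extends the selected prefix");
    Ok(BlockChain {
        prefix: prefix.clone(),
        suffix: chain[prefix.len()..].to_vec(),
    })
}
```

One detail this surfaces: the candidate with the shorter prefix can still win if its suffix passes through the other candidate's final blocks, in which case the result pairs the longer prefix with the remainder of that suffix.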

I believe all the other categories of rules (2-4) are unaffected by TFL: contextless validation is the same, and contextual validation can treat the prefix and suffix as a single contiguous sequence of (previously contextually validated) blocks so long as they are the result of a previous select_tfl operation. (Is this true?)

Ok, so here's some of my confusion:

What is the need/purpose of the target suffix length parameter? In the slides I call this K and assert that the TFL gadget "should not" finalize if it doesn't see a suffix of at least K+1.

Lower K is better for UX by reducing finality delay.

My intuition is if K = 0 this introduces some risks or dangers to mining.

In any case, the slides currently call this a consensus rule, but it is not, nor is it a fork choice rule.
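As stated on slide 7, the rule amounts to a predicate over a finalizer's local view, which a sketch makes plain. The function name `may_finalize_next` is hypothetical, and the comparison follows the slide's "≥ K" wording rather than the K+1 phrasing above.

```rust
/// Slide 7's rule as a local check: finalize the next block only if this
/// finalizer currently sees at least `k` not-yet-final blocks.
/// Both inputs come from the finalizer's own view and configuration.
fn may_finalize_next(local_suffix_len: usize, k: usize) -> bool {
    local_suffix_len >= k
}
```

Nothing in the inputs is observable by other nodes, and with `k = 0` the check always passes.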

So how do we think about this in terms of security? Suppose a suffix shorter than K carries some security risk. If all finalizers follow the rule not to finalize until they locally see a suffix of at least K, that may be fine. But suppose many finalizers violate this rule: there's no way to detect or enforce this, AFAICT. If the finalizers as a group have any incentive to introduce whatever risks a short suffix brings, that could be bad for overall system security. Time to re-read Ebb-and-Flow to see if it discusses this.

In the slides I will use "fork choice rule" for category 1 and "chain validity rules" to encompass 4 (which implies 2 & 3), and just avoid "consensus rules" except when I mean an even bigger category that includes all of that and maybe other stuff. ;-)

Ok, I'm running out of time for the presentation tweaks, but I want to summarize changes I made from this feedback:

Rule Categorization

I stopped using "consensus rules" (except maybe in very general contexts). For the summary of all rules I simply say "new rules". I use three category names: "chain validation", "fork choice", and "default/norm".

Here's a slide of new rules (excluding PoS accounting) and how it categorizes new rules:

So it uses two categories "chain validation" and "fork choice". The heuristic in Ⓑ isn't categorized and I'm not sure how to categorize this.

It's a "default / norm" that isn't verifiable, but may be helpful to security / efficiency. This category of "norms / defaults" can be very useful as long as there aren't incentive flaws or other flaws that can overwhelm / degrade the norm.

Clarification of Finality Gadget impact

@daira's feedback on why we introduce PoS accounting rules in Main Node helped me realize and clarify the scope of the Finality Gadget:

the Finality Gadget can only influence fork choice behavior, not txn/block validity.

This seems like an important constraint that helps in reasoning about the design and security implications.

Bounding the max suffix length

@daira separately wrote up an argument to advocate for bounding the suffix length. I mention this as one of the interactive workshop topics (with attribution for the idea). This is a set of topics I intend to poll the audience on to see what we want to drill in on as a group.

The presentation is complete and incorporates this feedback as well as possible given the schedule.

Electric-Coin-Company / tfl-book

Feedback on draft Zcon4 presentation #40