AntelopeIO / spring

C++ implementation of the Antelope protocol with Savanna consensus
Other
5 stars 2 forks source link

Instant Finality Design and Proofs #92

Open arhag opened 11 months ago

arhag commented 11 months ago
### Design
- [ ] https://github.com/AntelopeIO/leap/issues/1952
- [x] IF: When does the new finalizer set become active?
- [x] IF: What finalizer behavior is considered provably faulty?
- [ ] IF: What do cryptographic proofs of finality advancement and finality violation look like?
- [ ] IF: What minimal information does a node need to store (beyond the blockchain) to assist light clients / nodes to sync from genesis and assist light clients to generate cryptographic proof of provable faulty behavior?
- [ ] IF: What minimal information must a light client keep to generate cryptographic proof of provably faulty behavior?
- [ ] IF: What minimal information must a light client keep to generate cryptographic proof of finality advancement?
### Mathematical Proofs
- [ ] IF: Proof sketch of safety (with finalizer set transitions)
- [ ] IF: Proof sketch of liveness
- [ ] IF: Proof sketch of availability of data to a light client to generate cryptographic proof of provably faulty behavior
arhag commented 1 month ago

Answers to critical design questions (part 1)

When does the new finalizer set become active?

The finalizer set is captured in a finalizer policy. A finalizer policy, assuming it is still being tracked, can be in one of the following three states: proposed, pending, or active.

One can think of a finalizer set as being "active" for a block if the finalizer keys in the set are allowed to (and in fact expected to) vote on the block. A finalizer is allowed to vote on a block if it belongs in either the set captured in the active finalizer policy of the block or the set captured in the pending finalizer policy of the block (assuming the block has a pending finalizer policy). It is possible for the same finalizer key to exist in both the active and pending finalizer policies.

The status of a finalizer policy gets promoted from proposed to pending to active. A block has exactly one active finalizer policy. It can also have zero or one pending finalizer policy. Also, it can track zero or more proposed finalizer policies.

Consider the following chain of blocks:

       ... <- B1 <- B2 <- B3 <- B4 <- B5 <- B6 <- B7 <- B8 <- B9
Proposed:           |<------ P1 ----->|                   
                          |<--- P2 -->|                   
                                |<------ P3 ----->|        
Pending:                              |<--- P2 -->|<--- P3 -->|
Active:   <------------------- P0 --------------->|<--- P2 -->|<--- P3 --->  

The active policy starting prior to B1 is P0. There are no existing pending or proposed policies tracked as of block B1. No finalizer policy is proposed in block B1. Finalizer policy P1 is proposed in block B2. Finalizer policy P2 is proposed in block B3. Finalizer policy P3 is proposed in block B4. No finalizer policies are proposed in blocks B5 to B9.

The last final block as of each of the following blocks is as listed:

Policy P1 is in the proposed state between blocks B2 and B4 (inclusive). Policy P2 is in the proposed state between blocks B3 and B4 (inclusive). Policy P3 is in the proposed state between blocks B4 and B6 (inclusive).

At block B5, policy P2 is promoted from proposed to pending and policy P1 is no longer tracked. Policy P2 is the pending state between blocks B5 and B6 (inclusive).

At block B6, normally the proposed policy P3 should be promoted to pending. However, because there already exists pending policy P2 which is not ready yet to be promoted to active, the promotion of P3 to pending cannot yet occur.

At block B7, pending policy P2 is promoted to active (replacing P0). This opens up the space for proposed policy P3 to now be promoted to pending as well in the same block. Policy P3 is in the pending state between blocks B7 and B8 (inclusive). Policy P2 is in the active state between blocks B7 and B8 (inclusive).

At block B9, pending policy P3 is promoted to active (replacing P2). Policy P3 remains the active policy for block B9 and an indeterminate number of blocks descending from B9.

For blocks B1 to B4, a single policy was in effect: active policy P0. For blocks B5 and B6, a joint policy was in effect: active policy P0 together with pending policy P2. This means finalizers from both P0 and P2 should vote on blocks B5 and B6. For blocks B7 and B8, a different joint policy was in effect: active policy P2 together with pending policy P3. Finally, in block B9, a single policy was in effect: active policy P3.

What finalizer behavior is considered provably faulty?

There are only three rules that finalizers must not violate. If they violate these rules, then cryptographic proof may be submitted against them which proves that they violated one of these three rules. If they do not violate these rules, then they will not contribute towards a finality violation and will not generate any evidence that can be used to form a cryptographic proof of rule violation against them.

In the rules below, convenient notation will be used to refer to blocks and their attributes.

Blocks will be referenced by symbols such as b, b1, b2, etc.

Finalizers will be referenced by symbols such as f, f1, f2, etc.

The timestamp of a block is given by the function t(b); for example, the timestamp of block b1 is t(b1).

The finality digest of a block is given by the function d(b).

The block referenced as the latest QC claim of a block b is given by the function c(b). For example, consider a block b1 and a child block b2 that claims the QC on b1; then, c(b2) == b1.

The parent block of a block b is given by the function p(b). We use the notation A(b1, b2) to represent the proposition that b1 is an ancestor of b2, which is to say that b1 == p(p(...p(b2))) for some composition chain of the p function (could just be a chain of length 1).

Given a finalizer f, we use the notation S(b, f) to represent the proposition that there exists a strong vote by finalizer f on block b and we use the notation W(b, f) to represent the proposition that there exists a weak vote by finalizer f on block b. Finally, V(b, f) represents the proposition that there exists some vote (weak or strong) by finalizer f on block b; so V(b, f) = S(b, f) or W(b, f).

Rule 1: Avoid voting on the duplicate timestamps

Rule 1 is violated by a finalizer if there exists two distinct blocks that the finalizer has voted on each and where both blocks have the same timestamp.

In mathematical notation: Finalizer f has violated rule 1 iff there exist distinct blocks b1 and b2 where:

Finalizers can avoid violating rule 1 by ensuring a monotonicity condition is satisfied before they vote. This requires tracking a record per finalizer that is persisted after each vote. This record, called the finalizer safety information, must include the timestamp of the latest block that was voted on which is an input into the monotonicity condition. Before voting on a block b, the finalizer must ensure that t(b) is strictly greater than the timestamp of the last block it voted on which is stored in its finalizer safety information; if that is the case, then the monotonicity condition is satisfied and does not act as a barrier to voting on block b.

The monotonicity condition provides a stricter constraint than simply avoiding voting on the same timestamp twice. The stricter condition allows avoiding rule 1 violation with a small fixed amount of storage in the finalizer safety information. But more importantly, it ensures the order in which a finalizer votes on blocks respects the order of the timestamp of the blocks. This will be important constraint in ensuring the other rules are also respected. In fact, going forward when an order of voting by finalizers is described (using words such as prior, later, before, after, then, etc.), one can assume the timestamps of the blocks that were voted on are also ordered appropriately because of the monotonicity condition.

Rule 2: Respect time interval commitments implied by strong votes

Rule 2 is violated by a finalizer if it strong votes on a block where the time interval implied by the strong vote conflicts with the timestamp of any blocks that the finalizer has voted on prior and that are not ancestors of the block that was strong voted on. The implied time interval is a left-open interval where the right endpoint is the timestamp of the block and the left endpoint is the timestamp of the block referenced by the latest QC claim of the block that was strong voted on.

In mathematical notation: Finalizer f has violated rule 2 iff there exist distinct blocks b1, b2, and b3 where:

In the above, the time interval (t(b1), t(b3)] is implied by the strong vote on b3. But the commitment to that time interval would not be respected by the finalizer if it strong voted on b3 after voting on b2, since t(b2) is contained in that interval and yet b2 is not an ancestor of b3. It would not be respected either if it voted on b2 after strong voting on b3, but the finalizer should avoid doing that anyway if it is respecting the monotonicity condition that arose from trying to avoid violating rule 1.

Finalizers can avoid violating rule 2 in addition to rule 1 above if the finalizer safety information introduced earlier was expanded to include other relevant past voting data that helps the finalizer determine whether it is allowed to strong vote or not. If this additional rule prevents it from strong voting (which comes with the burden of the additional time interval commitment which must be respected by the finalizer's prior votes), it may still be able to weak vote (since that does not come with the additional commitment burden to respect rule 2).

If the latest strong vote by a finalizer was tracked, then any weak votes after that and prior to the next strong vote become relevant in restricting whether that next strong vote is allowed by rule 2.

The latest strong vote also remains relevant. If it is for a block with a timestamp outside the interval implied by the next strong vote or it is for a block that is an ancestor of the block that the next strong vote is on, then the latest strong vote does not act as a rule 2 blocker for the next strong vote.

Any weak votes after the latest strong vote that are outside the interval implied by the next strong vote can be safely ignored. All weak votes after the latest strong vote that are inside the interval implied by the next strong vote must be for blocks that are ancestors of the block that the next strong vote is on. If even one of those weak votes is for a block that is not an ancestor, then rule 2 blocks the next strong votes; though it can still allow the vote to be downgraded to a weak vote.

However, this approach of tracking votes, while it does not any additional strictness that can negatively impact liveness, does require tracking an unbounded number of weak votes in the finalizer safety information. We wish to keep the size of the finalizer safety information fixed per each finalizer. It is possible to do this while respecting rule 2 by respecting a slightly stricter rule.

Instead of tracking the latest strong vote and all the weak votes since then, it is sufficient to just track the latest vote (strong or weak) and an indicator of whether or not the latest strong vote and all weak votes after the latest strong vote are for blocks that are either block b or an ancestor of block b where block b is the block voted on by the latest vote. If the indicator states that statement is not true, then a stricter condition must be satisfied where the t(b) must be outside the time interval implied by the next strong vote (regardless of whether b is an ancestor of the block that the next strong vote is on). But if the indicator state that statement is true, then the exception where b is an ancestor of the block that the next strong vote is on is accepted as a way of respecting rule 2 even if t(b) lies within the interval implied by the next strong vote. Following this stricter rule does impact liveness slightly, but it is of a negligible impact given how block production and voting dynamics work in practice.

So the finalizer safety information should be expanded to include a block reference data (block number, timestamp, and finality digest) for the block that the latest vote was on as well as a special boolean called votes_forked_since_latest_strong_vote. That boolean can be initialized to false and is reset to false on a strong vote. If the boolean is true, then it remains true on a weak vote. Under certain conditions, a weak vote may change the value of that boolean from false to true. Specifically, if the block the new weak vote is on is not a descendant of block of the latest vote, then votes_forked_since_latest_strong_vote changes from false to true.

When evaluating whether rule 2 would be in violation (and therefore block strong votes) for a block b under consideration for voting, the first thing to lookup the block reference for the block of the latest vote (denoted br) and check is if t(br) <= t(c(b)). If that is true, then rule 2 is not in violation and no further checks are necessary to know that rule 2 does not prevent a strong vote. Otherwise, rule 2 would be in violation under a strong vote if votes_forked_since_latest_strong_vote is true or if A(br, b) is false.

Rule 3: Honor locks formed by strong votes

Rule 3 is violated by a finalizer if after locking onto a block due to a strong vote it votes on another block that conflicts with the lock. A block conflicts with a lock if is not a descendant of the locked block and its latest QC claimed block has a timestamp that is no greater than that of the locked block.

In mathematical notation: Finalizer f has violated rule 3 iff there exist distinct blocks b1, b2, b3, and b4 where:

In the above, b2 is the locked block which was formed by the strong vote on b3. Note that it is not true that b2 is ancestor of b4 (which otherwise would have satisfied what is called the safety condition) and it is not true that t(b2) < t(b1) (which otherwise would have satisfied what is called the liveness condition); therefore, since neither safety nor liveness conditions were satisfied, a vote by the finalizer on b4 after it strong voted on b3 would dishonor the lock. Therefore any vote on b4 (weak or strong) is disallowed by rule 3.

Finalizers can avoid violating rule 3 in addition to rules 1 and 2 above if the finalizer safety information from earlier was further expanded to track the information about the locked block (denoted bl). Before voting (weak or strong), the finalizer must make sure the vote would not violate rule 3 according to the existing locked block information in the finalizer safety information. A vote on block b is disallowed by the finalizer due to rule 3 if both the safety condition (A(bl, b) is true) and the liveness condition (t(bl) < t(c(b))) are not satisfied. Furthermore, any time a finalizer strong votes on a block, the locked block information in the finalizer safety information file may need to be updated; the locked block should be updated to c(b) on a strong vote on block b if t(bl) < t(c(b)). Note that it is possible for the liveness condition to not be satisfied but the safety condition to be satisfied; in this case, necessarily, t(c(b)) < t(bl) and A(c(b), bl) is true, and therefore it makes sense that the locked block should not be updated since bl remains a better lock to keep than c(b).