raulk opened this issue 3 months ago
The topdown finality struct is proposed to add `validator_changes` and `cross_messages`, so it becomes:

```rust
struct ParentFinality {
    height: u64,
    hash: Hash,
    validator_changes: Vec<ValidatorChange>,
    messages: Vec<IpcEnvelope>,
}
```
The topdown syncer still works the same way it does today. When a `ParentFinality` is prepared, it is written to a persistent data store, and the CID of the finality is calculated at the same time. The CID of the proposal is signed by the current validator and published to the gossip channel, where a voting process takes place. If a quorum cannot be reached, the pending votes are purged and voting restarts.
If a quorum is reached, the proposal will be made by fendermint to cometbft with the following:

```rust
struct TopdownProposal {
    /// This can be compressed.
    content: ParentFinality,
    proof: MultiSigProof,
}
```
Validating the proposal no longer requires querying the RPC. Instead, the validator checks that the content matches the CID, queries the validator list at the corresponding height, and verifies that the quorum was indeed formed. Execution of the proposal is the same as in the current flow. Once the proposal is executed, the corresponding topdown persisted storage is purged.
Alternatively, the above vote tally can happen completely on chain, i.e. in fendermint: each validator submits the `ParentFinality` to a contract, and the vote tally is calculated on the contract.
For pain point 1, the ultimate solution is a light client. Since we don't have one for now, voting that can be reverted should be able to handle restarts. At the same time, only validators need to run the topdown syncer instead of every single node. Each validator could also connect to a different RPC endpoint for diversity.
For pain points 2, 3, and 4, the benefit should be obvious: the RPC is no longer needed once the proposal is finalised.
When a validator detects a new parent view, it prepares the `ParentFinality` struct, calculates its CID, and signs the message. It then publishes the finality height, CID, public key, and signature to the gossip channel.
Upon receiving a gossip vote, the validator checks whether that voter has already voted; if so, the vote is ignored. Otherwise, the validator verifies that the signature is correct, then looks up the height and aggregates the signature. It also tracks who has voted in a bitmap where each bit records whether the corresponding validator has voted (1 for yes, 0 for no). The validator then checks whether a quorum is reached; if so, the `MultiSigProof` is created.
```rust
struct MultiSigProof {
    voted: BitVector,
    aggregated_signature: Bytes,
}
```
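A minimal sketch of the vote bookkeeping described above, assuming a fixed power table indexed per validator. `VoteTracker` and its index-based API are illustrative stand-ins; real code would also verify and aggregate the ECDSA signatures, which is omitted here:

```rust
struct VoteTracker {
    /// Power table: validator index -> voting weight (order is fixed).
    weights: Vec<u64>,
    /// One bit per validator: true once that validator's vote is recorded.
    voted: Vec<bool>,
}

impl VoteTracker {
    fn new(weights: Vec<u64>) -> Self {
        let n = weights.len();
        Self { weights, voted: vec![false; n] }
    }

    /// Record a vote; returns false if this validator already voted.
    fn record_vote(&mut self, validator_idx: usize) -> bool {
        if self.voted[validator_idx] {
            return false; // duplicate vote is ignored
        }
        self.voted[validator_idx] = true;
        true
    }

    /// Quorum is reached when strictly more than 2/3 of total weight voted.
    fn has_quorum(&self) -> bool {
        let total: u64 = self.weights.iter().sum();
        let voted: u64 = self
            .weights
            .iter()
            .zip(&self.voted)
            .filter(|(_, v)| **v)
            .map(|(w, _)| *w)
            .sum();
        voted * 3 > total * 2
    }
}

fn main() {
    let mut t = VoteTracker::new(vec![10, 10, 10]); // three equal validators
    t.record_vote(0);
    assert!(!t.has_quorum()); // 10 of 30 is below the threshold
    assert!(!t.record_vote(0)); // duplicate ignored
    t.record_vote(1);
    assert!(!t.has_quorum()); // 20 of 30 is not strictly > 2/3
    t.record_vote(2);
    assert!(t.has_quorum());
    println!("quorum reached");
}
```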
Upon receiving the proposal, the validator needs to verify the `MultiSigProof` using the `ParentFinality` received as the signature message.

After some initial implementation of topdown commitment (#1037), here are a few initial ideas on how to implement commitment into the vote tally.
The current vote tally works as follows: the topdown syncer picks up new parent views from the parent RPC. When there is a new parent view, it is stored in the vote tally. A background process wakes up every few seconds, checks the latest parent height tracked in the vote tally, signs the corresponding data, and publishes the vote for that height.
So there are a couple of things to consider here:
One way to address the above issues is to enforce incremental progression. If the previous height to vote on has not formed a quorum, the vote tally waits until it has reached quorum. Parent finality is divided into steps of bounded size; each step spans, say, at most 100 blocks, or ends earlier once `size_of(proposal) > 1kb`. The step size needs to be constrained so as not to blow up the block size. If the previous quorum-reached height is, say, 1000, then the next height to vote on is at most 1100, and no validator will vote beyond that. If a quorum cannot be reached within 1100 (different RPC responses, too big a proposal), the round is voided and restarts.
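The height window this implies can be sketched as below; `MAX_STEP = 100` is an illustrative value and `next_votable_height` a hypothetical helper, not a function from the code base:

```rust
// Illustrative step size; the design only says it should be bounded.
const MAX_STEP: u64 = 100;

/// Clamp a candidate voting height to the allowed window above the last
/// height that reached quorum. Returns None if the candidate is beyond
/// the window and must wait for the previous step to reach quorum.
fn next_votable_height(last_quorum_height: u64, candidate: u64) -> Option<u64> {
    let upper_bound = last_quorum_height + MAX_STEP;
    if candidate <= upper_bound {
        Some(candidate)
    } else {
        None // wait: no validator votes beyond the window
    }
}

fn main() {
    // Last quorum at 1000: heights up to 1100 are votable, 1101 is not.
    assert_eq!(next_votable_height(1000, 1050), Some(1050));
    assert_eq!(next_votable_height(1000, 1100), Some(1100));
    assert_eq!(next_votable_height(1000, 1101), None);
    println!("ok");
}
```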
The current code base can be reorganized into two major components:

```rust
trait ParentSyncer {
    async fn poll_next_block(..);
    async fn purge_blocks(heights);
}

trait TopdownConsensus {
    // Called by the fendermint interpreter.
    fn sealed_quorum_proposal() -> Option<Proposal>;
    fn check_quorum_proposal(proposal) -> bool;
    fn quorum_advanced(height) -> bool;
    fn next_proposal_height() -> Height;
    fn new_vote_received(height, commitment);
    async fn publish_vote(height);
}
```
The commitment is then the topdown messages plus the validator changes aggregated in that step. Each validator just votes on the height plus the commitment. In cometbft, the commitment plus the parent view will be included as the final sealed proposal.
One thing to note is that using the vote tally as a side channel comes with pros and cons compared to using a native actor for voting.
Hey @cryptoAtwill
The proposal sounds great! I just have a couple of clarifying questions:
If quorum can’t be reached, how does the parent get synchronized with the child? Given your proposal, we can have a limit, say 100 blocks in the child, to form quorum or drop the proposal. However, how does this get propagated to the parent since the parent is using different block times, etc.? Maybe we would need to notify the parent that the quorum wasn’t reached?
You mentioned that the top-down finality is stored by the current validator who receives the proposal. Is the proposal stored on the validator's machine?
Is it safe to assume that the purpose is also to have different validators using different RPC nodes (potentially more than one) to increase resilience? If that is the case, what happens if a single validator is using 3 RPC endpoints and they all report different states? I guess the sensible thing would be to just pick the one where he can see his peers' states too?
When a new validator joins the subnet network and wants to replay the chain, is it safe to assume they would completely skip the syncer and not replay the parent chain events since the quorum would never be formed again?
When the proposal passes the quorum and makes it to cometBFT, there is a point where the validator queries the last membership from the gateway to get the list of public keys of validators. Is there any chance that the list of current validators might have changed since the quorum was reached? For example, if the finality arrived from RPC on block 8000, then it took 80 blocks to reach the quorum on block 8080. Is there any possibility the validator set might have changed in the meantime?
If the vote tally is restarted, are the validators going to query the same RPC nodes? Maybe it would be beneficial to always have a fallback RPC parent node to try.
Thanks!
Hey @karlem
Thanks for the answers, just a couple of notes:
I think we don't have to notify the parent; the parent does not really care about topdown. If no bottom-up is received, that means the child has not settled yet.
- This makes sense, as long as the parent does not share the "uncommitted" state (not committed from bottom-up) as the current state of things.
Yeah, I think this is maybe something that we can recommend to validators. It is important to make them aware that if they all use the same RPC node, it's not without risk.
I think we are probably not talking about the same thing. Could you please help me understand what will happen if a new validator joins and replays the chain? Do they also replay the RPC from the parent, or just CometBFT? For example, if a validator joins on block 1000 and starts from a snapshot, but the network is on block 1500 and there have already been 5 top-down finality rounds and votes.
@karlem For 3, the topdown mechanism is described here: https://github.com/consensus-shipyard/ipc/blob/main/specs/topdown.md. With the new design, there is no need to replay the RPC, because everything is on the blockchain.
> if a validator joins on block 1000 and starts from a snapshot but the network is on block 1500 and there have already been 5 top-down finality rounds and votes.
Then the validator needs to wait for the previous finality to be committed and join the list of validators.
@cryptoAtwill Yes cool. I was just double checking that this is the case. So when a new validator is catching up it won't consume the historical RPC events - because there is no need.
The updated proposal for topdown is broken down into several parts:
At the time of this writing, no changes to the existing parent syncer process.
There is a new versioned data struct that represents the vote being gossiped, named `TopdownVote`.
```rust
/// The different versions of votes cast in the topdown gossip pub-sub channel.
pub struct TopdownVote {
    version: u8,
    block_height: BlockHeight,
    /// The content that represents the data to be voted on for the block height.
    payload: Bytes,
}
```
Currently we only support `version = 1`, where `payload = block_hash.extend(finality side effects commitment)`. The actual bytes voted on are the serialized version of `TopdownVote`, i.e. `fvm_ipld_encoding::to_vec(self)`.
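A sketch of the version-1 payload assembly. A simple concatenation-based `ballot` encoding stands in for `fvm_ipld_encoding::to_vec(self)`; its exact byte layout is illustrative, not the real IPLD encoding:

```rust
struct TopdownVote {
    version: u8,
    block_height: u64,
    payload: Vec<u8>,
}

impl TopdownVote {
    /// version = 1: payload is the block hash followed by the side
    /// effects commitment, as described above.
    fn v1(block_height: u64, block_hash: Vec<u8>, commitment: Vec<u8>) -> Self {
        let mut payload = block_hash;
        payload.extend(commitment);
        Self { version: 1, block_height, payload }
    }

    /// Stand-in for `fvm_ipld_encoding::to_vec(self)`: the bytes that
    /// are actually signed and voted on.
    fn ballot(&self) -> Vec<u8> {
        let mut out = vec![self.version];
        out.extend(self.block_height.to_be_bytes());
        out.extend(&self.payload);
        out
    }
}

fn main() {
    let vote = TopdownVote::v1(42, vec![0xaa, 0xaa], vec![0xbb, 0xbb]);
    assert_eq!(vote.payload, vec![0xaa, 0xaa, 0xbb, 0xbb]);
    // 1 version byte + 8 height bytes + 4 payload bytes.
    assert_eq!(vote.ballot().len(), 13);
    println!("ok");
}
```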
To calculate the side effects commitment, the topdown finality struct now contains the cross messages and validator changes.
```rust
/// A proposal of the parent view that validators will be voting on.
pub struct ParentFinalityPayload {
    /// Block height of this proposal.
    pub height: ChainEpoch,
    /// The block hash of the parent, expressed as bytes.
    pub block_hash: Vec<u8>,
    /// The topdown messages to be executed.
    ///
    /// Note that these are not the cross messages at `height`,
    /// but the cross messages since the last topdown finality up to the current `height`.
    pub cross_messages: Vec<IpcEnvelope>,
    /// The validator changes to be applied.
    ///
    /// Note that these are not the validator changes at `height`,
    /// but the validator changes since the last topdown finality up to the current `height`.
    pub validator_changes: Vec<StakingChangeRequest>,
}

impl ParentFinalityPayload {
    pub fn side_effect_cid(&self) -> Cid {
        // Pseudocode: a CID computed over the serialized cross messages
        // and validator changes.
        Cid::from(&self.cross_messages, &self.validator_changes)
    }
}
```
When the parent syncer has pushed data to the vote tally, the `TopdownVote` will be signed by the validator's private key and converted into a `SignedVote`.
```rust
/// The vote submitted to the vote tally.
#[derive(Serialize, Deserialize, Debug, Clone, Eq, PartialEq, Hash, PartialOrd, Ord)]
pub struct SignedVote {
    /// Serialized `TopdownVote` struct.
    pub(crate) payload: Bytes,
    /// The signature of the signed content using the pubkey.
    signature: Signature,
    pub(crate) pubkey: ValidatorKey,
}
```
The vote of each validator is published from a separate thread that runs the `publish_vote_loop`. The high-level idea is still the same as on current main: it checks the latest height in the vote tally and publishes the `SignedVote` to the gossip pub-sub topic.

For vote listening, there is a gossip pub-sub channel listening for incoming votes. The gossip pub-sub checks the peer information and makes sure the vote comes from the correct peer. Then fendermint uses the `dispatch_vote` method to add the received vote to the vote tally.
The received `SignedVote` will have its signature checked and be converted back into a `TopdownVote`. The topdown votes are collected into a sorted map:

```rust
OrdMap<BlockHeight, HashMap<TopdownVote, ValidatorSignatures>>
```

`ValidatorSignatures` is a collection of validator keys with the signatures received from `SignedVote`s.
```rust
/// A collection of validator public keys that have signed the same content.
struct ValidatorSignatures {
    validators: HashMap<ValidatorKey, Signature>,
}
```
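The vote-collection structure can be sketched with std's `BTreeMap` standing in for the crate's `OrdMap`, and strings/bytes standing in for the real key, signature, and vote types:

```rust
use std::collections::{BTreeMap, HashMap};

type BlockHeight = u64;
type ValidatorKey = String;
type Signature = Vec<u8>;
type VoteBallot = Vec<u8>; // stand-in for a serialized TopdownVote

#[derive(Default)]
struct ValidatorSignatures {
    validators: HashMap<ValidatorKey, Signature>,
}

#[derive(Default)]
struct VoteTally {
    /// height -> (ballot -> signatures collected for that exact ballot).
    votes: BTreeMap<BlockHeight, HashMap<VoteBallot, ValidatorSignatures>>,
}

impl VoteTally {
    fn add_vote(&mut self, height: BlockHeight, ballot: VoteBallot, key: ValidatorKey, sig: Signature) {
        self.votes
            .entry(height)
            .or_default()
            .entry(ballot)
            .or_default()
            .validators
            .insert(key, sig);
    }

    /// Number of validators that signed this exact ballot at this height.
    fn support(&self, height: BlockHeight, ballot: &VoteBallot) -> usize {
        self.votes
            .get(&height)
            .and_then(|m| m.get(ballot))
            .map(|s| s.validators.len())
            .unwrap_or(0)
    }
}

fn main() {
    let mut tally = VoteTally::default();
    tally.add_vote(100, vec![1], "alice".into(), vec![0]);
    tally.add_vote(100, vec![1], "bob".into(), vec![0]);
    tally.add_vote(100, vec![2], "carol".into(), vec![0]); // diverging ballot
    assert_eq!(tally.support(100, &vec![1]), 2);
    assert_eq!(tally.support(100, &vec![2]), 1);
    println!("ok");
}
```

Keying the inner map by the exact ballot bytes means validators that saw diverging parent views accumulate support separately, which is what lets `find_quorum` detect whether any single view crossed the threshold.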
`find_quorum` will be called constantly by cometbft to check whether a quorum has formed. If validators holding more than 2/3 of the total weight have voted for a specific `TopdownVote`, a quorum cert, `MultiSigCert`, is created.
```rust
/// The ECDSA signature aggregation quorum cert for the topdown proposal.
pub struct MultiSigCert {
    signed_validator_bitmap: BitVec,
    agg_signatures: AggregatedSignature,
}
```
The validators are sorted by their weight and then by validator public key, so that they are ordered deterministically.
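A sketch of this deterministic ordering and the bitmap layout it enables. Descending weight with ties broken by public key bytes is an assumption (the exact sort direction is not specified above), and `Vec<bool>` stands in for the real `BitVec`:

```rust
type PubKey = Vec<u8>;

/// Lay the signer set out as a bitmap over the deterministically
/// ordered power table.
fn signer_bitmap(mut power_table: Vec<(PubKey, u64)>, signers: &[PubKey]) -> Vec<bool> {
    // Deterministic order: descending weight, ties broken by key bytes.
    power_table.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    power_table
        .iter()
        .map(|(key, _)| signers.contains(key))
        .collect()
}

fn main() {
    let table = vec![
        (vec![0x02], 10u64),
        (vec![0x01], 30),
        (vec![0x03], 10),
    ];
    // Sorted order: [0x01](30), [0x02](10), [0x03](10).
    let bitmap = signer_bitmap(table, &[vec![0x01], vec![0x03]]);
    assert_eq!(bitmap, vec![true, false, true]);
    println!("ok");
}
```

Because every node derives the same ordering from the same power table, a cert carrying only the bitmap and the aggregated signature is enough for any verifier to recover which keys signed.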
Cometbft will call the `prepare` abci method in fendermint; the `prepare` method checks whether there is a quorum cert. If there is one, the node forms the `ParentFinalityPayload` and then deduces the `TopdownVote` from it. The deduced topdown vote is checked against the quorum cert's ballot. If they match, the topdown proposal for cometbft is prepared: `TopdownProposalWithQuorum`. It has the following definition:
```rust
struct TopdownProposalWithQuorum {
    pub proposal: TopdownProposal,
    pub cert: MultiSigCert,
}

enum TopdownProposal {
    V1(ParentFinalityPayload),
}
```
In `process`, the node checks whether the `MultiSigCert` actually matches the current power table by verifying that:

- the validators flagged in the signed-validator bitmap all belong to the current power table, and together they hold more than 2/3 of the total weight;
- the aggregated signature verifies against those validators' public keys over the voted bytes.

If the above two conditions are met, the proposal is accepted.

In `deliver`, it is simply the execution of the `ParentFinalityPayload`.
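The weight side of the `process` check can be sketched as follows; real code would additionally verify the aggregated signature against the flagged validators' public keys, which is omitted here, and the function name is illustrative:

```rust
/// Check that the cert's signer bitmap, laid over the current power
/// table (in the same deterministic order), accounts for more than
/// 2/3 of the total weight.
fn cert_matches_power_table(weights: &[u64], bitmap: &[bool]) -> bool {
    if weights.len() != bitmap.len() {
        return false; // bitmap was built against a different power table
    }
    let total: u64 = weights.iter().sum();
    let signed: u64 = weights
        .iter()
        .zip(bitmap)
        .filter(|(_, b)| **b)
        .map(|(w, _)| *w)
        .sum();
    signed * 3 > total * 2
}

fn main() {
    let weights = [50u64, 30, 20];
    assert!(cert_matches_power_table(&weights, &[true, true, false])); // 80 of 100
    assert!(!cert_matches_power_table(&weights, &[true, false, false])); // 50 of 100
    assert!(!cert_matches_power_table(&weights, &[true, true])); // size mismatch
    println!("ok");
}
```

The length check matters for the concern raised earlier in the thread: if the validator set changed between quorum formation and proposal processing, the bitmap no longer lines up with the current power table and the cert must be rejected.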