This approach was unsafe because it violated the non-equivocation properties of our BFT protocol.
A community member @ghostant-1017 also highlighted this issue in a bug bounty finding [HackerOne-2452182].
The ProposalCache (latest_round, pending Proposal, and SignedProposals) will be stored to a file on shutdown, loaded on bootup, and cleared with snarkos clean.
Ensures that honest validators do not create multiple proposals on the same round.
Ensures that we do not sign additional validator proposals on the same round after a reboot.
Ensures that, after a reboot, the node does not create any proposals for rounds earlier than the storage round it had reached before the reboot.
Periodically attempt to increment the round if the quorum requirements are met in storage.
In our current processing of PrimaryPings, we try to advance rounds in sync_with_batch_header_from_peer. However, we perform this check PRIOR to processing the batch header, meaning that header is not included in the quorum calculation. This prevents us from advancing rounds when there are just enough certificates to meet quorum for that round (since we omit the last one we receive from the quorum calculation).
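The ordering problem described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `RoundCertificates` type, the function names, and the equal-weight threshold of `2n/3 + 1` are assumptions made for illustration and are not the snarkOS implementation (which may weight validators differently).

```rust
use std::collections::HashSet;

/// Quorum threshold for `num_validators` equally weighted validators.
/// Assumption for this sketch: quorum is 2n/3 + 1.
pub fn quorum_threshold(num_validators: usize) -> usize {
    2 * num_validators / 3 + 1
}

/// Hypothetical store of certificate authors seen for a single round.
#[derive(Default)]
pub struct RoundCertificates {
    authors: HashSet<String>,
}

impl RoundCertificates {
    /// Records a certificate from `author` (duplicates are ignored).
    pub fn insert(&mut self, author: &str) {
        self.authors.insert(author.to_string());
    }

    /// Returns true if the stored certificates meet the quorum threshold.
    pub fn has_quorum(&self, num_validators: usize) -> bool {
        self.authors.len() >= quorum_threshold(num_validators)
    }
}

/// Mirrors the ordering described above: the quorum check runs BEFORE
/// the incoming item is processed, so that item is not counted.
pub fn check_then_process(store: &mut RoundCertificates, author: &str, num_validators: usize) -> bool {
    let can_advance = store.has_quorum(num_validators);
    store.insert(author);
    can_advance
}

/// A later, periodic re-check against storage sees every stored certificate,
/// including the one that was omitted from the earlier check.
pub fn recheck_from_storage(store: &RoundCertificates, num_validators: usize) -> bool {
    store.has_quorum(num_validators)
}
```

With four validators the assumed threshold is three. If two certificates are already stored and the third arrives, `check_then_process` reports that the round cannot advance, while a subsequent `recheck_from_storage` reports that it can.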
Note: An alternative to this latest_round approach is to store all of the certificates in Storage that have not made it into blocks to the ProposalCache file. The latest_round approach relies on PrimaryPings to advance the storage state, but does allow the ProposalCache file to be relatively small.
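The guarantees listed for the proposal cache can be sketched as follows. This is a simplified illustration under stated assumptions, not the types added in this PR: `ProposalCacheSketch`, its fields, the `u64` batch IDs, the string validator names, and the line-based text encoding are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the proposal cache.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ProposalCacheSketch {
    /// The round the node had reached before shutdown.
    pub latest_round: u64,
    /// The round of our own pending proposal, if any.
    pub pending_proposal_round: Option<u64>,
    /// For each peer validator: the (round, batch ID) we last signed.
    pub signed_proposals: HashMap<String, (u64, u64)>,
}

impl ProposalCacheSketch {
    /// A new proposal is allowed only at or after the cached round,
    /// and never for a round that already has a pending proposal.
    pub fn may_create_proposal(&self, round: u64) -> bool {
        round >= self.latest_round && self.pending_proposal_round != Some(round)
    }

    /// Signing is allowed if we never signed for this author, if the round is
    /// newer than the last signed one, or if it is the very same batch again.
    pub fn may_sign(&self, author: &str, round: u64, batch_id: u64) -> bool {
        match self.signed_proposals.get(author) {
            Some(&(signed_round, signed_id)) if signed_round == round => signed_id == batch_id,
            Some(&(signed_round, _)) => round > signed_round,
            None => true,
        }
    }

    /// Records a signature if permitted; returns whether it was recorded.
    pub fn record_signature(&mut self, author: &str, round: u64, batch_id: u64) -> bool {
        if !self.may_sign(author, round, batch_id) {
            return false;
        }
        self.signed_proposals.insert(author.to_string(), (round, batch_id));
        true
    }

    /// Encodes the cache as text (illustrative format only).
    pub fn encode(&self) -> String {
        let pending = self.pending_proposal_round.map_or("-".to_string(), |r| r.to_string());
        let mut out = format!("{}\n{}\n", self.latest_round, pending);
        for (author, (round, batch_id)) in &self.signed_proposals {
            out.push_str(&format!("{author} {round} {batch_id}\n"));
        }
        out
    }

    /// Decodes the text produced by `encode`; returns None on malformed input.
    pub fn decode(text: &str) -> Option<Self> {
        let mut lines = text.lines();
        let latest_round: u64 = lines.next()?.parse().ok()?;
        let pending_proposal_round = match lines.next()? {
            "-" => None,
            value => Some(value.parse::<u64>().ok()?),
        };
        let mut signed_proposals = HashMap::new();
        for line in lines {
            let mut parts = line.split_whitespace();
            let author = parts.next()?.to_string();
            let round: u64 = parts.next()?.parse().ok()?;
            let batch_id: u64 = parts.next()?.parse().ok()?;
            signed_proposals.insert(author, (round, batch_id));
        }
        Some(Self { latest_round, pending_proposal_round, signed_proposals })
    }
}
```

In this sketch, the string returned by `encode` would be written to disk on shutdown (for example with `std::fs::write`) and passed to `decode` on startup. Because the restored cache still remembers the latest round and the signed batches, a rebooted node refuses to propose for an earlier round or to sign a second, different batch from the same validator in the same round.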
Implications
If enough validators (a majority) reboot their nodes at the same time, the network could halt.
If we signed proposals past our ledger round, but all nodes have thrown away their Storage, then we won't have the signatures required to reconstruct the original proposal state.
This will need to be remedied by having all validators delete their proposal_cache file and reboot.
Validators looking to swap machines will need to migrate their proposal_cache file between machines to ensure honest behavior.
Test Plan
Unit tests have been added for the new types and to ensure that the new checks hold.
Extensive local and burn-in testing will need to be performed.
Motivation
This PR has three changes to address https://github.com/AleoHQ/snarkOS/issues/3171:
Reverted proposal expiration introduced in https://github.com/AleoHQ/snarkOS/pull/3202.
Related PRs
Reverts: https://github.com/AleoHQ/snarkOS/pull/3202. An extension of: https://github.com/AleoHQ/snarkOS/pull/3200
TODO