MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0
2k stars 528 forks source link

Handle bootstrapping required local state sync edge case #4028

Open nholland94 opened 4 years ago

nholland94 commented 4 years ago

There is an edge case in checking local state sync requirements at the completion of bootstrap which we do not correctly cover right now. Below is my explanation of the edge case and how to address it, taken from a comment I left on #4027 (comment here https://github.com/CodaProtocol/coda/pull/4027#pullrequestreview-328519894).

Take for instance the scenario where we want to bootstrap to a root with is the last (or one of the last) blocks in epoch ep. The best tip will be in ep+1, but will be in the first k blocks in that epoch. Bootstrapping will dynamically retarget roots as the network advances. If bootstrapping takes too long to download the root snarked ledger, we could end up finishing bootstrap at some root which is within the first k blocks of ep+1, but is not the first block. If that happens, the required state sync check would still return that no sync is required since the best seen block is also within the first k blocks. In order to fix this, we need to pass both the root consensus state and the best consensus state into the require local state sync check function so that it can reason about the case where state sync is not required immediately, but we do not have enough information in the frontier in order to actually record the information in the future. This will be an annoying changed because local state will need the ability to optional have a "future next epoch snapshot" which will be used in place of recording the real next epoch snapshot from the root. This will also mean a consensus hook will need to be called on every breadcrumb that is added and not just root transitions since we will need to migrate the local state pointers sometimes when the local state would go out of sync when adding a new breadcrumb to the frontier.

ghost-not-in-the-shell commented 4 years ago

1 thing I noticed is that once we change the required_local_state_sync target to best_tip instead of root, if we propose before ledger catchup finishes, the required_local_state_sync check in proposer would fail and crash the node.

Possible solution would be prevent proposer from producing block if we are still doing catchup. This would also solve other kind of failures.

nholland94 commented 4 years ago

I think it would be unsafe to always prevent the proposer from producing a block during catchup, but I think we could make that a rule for the initial best tip ledger catchup we do after bootstrapping. Would be a little tricky since we would need to add the concept of a "required catchup" which would block the proposer while active.

Can you detail the case in which a proposer producing a block before ledger catchup completes and how it would cause the required_local_state_sync check to fail? I don't see, off the top of my head, how that would cause the check to fail, but I'm probably just not seeing what you are.

ghost-not-in-the-shell commented 4 years ago

Because bootstrap now use the peer's best_tip in required_local_state_sync, while if catchup didn't finish, then our best_tip = root, and in proposer, we do required_local_state_sync against our best_tip which is just the root.

emberian commented 4 years ago

We discussed this during the bootstrap meeting, we haven't recollected the exact edge case conditions but we need to stop crashing in the case @ghost-not-in-the-shell identified.