asonnino / hotstuff

Implementation of the HotStuff consensus protocol.
Apache License 2.0
115 stars 44 forks

Restart and Synchronize Issue #70

Open zicofish opened 2 years ago

zicofish commented 2 years ago

Hi, we have been using this library for a consensus scenario, but there seem to be some issues with restarting a node.

In our scenario, we run 4 nodes for consensus. We stop one of them for roughly 0.5 to 1 hour and then restart it.

After the restart, the node processes a large number of blocks while synchronizing and then gets stuck. Eventually it drags down the other three nodes as well, and the whole system hangs.

What could be the problem, and do you have a solution for this case?

Thanks~

zicofish commented 2 years ago

Btw, the database written by hotstuff also keeps growing, and there is no mechanism to remove old data. Is this expected?

asonnino commented 2 years ago

Sadly, this codebase does not implement crash-recovery, so there is no safe way to restart a node. To support it, we would need to persist several pieces of state to storage (e.g., the preferred round and the last voted round).
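
To illustrate the kind of persistence meant here, below is a minimal sketch, not code from this repository. The names `SafetyState`, `persist`, `load`, and `try_vote` are hypothetical, the file layout is an assumption, and the voting check is a simplified rule chosen for illustration. The idea is to write the state durably before a vote leaves the node, so that a restarted node cannot vote twice in the same round.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical safety state that must survive a restart.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct SafetyState {
    pub last_voted_round: u64,
    pub preferred_round: u64,
}

impl SafetyState {
    /// Fixed 16-byte layout (an assumption): two little-endian u64 values.
    fn to_bytes(self) -> [u8; 16] {
        let mut buf = [0u8; 16];
        buf[..8].copy_from_slice(&self.last_voted_round.to_le_bytes());
        buf[8..].copy_from_slice(&self.preferred_round.to_le_bytes());
        buf
    }

    fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() != 16 {
            return None;
        }
        let mut voted = [0u8; 8];
        let mut preferred = [0u8; 8];
        voted.copy_from_slice(&buf[..8]);
        preferred.copy_from_slice(&buf[8..]);
        Some(Self {
            last_voted_round: u64::from_le_bytes(voted),
            preferred_round: u64::from_le_bytes(preferred),
        })
    }
}

/// Write to a temporary file, flush it to disk, then rename it over the old
/// file so a crash never leaves a half-written state file behind.
/// (Syncing the parent directory is omitted here for brevity.)
pub fn persist(path: &Path, state: SafetyState) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(&state.to_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp, path)
}

/// Load the state at startup; a missing file means the node never voted.
pub fn load(path: &Path) -> io::Result<SafetyState> {
    match fs::read(path) {
        Ok(bytes) => SafetyState::from_bytes(&bytes)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "corrupt safety state")),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(SafetyState::default()),
        Err(e) => Err(e),
    }
}

/// Simplified voting check (an assumption, not this repository's exact rule):
/// vote only for a round above the last voted round whose justifying QC is
/// not older than the preferred round. The new state is persisted BEFORE
/// returning `true`, i.e. before the caller sends the vote.
/// Updating `preferred_round` is left to the caller in this sketch.
pub fn try_vote(
    path: &Path,
    state: &mut SafetyState,
    block_round: u64,
    qc_round: u64,
) -> io::Result<bool> {
    if block_round <= state.last_voted_round || qc_round < state.preferred_round {
        return Ok(false);
    }
    let next = SafetyState {
        last_voted_round: block_round,
        ..*state
    };
    persist(path, next)?;
    *state = next;
    Ok(true)
}
```

On restart, the node would call `load` before processing any proposal and refuse to vote for rounds at or below the recovered `last_voted_round`.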

asonnino commented 2 years ago

Regarding the database size, it is unclear how to solve it by looking only at the validator's codebase. A typical solution is to clearly define the "active state" of the validator and clean up everything else at epoch change (which is not currently implemented), then rely on some sort of "archival" nodes to persist the entire history of the blockchain. This, however, implies a blockchain ecosystem (which is beyond the scope of the consensus core).
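
As a rough illustration of the "keep the active state, drop the rest" idea, here is a minimal in-memory sketch. It is not this repository's store: `BlockStore` and `prune_below` are hypothetical names, and keying blocks by round number is an assumption. At an epoch change, everything below the first round still considered active is removed and handed back to the caller, who could forward it to an archival node.

```rust
use std::collections::BTreeMap;

/// Hypothetical block store keyed by round number (an assumption).
pub struct BlockStore {
    blocks: BTreeMap<u64, Vec<u8>>,
}

impl BlockStore {
    pub fn new() -> Self {
        Self { blocks: BTreeMap::new() }
    }

    pub fn insert(&mut self, round: u64, block: Vec<u8>) {
        self.blocks.insert(round, block);
    }

    pub fn len(&self) -> usize {
        self.blocks.len()
    }

    pub fn contains(&self, round: u64) -> bool {
        self.blocks.contains_key(&round)
    }

    /// Remove every block with a round below `first_active_round` and return
    /// the removed blocks so the caller can archive them elsewhere.
    pub fn prune_below(&mut self, first_active_round: u64) -> BTreeMap<u64, Vec<u8>> {
        // `split_off` leaves keys below the cutoff in `self.blocks` and
        // returns the keys at or above it, i.e. the active state.
        let active = self.blocks.split_off(&first_active_round);
        // Swap the active state back in and hand the old entries to the caller.
        std::mem::replace(&mut self.blocks, active)
    }
}
```

Deciding which round is the first "active" one is the hard part and depends on the protocol's commit and epoch rules, which this sketch does not address.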

zicofish commented 2 years ago

@asonnino Thanks. I have been using this library for a scenario that requires recovery, even after a long shutdown. I have already implemented something that should work. Perhaps I will post a PR after testing. :)