0chain / 0chain

Züs (formerly 0Chain) is a decentralized blockchain-based storage platform with no vendor lock-in and a 3-layer security - fragmentation, proxy re-encryption, and immutability. It has close to wire speed data performance, free reads, and is ideal for apps as well as backups, AI data, disaster recovery.
https://zus.network

Restarted cluster got stuck #634

Closed peterlimg closed 2 years ago

peterlimg commented 2 years ago

After restarting a cluster to update the Docker image, the chain got stuck because all miners fail to compute state when verifying received blocks: they cannot get the state changes of previous blocks.

2021-11-03T00:02:05.820Z        ERROR   chain/protocol_block.go:682     sync_block - sync state changes failed  {"round": 92649, "num": 1, "error": "get block state changes: block_state_change_error: error getting the block state change"}
0chain.net/chaincore/chain.(*Chain).syncPreviousBlock
        /0chain.net/chaincore/chain/protocol_block.go:682
0chain.net/chaincore/chain.(*Chain).syncBlocksWithCache
        /0chain.net/chaincore/chain/protocol_block.go:634
0chain.net/chaincore/chain.(*Chain).syncPreviousBlock
        /0chain.net/chaincore/chain/protocol_block.go:666
0chain.net/chaincore/chain.(*Chain).syncBlocksWithCache
        /0chain.net/chaincore/chain/protocol_block.go:634
0chain.net/chaincore/chain.(*Chain).SyncPreviousBlocks
        /0chain.net/chaincore/chain/protocol_block.go:608
0chain.net/chaincore/chain.(*Chain).GetPreviousBlock
        /0chain.net/chaincore/chain/protocol_block.go:508
0chain.net/chaincore/block.(*Block).ComputeState
        /0chain.net/chaincore/block/entity.go:771
0chain.net/chaincore/chain.(*Chain).computeState
        /0chain.net/chaincore/chain/state.go:68
0chain.net/chaincore/chain.(*Chain).ComputeState.func1
        /0chain.net/chaincore/chain/state.go:37
0chain.net/chaincore/chain.(*Chain).ComputeBlockStateWithLock
        /0chain.net/chaincore/chain/worker.go:698
0chain.net/chaincore/chain.(*Chain).ComputeState
        /0chain.net/chaincore/chain/state.go:36
0chain.net/miner.(*Chain).AddToRoundVerification
        /0chain.net/miner/protocol_round.go:721
0chain.net/miner.(*Chain).HandleVerifyBlockMessage
        /0chain.net/miner/protocol_receive.go:145
0chain.net/miner.(*Chain).BlockWorker.func1
        /0chain.net/miner/worker.go:60
2021-11-03T00:02:05.820Z        ERROR   block/entity.go:774     compute state - previous block not available    {"round": 92651, "block": "e8a31499c3d5ea9189abc422ac9685d1bebbd712ae03130f72408f6cfe3dd2a6", "prev_block": "7c79289435d155739b8de9dbabdbed88a462d8183c4fac118033294799b503ae"}

peterlimg commented 2 years ago

Shutting down all miners at once is probably the problem. When they are force-closed, the state of the last finalized block may not have been persisted to the DB yet, while the sharders have persisted those state changes and may even be one round ahead. After all miners are restarted, they fetch the latest finalized block from the sharders and try to resume from it, but new blocks cannot be verified because of the compute state failure described above.

We can fix this by adding an endpoint to the sharders for getting state changes. Currently, state changes can only be fetched from miners.
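
A minimal sketch of what such a sharder endpoint could look like, not the actual 0chain implementation. The `StateChange` and `Store` types, the `block` query parameter, and the `/v1/block/state_change` path are all hypothetical names made up for illustration. The in-memory map stands in for whatever the sharder already persists.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// StateChange is a hypothetical, simplified stand-in for one block's state changes.
type StateChange struct {
	BlockHash string   `json:"block_hash"`
	Round     int64    `json:"round"`
	Nodes     []string `json:"nodes"` // placeholder for serialized state nodes
}

// Store is a hypothetical in-memory stand-in for the sharder's persisted state changes.
type Store struct {
	mu      sync.RWMutex
	byBlock map[string]StateChange
}

func NewStore() *Store { return &Store{byBlock: make(map[string]StateChange)} }

// Put records the state change under its block hash.
func (s *Store) Put(sc StateChange) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byBlock[sc.BlockHash] = sc
}

// Get looks up the state change for a block hash.
func (s *Store) Get(hash string) (StateChange, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sc, ok := s.byBlock[hash]
	return sc, ok
}

// StateChangeHandler answers GET requests of the form ?block=<hash> (assumed shape).
func StateChangeHandler(s *Store) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		hash := r.URL.Query().Get("block")
		if hash == "" {
			http.Error(w, "missing block parameter", http.StatusBadRequest)
			return
		}
		sc, ok := s.Get(hash)
		if !ok {
			http.Error(w, "state change not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(sc)
	}
}

// queryStatus sends a GET to the handler in-process and returns the HTTP status code.
func queryStatus(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	store := NewStore()
	store.Put(StateChange{BlockHash: "abc", Round: 92649, Nodes: []string{"node-1"}})
	h := StateChangeHandler(store)

	// A restarted miner would ask for the state change of the block it is missing.
	fmt.Println(queryStatus(h, "/v1/block/state_change?block=abc"))
	fmt.Println(queryStatus(h, "/v1/block/state_change?block=missing"))
}
```

With something like this in place, a restarted miner that cannot get a block's state changes from other miners could fall back to asking a sharder.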