PlatONnetwork / PlatON-Go

Golang implementation of the PlatON protocol
GNU Lesser General Public License v3.0
448 stars 144 forks source link

The node restart failed and start successfully after deleting the wal directory. #1269

Closed zjsunzone closed 4 years ago

zjsunzone commented 4 years ago

Hi there,

The node restart failed and start successfully after deleting the wal directory. From the phenomenon, it seems that it may be stuck when executing the transaction.

System information

PlatON version: platon version OS & Version: Windows/Linux/OSX Commit hash : (if develop)

Steps to reproduce the behaviour

Restart consensus nodes in the network.

Backtrace

DEBUG[03-18|16:12:14.980|core/snapshotdb/snapshotdb.go:336]  SetCurrent                               package=snapshotdb base=&{Num:+340970} height="&{Num:+341068 Hash:[141 95 204 175 154 86 194 65 247 97 91 216 56 71 3 18 56 227 187 200 194 147 228 137 72 214 51 110 39 2 38 22]}"
INFO [03-18|16:12:14.980|miner/worker.go:246]                CommitDuration in Millisecond            commitDuration=9500000
INFO [03-18|16:12:15.122|core/blockchain_reactor.go:51]      Init BlockChainReactor ... 
INFO [03-18|16:12:15.122|node/crypto_handler.go:46]          Init CryptoHandler ... 
DEBUG[03-18|16:12:15.122|eth/backend.go:264]                 Validator mode                           mode=ppos
DEBUG[03-18|16:12:15.122|miner/worker.go:395]                Clear Pending                            number=341068
INFO [03-18|16:12:15.122|x/plugin/staking_plugin.go:80]      Init Staking plugin ... 
INFO [03-18|16:12:15.269|x/plugin/slashing_plugin.go:65]     Init Slashing plugin ... 
INFO [03-18|16:12:15.269|x/plugin/reward_plugin.go:73]       Init Reward plugin ... 
INFO [03-18|16:12:15.269|x/plugin/restricting_plugin.go:54]  Init Restricting plugin ...              package=RestrictingPlugin
INFO [03-18|16:12:15.269|x/plugin/gov_plugin.go:46]          Init Governance plugin ... 
INFO [03-18|16:12:15.269|x/gov/gov_params.go:42]             Init Govern parameters ... 
INFO [03-18|16:12:15.269|consensus/cbft/cbft.go:215]         ~ Start cbft consensus 
DEBUG[03-18|16:12:15.269|consensus/cbft/network/handler.go:690] ~ Start synchronize in the handler 
DEBUG[03-18|16:12:15.276|consensus/cbft/validator/validator.go:288] Update validator                         validators="{\"validateNodes\":{\"01dbc29819bafc35b4d0c4ddd9203f397f0ed664ae5da3d07dc592f12afcdf1223f41e7d750a490c0e3b87b1f8444e43dcdbedde2420c9342e0370ce80464f48\":{\"index\":4,\"address\":\"0x9584354a04f31a91eac2d578403ceaa9ed17165f\",\"nodeID\":\"01dbc29819bafc35b4d0c4ddd9203f397f0ed664ae5da3d07dc592f12afcdf1223f41e7d750a490c0e3b87b1f8444e43dcdbedde2420c9342e0370ce80464f48\",\"blsPubKey\":\"d911363f5c9c75fa8cd6daae062c7eacf22854f00a671edbaee664eea3e6357b7dbffade1ea1880cf825d9c2b810ae0210704f30d1c5763677a77508f36ac091bda4d26d6f2062151899903f9a85b8a2fdf236253a7d6834763af353a0af2b06\"},\"1cb04dec15fe89fda7297c1690e6e5df8d8ffd700ffd40f051987e47881fbb97b5ae816e40818102cdd94ce4b4b79d2ab2a8e758189f583595faa1eb43dd341e\":{\"index\":0,\"address\":\"0x71f7b8b96a46a0350998361b0f8c2414d42415d5\",\"nodeID\":\"1cb04dec15fe89fda7297c1690e6e5df8d8ffd700ffd40f051987e47881fbb97b5ae816e40818102cdd94ce4b4b79d2ab2a8e758189f583595faa1eb43dd341e\",\"blsPubKey\":\"d9815eb13d0318cee4fe5a2cf2f67ceb249efd2d5a5c1034d41ff920b6114a7cc4ef99b7908e1fea241c598f056d940d851c2b6d64779a6c1b8b6fa313bfa474178b5cb8a4f6be892b7b0e7b426dba32b164f57febd8d311e985ae861ceb4b03\"},\"4ed348bd1afedb117c336b2549ab0d581ff788fe434cccd6f0d7f9e48da5305a9db4e992980532e7654f064a823005e9cdf2e679dd7922e88a0157745e392d6f\":{\"index\":9,\"address\":\"0xbdca35569e5dc71d404f4233d90d01ef2f21bc6f\",\"nodeID\":\"4ed348bd1afedb117c336b2549ab0d581ff788fe434cccd6f0d7f9e48da5305a9db4e992980532e7654f064a823005e9cdf2e679dd7922e88a0157745e392d6f\",\"blsPubKey\":\"3401ab41fc505cc7b998553a46b60fcaa600d7c11a0ecf52f224bcfcdd586ff9e3ac015cee3e9179a3fda83f54ea59003ec42d6155af9d559b3b90424f4936064ad6dc65307824e233db17bb41617f9508dbe803dddaf33f1c7db9097195f008\"},\"7e51976bf79dd8cdf4a807a81f3a39f4c2e1a4538236f656a774673220fe384de320cf7961d366a7b22585e584368364c228763fd57638ff64811bae01c0c996\":{\"index\":3,\"address\":\"0xcef775ca944e90c02aa359d58935b6e63255a7f2\",\"nodeID\":\"7e51976bf79dd8cdf4a807a81f3a39f4c2e1a4538236f656a774673220fe384de320cf7961d366a7b22585e584368364c228763fd57638ff64811bae01c0c996\",\"blsPubKey\":\"498b63c199841a8d315f8d2dc2cae2b7401b0004b87454d45af4da33918519a21afe67a70c810aa46947bb525296f703d0dc52c3188cb1e8635ebb637413583b298ac6f0f55eb0739e1c166dbcd7d2623fd9b7031b2f6b9917c8dbf10d6b4a12\"},\"831a62492ddb9bec025eba01ca2d6e9667a97b6685a648509ca83793ded9d3698ebc04a56fc513bcfd4d8172df4b16f1c2d27554cb428c8c0729e7214beb1d56\":{\"index\":1,\"address\":\"0x88566eeeb370dd0741c62240517541ca7144c2f7\",\"nodeID\":\"831a62492ddb9bec025eba01ca2d6e9667a97b6685a648509ca83793ded9d3698ebc04a56fc513bcfd4d8172df4b16f1c2d27554cb428c8c0729e7214beb1d56\",\"blsPubKey\":\"fde5f006d78cbc4b10d6565e9e7af00c6ca22939ac0f71926d5ab837877ac4f13ca150a1be84aabc7e40246974497110c2db6af8ff6cb2be6cfa49d6a399cb009dd587469e13bc42f43a6183d4f723d4e4267583cb63fdbf03ed5239f24d6a93\"},\"cb8732c3f41231da526a6b289d02e22edecfc8a6b6f4010d9874ebd4f345ed461489fa774350fc169fe49c96a63b79e2c0c92e2a73a573258a2702cb2b1108c8\":{\"index\":6,\"address\":\"0xe52c10253d4619db65f1efaabb1baafa00e6d9fe\",\"nodeID\":\"cb8732c3f41231da526a6b289d02e22edecfc8a6b6f4010d9874ebd4f345ed461489fa774350fc169fe49c96a63b79e2c0c92e2a73a573258a2702cb2b1108c8\",\"blsPubKey\":\"d0fd718b5b466311beb69e3e864b5baf9ed9916e7cdf946847e568bec8fac1db6a33f133e97156e4f9da144fbe557c1997b21783009882443807c904d1b86cd33f0a8b6482321ca5884c076d6f0a3dd5a97ea9ff2ddcd1c13a477095a172e482\"},\"d511b72541e6bf2f636bb43d84c94af5c9685409a7fc1cffe406cad8349e00bb489fc539476a4dd97c052ac6f4b1e4f84a7271b7c52c32f7c232609f7c568ee9\":{\"index\":2,\"address\":\"0x0fbcfc411e89e335bb0f49017e820d9cb9036bde\",\"nodeID\":\"d511b72541e6bf2f636bb43d84c94af5c9685409a7fc1cffe406cad8349e00bb489fc539476a4dd97c052ac6f4b1e4f84a7271b7c52c32f7c232609f7c568ee9\",\"blsPubKey\":\"6a7d2a79eace99ab9382e3fe2d9c841a07e78b241cdc848fabece84f7f21b72f14ee868798043fcb5d7bbb1d3c147d0fce186919b8bdd038ed9e8b15bccbe0b2639517972dcbf0e945a98117c27139d3f3f0b5e20b6a73be7d5f01ef84be7e83\"},\"e9bcedb842cb2912aa2434fc71487cea684bb0d73a156926b623d1f9bad982e39800f5ced0f70bf7c99ae63705ccc01748a719a3328ab7e483ab48f6b7ef6902\":{\"index\":5,\"address\":\"0x3beff381766c41c65ae06ac435df669f3a1df6c0\",\"nodeID\":\"e9bcedb842cb2912aa2434fc71487cea684bb0d73a156926b623d1f9bad982e39800f5ced0f70bf7c99ae63705ccc01748a719a3328ab7e483ab48f6b7ef6902\",\"blsPubKey\":\"2e9c0035917621af0b122b75bcdb70bf09e1ed39f93e48c29830f6f90d07984b5d7d0258bf02b8391da7b4649f79cb00a8e0929bee255c2fd3b3ba5b3c1b0aebe2a7e63857b2f9b11a37bb076d38318d908b62943da6f5609eb5ece5e8264e99\"},\"eb0d000a552770038185ea0b92890445027baa45e156e5935d0026db29fd75d3836b11921ac5a08c7cab28fa16d1a4f3ec949cf4a6e15cb93c4b97e470798cd2\":{\"index\":7,\"address\":\"0x41ee1c8c3bbc32389a7c86232abff0024e6fd69c\",\"nodeID\":\"eb0d000a552770038185ea0b92890445027baa45e156e5935d0026db29fd75d3836b11921ac5a08c7cab28fa16d1a4f3ec949cf4a6e15cb93c4b97e470798cd2\",\"blsPubKey\":\"f4d236ca785507f90cee88f285e12c4d383399c79291797af6535c45e6a018e3cc61fbcddd244b70ab8c3b8c3627b70c1f7a00aceca7f2efc5c0d2a606a83ae383e2ab804932436160249a7802c16b842b2e9e7a2f383b8afd493db5eef0290f\"},\"f9a00207ad1d7a07654386836223039bfee05324d923ecd2cbe5283d71b55861e13bdbcc151763fad18344c4f75e45e988606a760b96eb51088ce3f0e7241981\":{\"index\":8,\"address\":\"0x3772370cb249c61ed89217e1603ddd78bac41769\",\"nodeID\":\"f9a00207ad1d7a07654386836223039bfee05324d923ecd2cbe5283d71b55861e13bdbcc151763fad18344c4f75e45e988606a760b96eb51088ce3f0e7241981\",\"blsPubKey\":\"da37ef7d7114b93db350a622b5de54ae4ffdc4e3ec50277f7d09a33f4b510c0208d14805fc2ede4763d4be11d97d6007f215fbb1de1bf57c60ba2fcb83731636a675247ef185a8e3382f877df303c717a9a62b5ddee160fc5ef2872e228a9812\"}},\"validateBlockNumber\":341001}" switchpoint=341000 epoch=1365 lastNumber=341250
DEBUG[03-18|16:12:15.277|core/blockchain_cache.go:225]              Clear cache                              baseBlockHash=8d5fcc…022616 baseBlockNumber=341068
DEBUG[03-18|16:12:15.277|core/tx_pool.go:448]                       Reset rollback block                     hash=8d5fcc…022616                                                      number=341068 rollback=0
DEBUG[03-18|16:12:15.277|consensus/cbft/network/handler.go:215]     PartBroadcast message to sendQueue       msgHash=277537…7f8b28 BHash=000000…000000 msg={Epoch:1365,ViewNumber:7,BlockIndex:0}
DEBUG[03-18|16:12:15.277|consensus/cbft/sync_process.go:813]        Send GetPrepareBlock by part broadcast   msg={Epoch:1365,ViewNumber:7,BlockIndex:0}
INFO [03-18|16:12:15.277|consensus/cbft/consensus_process.go:759]   Success to change view, current view deadline epoch=1365 view=7 deadline=2020-03-18T16:12:25+0800
DEBUG[03-18|16:12:15.277|consensus/cbft/consensus_process.go:593]   Try change view failed, had receive viewchange epoch=1365 view=7 len=0 view={Epoch:1365,ViewNumber:7}
DEBUG[03-18|16:12:15.277|consensus/cbft/network/router.go:205]      consensusNodes in kMixingRandomNodes     cNodes=10 ids=831a62492ddb9bec,d511b72541e6bf2f,7e51976bf79dd8cd,e9bcedb842cb2912,cb8732c3f41231da,eb0d000a55277003,1cb04dec15fe89fd,01dbc29819bafc35,f9a00207ad1d7a07,4ed348bd1afedb11
DEBUG[03-18|16:12:15.277|consensus/cbft/network/router.go:233]      kMixingRandomNodes select node           msgHash=277537…7f8b28 cNodesLen=10 ncNodesLen=0 peerSetLen=0
DEBUG[03-18|16:12:15.277|consensus/cbft/network/router.go:110]      Gossip message                           msgHash=277537…7f8b28 msgType=*protocols.GetPrepareBlock msg={Epoch:1365,ViewNumber:7,BlockIndex:0} targetPeer=
INFO [03-18|16:12:15.277|consensus/cbft/wal/wal_database.go:73]     Allocated cache and file handles         Wal_database=/data/platon/data/platon/wal/wal_meta cache=16  handles=16
INFO [03-18|16:12:17.960|consensus/cbft/wal_bridge.go:258]          Recover chain state from wal             epoch=1365 view=7                         chainState="[commitState:[blockNumber:341068, blockHash:0x8d5fccaf9a56c241f7615bd83847031238e3bbc8c293e48948d6336e27022616, qcNumber:341068, qcHash:0x8d5fccaf9a56c241f7615bd83847031238e3bbc8c293e48948d6336e27022616], lockState:[blockNumber:341069, blockHash:0x46cebf67e251a19ef5c082a14f0fd476fedfc38595422b0fef326ba9919e3e74, qcNumber:341069, qcHash:0x46cebf67e251a19ef5c082a14f0fd476fedfc38595422b0fef326ba9919e3e74], qcState:[blockNumber:341070, blockHash:0x6828138e905f90540459c09cd340533c56aa6a703db616f965f1c8f1fa975f1a, qcNumber:341070, qcHash:0x6828138e905f90540459c09cd340533c56aa6a703db616f965f1c8f1fa975f1a]]"
INFO [03-18|16:12:17.960|consensus/cbft/wal_bridge.go:457]          Non contiguous block                     sNumber=341068 sParentHash=e8a724…3a3a78 pNumber=341068 pHash=8d5fcc…022616
DEBUG[03-18|16:12:17.960|core/blockchain_cache.go:273]              Start execute block                      hash=46cebf…9e3e74                                                      number=341069 sealHash=5bbfa1…8307d0
INFO [03-18|16:12:17.960|core/blockchain_cache.go:198]              Make stateDB                             hash=8d5fcc…022616                                                      number=341068 root=4b0a72…4c90c4
DEBUG[03-18|16:12:17.960|x/handler/vrf_handler.go:98]               Verification block vrf prove             current blockNumber=341069 current hash=46cebf67e251a19ef5c082a14f0fd476fedfc38595422b0fef326ba9919e3e74 parentHash=8d5fccaf9a56c241f7615bd83847031238e3bbc8c293e48948d6336e27022616 proof=0293f8d0d7d00f7b01cf3eeabbec5142521f0208e0b68ebe6a1dfb845ae41f9be8c237d4f3e3915e43baf70a17e3f667ba93d06f5568a57e1d574d64787d167eeec91f42796cca25496139a66e4ff2ef13
DEBUG[03-18|16:12:17.960|x/handler/vrf_handler.go:184]              Get the nonce of the previous block      blockNumber=341069 parentHash=8d5fccaf9a56c241f7615bd83847031238e3bbc8c293e48948d6336e27022616
INFO [03-18|16:12:17.962|x/handler/vrf_handler.go:116]              Vrf proves successful verification       current blockNumber=341069 current hash=46cebf67e251a19ef5c082a14f0fd476fedfc38595422b0fef326ba9919e3e74 parentHash=8d5fccaf9a56c241f7615bd83847031238e3bbc8c293e48948d6336e27022616 proof=0293f8d0d7d00f7b01cf3eeabbec5142521f0208e0b68ebe6a1dfb845ae41f9be8c237d4f3e3915e43baf70a17e3f667ba93d06f5568a57e1d574d64787d167eeec91f42796cca25496139a66e4ff2ef13 input=37645409395a90b831c821f8cca05c4b7e5fd8a3d98bdd49bbf59a94ed4f2ae1
DEBUG[03-18|16:12:17.962|core/blockchain_reactor.go:246]            Call snapshotDB newBlock on blockchain_reactor blockNumber=341069 hash=46cebf67e251a19ef5c082a14f0fd476fedfc38595422b0fef326ba9919e3e74   parentHash=8d5fccaf9a56c241f7615bd83847031238e3bbc8c293e48948d6336e27022616
INFO [03-18|16:12:17.962|core/snapshotdb/snapshotdb.go:559]         NewBlock                                 package=snapshotdb        num=341069 hash=46cebf…9e3e74                                                      parent=8d5fcc…022616
DEBUG[03-18|16:12:17.962|x/plugin/slashing_plugin.go:212]           Call setPackAmount finished              blockNumber=341069 blockHash=46cebf…9e3e74                                                      nodeId=eb0d000a55277003 value=7
DEBUG[03-18|16:12:17.963|core/blockchain_reactor.go:265]            BeginBlock StateDB root, end             blockHash=0x46cebf67e251a19ef5c082a14f0fd476fedfc38595422b0fef326ba9919e3e74 blockNumber=341069 root=0x4b0a7239f32405fad045d1d07c62aa3037235f6cd47d8b826b4ab9057a4c90c4 pointer=0xc000236120
DEBUG[03-18|16:12:20.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:24.270|trie/database.go:392]                      Database node                            hash=f56833…454dff
DEBUG[03-18|16:12:25.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:30.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:35.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:37.634|core/blockchain.go:315]                    Load state cache                         count=89 limit=67107840 duration=30.303241768s
DEBUG[03-18|16:12:40.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:45.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:50.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:12:55.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:13:00.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
DEBUG[03-18|16:13:05.269|consensus/cbft/network/peer.go:602]        The neighbor node owned by the current peer is : {}, size: {0} 
niuxiaojie81 commented 4 years ago

When wal executes the block playback transaction, the contract transaction calls the GetBlockByHashAndNum function, but this function also relies on the asyncCallCh channel of cbft. At this time, cbft has not started to complete, causing the cbft coroutine to get stuck.

zjsunzone commented 4 years ago

Testing with the latest code is fine.