Closed ngtuna closed 5 years ago
Update more info:
Error: no validator in header
##############################
WARN [03-01|02:08:52] Synchronisation failed, dropping peer peer=347fbb1919dfed32 err="retrieved hash chain is invalid"
ERROR[03-01|02:09:01]
########## BAD BLOCK #########
Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}
Number: 2019528
Hash: 0x62eef880af5c4e005e2efb60e5ade953e52a60e1ce026e3aaafd19a56a439eff
I cannot find the logs that far back on my node, however, they were identical to what @thanhson1085 posted. Same block: 2019528.
When the node reached this block during syncing, it would immediately drop all peers. It would then reconnect to 1, then 2, then 3 peers, drop back to 2, then back to 1, and sometimes even 0. The peer count kept fluctuating between 0 and 3, and the node would never go past block 2019528.
This happened three times on one of my nodes. The only way that I was able to get beyond block 2019528 was to use a chaindata archive that was already past that block.
To test this, I recommend that you bring up a node and let it start syncing from 0. When it gets to block 2019528, it will probably stop. That would give you something to troubleshoot.
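To confirm that a node keeps failing on the same block, the BAD BLOCK reports in the log can be tallied per block number. The following is only a sketch: the function name `bad_block_summary` is hypothetical, and it assumes the log layout shown in this thread, where a `Number:` line follows each `BAD BLOCK` banner.

```shell
#!/bin/sh
# Hypothetical helper: reads tomo log text on stdin and prints
# "<block number> <times reported>" for each distinct bad block.
bad_block_summary() {
    awk '
        /BAD BLOCK/ { in_report = 1; next }
        in_report && /Number:/ {
            # The block number is the field right after "Number:";
            # scanning fields also copes with a docker timestamp prefix.
            for (i = 1; i < NF; i++) {
                if ($i == "Number:") { count[$(i + 1)]++ }
            }
            in_report = 0
        }
        END { for (n in count) print n, count[n] }
    ' | sort -n
}
```

It could be fed from a saved log file or from `docker logs` output. A single block number with a steadily growing count would match the behaviour described above.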
I've had the same issue with block 2019528. I started syncing from a snapshot in January and everything worked fine until I hit that block. The client would just drop all peers (the stats page showed only 1 peer) and stop working. It kept trying to download the block and failed every time, stuck in a loop. My only option was to download a snapshot that was already past that block.
Error: no validator in header
##############################
WARN [03-01|02:54:00] Synchronisation failed, dropping peer peer=64bda1653bf6da87 err="retrieved hash chain is invalid"
ERROR[03-01|02:54:18]
########## BAD BLOCK #########
Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}
Number: 2019528
Hash: 0x62eef880af5c4e005e2efb60e5ade953e52a60e1ce026e3aaafd19a56a439eff
Thanks @arthurk @djSiMuL. The peer info is important to us. Please give us more time to investigate the root cause. I believe you have both got past the block thanks to the snapshot.
@arthurk, @djSiMuL Could you please share your node information: Tomo version and Go version?
@thanhnguyennguyen tomo/v1.2.2-stable/linux-amd64/go1.10.8
Do the nodes have port 30303 open? @arthurk @djSiMuL
Try running this script to add the right peers:
#!/bin/sh
# get tomochain container id
container_id=$(docker ps -q -f "name=tomochain")
# remove all peers
printf '\n------------------------\n!! Removing all peers\n------------------------\n\n'
docker exec -t "$container_id" tomo attach data/tomo.ipc --exec "for (i = 0; i < admin.peers.length; i++) { admin.removePeer(admin.peers[i].id) }"
# add TomoChain peers
docker exec -t $container_id tomo attach data/tomo.ipc --exec "admin.addPeer('enode://df6a2423d01af3bf706a54417747553d02532d982eb5c612ef92d9e70e6b9eafe21afcfc27c7cd477e4fe2dcc0b46b73c977010811ede1c1dfca0944b8f310f6@35.202.169.170:30303')"
docker exec -t $container_id tomo attach data/tomo.ipc --exec "admin.addPeer('enode://c77e3b75a831b08eeac4ccdcc8b9085e357794b97e27eeee840c787b39d22b89d394e6796c6ff701cb14625a4c82c14e31f2162d2c6b9b000b6fd91c6f2b89dd@188.166.207.189:30303')"
docker exec -t $container_id tomo attach data/tomo.ipc --exec "admin.addPeer('enode://64bda1653bf6da87984c2b2bebcaa6811097a1ff87c15b3b421108ce82ed3dd8ed9e0be7c388a44c131b6b4934d2413c301f214e794d2d12b724d158878ba65f@104.248.98.59:30303')"
# restart node
printf '\n------------------------\n!! Restarting node\n------------------------\n\n'
tmn update
https://gist.github.com/thanhson1085/77be18f8110de00805d55513aa70aed5
tomo/v1.2.2-stable/linux-amd64/go1.10.8
Yes, port 30303 is open.
Since I was able to get past this block on my good nodes, I will not bother running the good peers script. If I ever resync a new server and get stuck on that block again, I'll give this a try.
Yes the port is open but I've already downloaded a newer snapshot which included the bad block and started syncing from there, so I cannot test the script.
I put the final answer on Telegram. It's good to close this issue now.
@djSiMuL, @arthurkoziel and others getting the "BAD BLOCK - Error: no validator in header" issue: here is our answer after a few days of investigating. Because the network as a whole was stable, all nodes had reached their maximum capacity of 25 connected peers, so nodes that joined late could get very few connections. The nodes of @djSiMuL and @arthurkoziel had fewer than 3 peers at that time. The bad block came from those peers; it was rejected and the peers were disconnected. The resulting loop of connecting to a peer, rejecting the bad block and disconnecting the peer kept the nodes stuck, while other peers that had the good block #2019528 could not deliver it to them. In case this happens again, we wrote a script that helps stuck nodes connect directly to TomoChain masternodes, which can accept up to 200 peers, so that they can sync the block and keep moving forward. The script is linked in the relevant GitHub issue; I am putting it here as well: https://gist.github.com/thanhson1085/77be18f8110de00805d55513aa70aed5
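The "fewer than 3 peers" condition described above can be checked before deciding whether to add the masternode peers. This is a minimal sketch, not part of the linked gist. The names `MIN_PEERS`, `is_under_peered` and `current_peer_count` are hypothetical. The container filter and IPC path are copied from the script in this thread and are assumed to match your setup.

```shell
#!/bin/sh
# Threshold taken from the explanation above (stuck nodes had fewer than 3 peers).
MIN_PEERS=3

# Succeeds (exit status 0) when the given peer count is below the threshold.
# An optional second argument overrides the default threshold.
is_under_peered() {
    [ "$1" -lt "${2:-$MIN_PEERS}" ]
}

# Assumption: the node runs in a container matching "name=tomochain" and
# exposes data/tomo.ipc, as in the peer script posted in this thread.
current_peer_count() {
    container_id=$(docker ps -q -f "name=tomochain")
    docker exec "$container_id" tomo attach data/tomo.ipc \
        --exec "admin.peers.length" | tr -d '\r'
}
```

One possible use: `if is_under_peered "$(current_peer_count)"; then ...; fi`, with the peer script run inside the branch.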
I've tried the script and added the 3 official peers, but it still does not work. I can see even TomoChain peers being removed, like #df6a2423d01af3bf. My node version is 1.3 and it is stuck at #2615057.
########## BAD BLOCK #########
2019-03-09T06:53:00.015502944Z Chain config: {ChainID: 88 Homestead: 1 DAO:
2019-03-09T06:53:00.015563837Z WARN [03-09|06:53:00] Synchronisation failed, dropping peer peer=df6a2423d01af3bf err="retrieved hash chain is invalid"
2019-03-09T06:53:00.015632090Z INFO [03-09|06:53:00] Starting mining operation
2019-03-09T06:53:00.017002747Z INFO [03-09|06:53:00] Not my turn to commit block. Waiting...
2019-03-09T06:53:06.793206759Z INFO [03-09|06:53:06] Not my turn to commit block. Waiting...
2019-03-09T06:53:07.022556542Z INFO [03-09|06:53:07] Mining aborted due to sync
2019-03-09T06:53:07.368752334Z ERROR[03-09|06:53:07]
2019-03-09T06:53:07.368792583Z ########## BAD BLOCK #########
2019-03-09T06:53:07.368808135Z Chain config: {ChainID: 88 Homestead: 1 DAO:
2019-03-09T06:53:07.368885820Z WARN [03-09|06:53:07] Synchronisation failed, dropping peer peer=64bda1653bf6da87 err="retrieved hash chain is invalid"
2019-03-09T06:53:07.368895270Z INFO [03-09|06:53:07] Starting mining operation
2019-03-09T06:53:07.370010173Z INFO [03-09|06:53:07] Not my turn to commit block. Waiting...
2019-03-09T06:53:08.919148843Z INFO [03-09|06:53:08] Mining aborted due to sync
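To see which peers are being dropped and how often, the `dropping peer` lines in a log like the one above can be counted per peer ID. This is a sketch only: the function name `dropped_peer_counts` is hypothetical, and it assumes the `peer=<hex id>` format shown in these logs.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints
# "<peer id> <times dropped>", sorted by peer id.
dropped_peer_counts() {
    # Keep only the hex id from lines that report a dropped peer.
    sed -n 's/.*dropping peer *peer=\([0-9a-f]*\).*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

If the IDs of the three TomoChain peers added by the script show up here, that would suggest the added peers are being rejected as well.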
Can you show me the output of these commands, @kestop:
# get tomochain container id
container_id=$(docker ps -q -f "name=tomochain")
# list the currently connected peers
docker exec -t $container_id tomo attach data/tomo.ipc --exec "admin.peers"
Fixed via #464 and #462.
Error: no validator in header
##############################
WARN [02-26|08:24:17] Synchronisation failed, dropping peer peer=64bda1653bf6da87 err="retrieved hash chain is invalid"
ERROR[02-26|08:24:20]
########## BAD BLOCK #########
Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}