BuildOnViction / tomochain-v1

The efficient blockchain for the token economy
https://tomochain.com
GNU Lesser General Public License v3.0
Error: no validator in header #453

Closed ngtuna closed 5 years ago

ngtuna commented 5 years ago

Error: no validator in header
##############################

WARN [02-26|08:24:17] Synchronisation failed, dropping peer peer=64bda1653bf6da87 err="retrieved hash chain is invalid"
ERROR[02-26|08:24:20]
########## BAD BLOCK #########
Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}

thanhson1085 commented 5 years ago

More info:

Error: no validator in header
##############################

WARN [03-01|02:08:52] Synchronisation failed, dropping peer    peer=347fbb1919dfed32 err="retrieved hash chain is invalid"
ERROR[03-01|02:09:01] 
########## BAD BLOCK #########
Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}

Number: 2019528
Hash: 0x62eef880af5c4e005e2efb60e5ade953e52a60e1ce026e3aaafd19a56a439eff

djSiMuL commented 5 years ago

I cannot find logs that far back on my node, but they were identical to what @thanhson1085 posted. Same block: 2019528.

When the node reached this block during syncing, it would immediately drop all peers. It would then reconnect to 1, then 2, then 3 peers, then drop back to 2, then 1, sometimes even 0. The peer count kept fluctuating between 0 and 3, and the node never got past block 2019528.

This happened three times on one of my nodes. The only way that I was able to get beyond block 2019528 was to use a chaindata archive that was already past that block.

To test this, I recommend that you bring up a node and let it start syncing from 0. When it gets to block 2019528, it will probably stop. That would give you something to troubleshoot.

arthurk commented 5 years ago

I've had the same issue with block 2019528. I started syncing from a snapshot in January and everything worked fine until I hit that block. The client would just drop all peers (the stats page showed only 1 peer) and stop working. It keeps trying to download the block and fails every time, getting stuck in a loop. The only choice I have is to download a snapshot that is already past that block.

Error: no validator in header  
############################## 

WARN [03-01|02:54:00] Synchronisation failed, dropping peer    peer=64bda1653bf6da87 err="retrieved hash chain is invalid"
ERROR[03-01|02:54:18]
########## BAD BLOCK ######### 
Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}

Number: 2019528
Hash: 0x62eef880af5c4e005e2efb60e5ade953e52a60e1ce026e3aaafd19a56a439eff

ngtuna commented 5 years ago

Thanks @arthurk @djSiMuL. The peer info is important to us. Please give us more time to investigate the root cause. I believe you have both gotten past the block thanks to the snapshot.

thanhnguyennguyen commented 5 years ago

@arthurk, @djSiMuL Could you please share your node information: Tomo version and Go version?

arthurk commented 5 years ago

@thanhnguyennguyen tomo/v1.2.2-stable/linux-amd64/go1.10.8

thanhson1085 commented 5 years ago

Do your nodes have port 30303 open? @arthurk @djSiMuL

thanhson1085 commented 5 years ago

Try running this script to add the right peers:

#!/bin/sh

# get tomochain container id
container_id=$(docker ps -q -f "name=tomochain")

# remove all peers
# (take a snapshot of admin.peers first so the list does not change while we iterate)
printf '\n------------------------\n!! Removing all peers\n------------------------\n\n'
docker exec -t "$container_id" tomo attach data/tomo.ipc --exec "var peers = admin.peers; for (var i = 0; i < peers.length; i++) { admin.removePeer(peers[i].id) }"

# add TomoChain peers
docker exec -t "$container_id" tomo attach data/tomo.ipc --exec "admin.addPeer('enode://df6a2423d01af3bf706a54417747553d02532d982eb5c612ef92d9e70e6b9eafe21afcfc27c7cd477e4fe2dcc0b46b73c977010811ede1c1dfca0944b8f310f6@35.202.169.170:30303')"
docker exec -t "$container_id" tomo attach data/tomo.ipc --exec "admin.addPeer('enode://c77e3b75a831b08eeac4ccdcc8b9085e357794b97e27eeee840c787b39d22b89d394e6796c6ff701cb14625a4c82c14e31f2162d2c6b9b000b6fd91c6f2b89dd@188.166.207.189:30303')"
docker exec -t "$container_id" tomo attach data/tomo.ipc --exec "admin.addPeer('enode://64bda1653bf6da87984c2b2bebcaa6811097a1ff87c15b3b421108ce82ed3dd8ed9e0be7c388a44c131b6b4934d2413c301f214e794d2d12b724d158878ba65f@104.248.98.59:30303')"

# restart node
printf '\n------------------------\n!! Restarting node\n------------------------\n\n'
tmn update

https://gist.github.com/thanhson1085/77be18f8110de00805d55513aa70aed5
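If you edit the peer list in the script, a small check can catch a truncated or mistyped enode URL before it is passed to admin.addPeer. This is only a sketch and not part of the gist: the helper name is_enode_url is made up for illustration, and it assumes the usual enode://<128 hex chars>@<IPv4>:<port> shape used by the three peers above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): succeed only if the argument
# looks like enode://<128 hex chars>@<IPv4 address>:<port>.
# It checks the shape only; it does not check that the peer is reachable.
is_enode_url() {
    printf '%s\n' "$1" |
        grep -Eq '^enode://[0-9a-fA-F]{128}@[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$'
}

# Example: check one of the peers from the script above.
peer='enode://df6a2423d01af3bf706a54417747553d02532d982eb5c612ef92d9e70e6b9eafe21afcfc27c7cd477e4fe2dcc0b46b73c977010811ede1c1dfca0944b8f310f6@35.202.169.170:30303'
if is_enode_url "$peer"; then
    echo "looks like a valid enode URL"
else
    echo "malformed enode URL: $peer" >&2
fi
```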

djSiMuL commented 5 years ago

tomo/v1.2.2-stable/linux-amd64/go1.10.8

Yes, port 30303 is open.

Since I was able to get past this block on my nodes, I will not bother running the peers script for now. If I ever resync a new server and get stuck on that block again, I'll give it a try.

arthurk commented 5 years ago

Yes, the port is open, but I've already downloaded a newer snapshot that includes the bad block and started syncing from there, so I cannot test the script.

ngtuna commented 5 years ago

I posted the final answer on Telegram. It's fine to close this issue now.

@djSiMuL, @arthurkoziel and others getting the BAD BLOCK - Error: no validator in header issue, here is our answer after a few days of investigating. Because the network has been stable, all nodes had reached the maximum of 25 connected peers, so nodes that joined late had very few connections. The nodes of @djSiMuL and @arthurkoziel had fewer than 3 peers at that time. The bad block came from those peers; it was rejected and the peers were disconnected. The loop of connecting to a peer, rejecting the bad block and disconnecting the peer kept the nodes stuck, and other peers that had the good block #2019528 couldn't deliver it to them.

In case this happens again, we wrote a script that helps stuck nodes connect directly to TomoChain masternodes, which can accept up to 200 peers, so that they can sync the block and keep moving forward. A link to the script is in the relevant GitHub issue; here it is as well: https://gist.github.com/thanhson1085/77be18f8110de00805d55513aa70aed5
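To check whether a node is caught in the connect, reject, disconnect loop described here, one option is to summarise its log. The sketch below is not an official tool: the function names bad_block_counts and dropped_peers are made up for illustration, and it assumes the log contains the same "Number: ..." and "dropping peer peer=..." text as the excerpts in this issue.

```shell
#!/bin/sh
# Sketch: both functions read a node log from stdin.

# Print "<count> <block number>" for every block number found in a
# BAD BLOCK dump. A single number with a growing count suggests the
# node keeps rejecting the same block.
bad_block_counts() {
    grep -Eo 'Number: [0-9]+' | awk '{ print $2 }' | sort -n | uniq -c |
        awk '{ print $1, $2 }'
}

# Print the short id of every peer that was dropped with
# "retrieved hash chain is invalid", one per line, without duplicates.
dropped_peers() {
    grep 'retrieved hash chain is invalid' | grep -Eo 'peer=[0-9a-f]+' |
        cut -d= -f2 | sort -u
}
```

With the Docker setup from the script, the log could be fed in with something like `docker logs "$container_id" 2>&1 | bad_block_counts`, assuming the node writes its log to the container output.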

kestop commented 5 years ago

I've tried the script and added the 3 official peers, but it still does not work. I can see even TomoChain peers being removed, like #df6a2423d01af3bf. My node is on version 1.3 and is stuck at #2615057.

########## BAD BLOCK #########
2019-03-09T06:53:00.015502944Z Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}
2019-03-09T06:53:00.015511727Z
2019-03-09T06:53:00.015518083Z Number: 2615057
2019-03-09T06:53:00.015524603Z Hash: 0xca599f42d50798601397797781ec47fcdd1451c0010ce9b103c2f767f7675efc
2019-03-09T06:53:00.015531047Z
2019-03-09T06:53:00.015536837Z
2019-03-09T06:53:00.015542524Z Error: no validator in header
2019-03-09T06:53:00.015548694Z ##############################
2019-03-09T06:53:00.015554759Z
2019-03-09T06:53:00.015563837Z WARN [03-09|06:53:00] Synchronisation failed, dropping peer peer=df6a2423d01af3bf err="retrieved hash chain is invalid"
2019-03-09T06:53:00.015632090Z INFO [03-09|06:53:00] Starting mining operation
2019-03-09T06:53:00.017002747Z INFO [03-09|06:53:00] Not my turn to commit block. Waiting...
2019-03-09T06:53:06.793206759Z INFO [03-09|06:53:06] Not my turn to commit block. Waiting...
2019-03-09T06:53:07.022556542Z INFO [03-09|06:53:07] Mining aborted due to sync
2019-03-09T06:53:07.368752334Z ERROR[03-09|06:53:07]
2019-03-09T06:53:07.368792583Z ########## BAD BLOCK #########
2019-03-09T06:53:07.368808135Z Chain config: {ChainID: 88 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: <nil> Engine: posv}
2019-03-09T06:53:07.368817025Z
2019-03-09T06:53:07.368823889Z Number: 2615057
2019-03-09T06:53:07.368830764Z Hash: 0xca599f42d50798601397797781ec47fcdd1451c0010ce9b103c2f767f7675efc
2019-03-09T06:53:07.368837929Z
2019-03-09T06:53:07.368844661Z
2019-03-09T06:53:07.368851497Z Error: no validator in header
2019-03-09T06:53:07.368858329Z ##############################
2019-03-09T06:53:07.368879342Z
2019-03-09T06:53:07.368885820Z WARN [03-09|06:53:07] Synchronisation failed, dropping peer peer=64bda1653bf6da87 err="retrieved hash chain is invalid"
2019-03-09T06:53:07.368895270Z INFO [03-09|06:53:07] Starting mining operation
2019-03-09T06:53:07.370010173Z INFO [03-09|06:53:07] Not my turn to commit block. Waiting...
2019-03-09T06:53:08.919148843Z INFO [03-09|06:53:08] Mining aborted due to sync

thanhson1085 commented 5 years ago

Can you show me the output of these commands, @kestop?

# get tomochain container id
container_id=$(docker ps -q -f "name=tomochain")

# list the peers the node is currently connected to
docker exec -t "$container_id" tomo attach data/tomo.ipc --exec "admin.peers"

ngtuna commented 5 years ago

Fixed via #464 and #462.