Endless loop - [Starting sync - Finished sync]

ghost commented 7 years ago

I've noticed in 0.5.0-0.5.1 that sometimes nodes enter into endless loop of resyncing. There is no information about any fork, just this. Restarting lisk does not help. In result node is stucked on some height behind network.

[inf] 2016-12-11 11:05:27 | Blocks 1534264
[inf] 2016-12-11 11:05:27 | Genesis block matched with database
[inf] 2016-12-11 11:05:37 | Blockchain ready
[inf] 2016-12-11 11:05:38 | Peers ready, stored 460
[inf] 2016-12-11 11:05:39 | Starting sync
[inf] 2016-12-11 11:05:39 | Broadhash consensus now 15 %
[WRN] 2016-12-11 11:05:40 | EUNAVAILABLE Removing peer 138.201.19.234:8000 GET http://138.201.19.234:8000/peer/height
[ERR] 2016-12-11 11:05:40 | Failed to get height from peer: 138.201.19.234:8000
[WRN] 2016-12-11 11:05:40 | ERESPONSE 404 Removing peer 46.228.6.34:8000 GET http://46.228.6.34:8000/peer/height
[ERR] 2016-12-11 11:05:40 | Failed to get height from peer: 46.228.6.34:8000
[WRN] 2016-12-11 11:05:44 | ETIMEOUT Ban 212.24.102.163:8000 10 minutes GET http://212.24.102.163:8000/peer/height
[WRN] 2016-12-11 11:05:44 | ETIMEOUT Ban 138.197.135.149:8000 10 minutes GET http://138.197.135.149:8000/peer/height
[ERR] 2016-12-11 11:05:44 | Failed to get height from peer: 212.24.102.163:8000
[ERR] 2016-12-11 11:05:44 | Failed to get height from peer: 138.197.135.149:8000
[inf] 2016-12-11 11:05:44 | Looking for common block with: 63.142.255.85:8000
[inf] 2016-12-11 11:05:55 | Found common block: 2305067222973732121 with: 63.142.255.85:8000
[inf] 2016-12-11 11:05:55 | Loading blocks from: 63.142.255.85:8000
[inf] 2016-12-11 11:05:55 | Block 7009774920556977921 loaded from: 63.142.255.85:8000 - height: 1534265
[inf] 2016-12-11 11:05:55 | Looking for common block with: 45.76.32.209:8000
[WRN] 2016-12-11 11:06:02 | EVERSION:0.3.0 Removing peer 119.28.51.182:8000 GET /signatures
[inf] 2016-12-11 11:06:05 | Found common block: 7009774920556977921 with: 45.76.32.209:8000
[inf] 2016-12-11 11:06:05 | Loading blocks from: 45.76.32.209:8000
[inf] 2016-12-11 11:06:05 | Broadhash consensus now 8 %
[inf] 2016-12-11 11:06:05 | Finished sync
[WRN] 2016-12-11 11:06:11 | EVERSION:0.3.0 Removing peer 119.28.51.182:8000 GET /height
[inf] 2016-12-11 11:06:15 | Starting sync
[inf] 2016-12-11 11:06:16 | Broadhash consensus now 11 %
[inf] 2016-12-11 11:06:16 | Looking for common block with: 83.136.249.90:8000
[inf] 2016-12-11 11:06:25 | Found common block: 7009774920556977921 with: 83.136.249.90:8000
[inf] 2016-12-11 11:06:25 | Loading blocks from: 83.136.249.90:8000
[inf] 2016-12-11 11:06:25 | Broadhash consensus now 12 %
[inf] 2016-12-11 11:06:25 | Finished sync
[WRN] 2016-12-11 11:06:33 | EVERSION:0.3.0 Removing peer 63.82.5.67:8000 GET /signatures
[WRN] 2016-12-11 11:06:33 | EVERSION:0.3.0 Removing peer 63.82.5.67:8000 GET /signatures
[inf] 2016-12-11 11:06:35 | Starting sync
[inf] 2016-12-11 11:06:35 | Broadhash consensus now 13 %
[inf] 2016-12-11 11:06:35 | Looking for common block with: 193.70.114.232:8000
[inf] 2016-12-11 11:06:43 | Found common block: 7009774920556977921 with: 193.70.114.232:8000
[inf] 2016-12-11 11:06:43 | Loading blocks from: 193.70.114.232:8000
[inf] 2016-12-11 11:06:43 | Broadhash consensus now 12 %
[inf] 2016-12-11 11:06:43 | Finished sync
[inf] 2016-12-11 11:06:54 | Starting sync
[inf] 2016-12-11 11:06:54 | Broadhash consensus now 13 %
[inf] 2016-12-11 11:06:54 | Looking for common block with: 104.238.152.253:8000
[WRN] 2016-12-11 11:06:55 | EVERSION:0.3.0 Removing peer 63.82.5.67:8000 GET /height
[WRN] 2016-12-11 11:07:06 | EVERSION:0.3.1 Removing peer 37.97.194.133:8000 GET /height
[inf] 2016-12-11 11:07:07 | Found common block: 7009774920556977921 with: 104.238.152.253:8000
[inf] 2016-12-11 11:07:07 | Loading blocks from: 104.238.152.253:8000
[inf] 2016-12-11 11:07:07 | Broadhash consensus now 13 %
[inf] 2016-12-11 11:07:07 | Finished sync

4miners commented 7 years ago

Blocks from your log (2305067222973732121, 7009774920556977921) are not present in network - so your node was in forked chain. It cannot rollback and sync with proper height because of https://github.com/LiskHQ/lisk/issues/334 issue. When referenced issue will be resolved - such "endless loop" will not occur because node will be able to sync with rest of the network. Also that "endless loop" is not endless loop at all but proper behavior in such case - node know that is behind or not receiving new blocks - so sync should keep trigger until node state is resolved.

ghost commented 7 years ago

In fact - it is endless loop because of #334. It's proper behaviour has been affected by other issue as you said. Anyway, hope to get resolved it soon!

Crypto2 commented 7 years ago

I'm having this issue as well it seems with the latest docker image. Have restarted container and even deleted and create new container multiple times and always results in the same:

[inf] 2017-05-07 21:17:31 | Looking for common block with: 45.32.159.112:8000 [inf] 2017-05-07 21:17:31 | Found common block: 10688395562067015926 with: 45.32.159.112:8000 [inf] 2017-05-07 21:17:31 | Loading blocks from: 45.32.159.112:8000 [ERR] 2017-05-07 21:17:31 | Expected generator: fbac76743fad9448ed0b45fb4c97a62f81a358908aa14f6a2c76d2a8dc207141 Received generator: 0c0c8f58e7feeaa687d7dc9a5146ea14afe1bc647f518990b197b9f55728effa [inf] 2017-05-07 21:17:31 | Fork - {"delegate":"0c0c8f58e7feeaa687d7dc9a5146ea14afe1bc647f518990b197b9f55728effa","block":{"id":"1 3975364413689503547","timestamp":29600520,"height":2738321,"previousBlock":"10688395562067015926"},"cause":3} [ERR] 2017-05-07 21:17:31 | Error loading blocks: Failed to verify slot: 2960052 [ERR] 2017-05-07 21:17:31 | Failed to load blocks from: 45.32.159.112:8000 [inf] 2017-05-07 21:17:31 | Broadhash consensus now 0 % [inf] 2017-05-07 21:17:31 | Finished sync

4miners commented 7 years ago

@Crypto2 Your issue is different, block 13975364413689503547 is valid, so your node should accept it. It fail probably because your system clock is not synced well. Please use ntp for keeping it in sync - it should resolve your issue.

Crypto2 commented 7 years ago

System time is synced to pool.ntp.org daily. I even ran it manually to verify time is correct: "ntpdate[31777]: adjust time server 144.217.242.53 offset 0.000451 sec"

Edit: Timezone in docker image is set to the same as the host server if that matters but had the same problem when it was the default UTC as well.

4miners commented 7 years ago

@Crypto2 Thank you, we will investigate.

Crypto2 commented 7 years ago

Any update on this?

LiskArchive / lisk-sdk

Endless loop - [Starting sync - Finished sync] #350