LiskArchive / lisk-sdk

🔩 Lisk software development kit
https://lisk.com
Apache License 2.0
2.72k stars 456 forks source link

Delegate got POMed without any action #6982

Closed dakk closed 2 years ago

dakk commented 2 years ago

Today my delegate dakk got POMed without any action, I can't figure what happened. I report the full timeline of events:

  1. The server running lisk-core lost network connectivity for ~40 minutes; lisk-core remained enabled with forging enabled
  2. In the meanwhile the delegate missed two blocks
  3. Server come back online; lisk-core was still running with forging enabled, no reboot / restart happened
  4. The node tried to forge a new block; a node reported misbehavior https://liskscan.com/transaction/fc8444770e7ae9f532ac00ba9a11e93238538aba8f1acab34306505b4dc2e984

Unfortunately, I don't have any logs since lisk-core was running without logging enabled. The only facts I'm 100% sure are:

przemerr commented 2 years ago

i did some digging, i log every block that hits my node

 2021-12-31T07:40:50 "height":17392820,"maxHeightPreviouslyForged":17392758,"maxHeightPrevoted":17392736 corsaro
 2021-12-31T07:41:00 "height":17392821,"maxHeightPreviouslyForged":17392771,"maxHeightPrevoted":17392736 splatters
-2021-12-31T07:41:10 "height":17392822,"maxHeightPreviouslyForged":17392691,"maxHeightPrevoted":17392736 dakk
 2021-12-31T07:41:20 "height":17392823,"maxHeightPreviouslyForged":17392701,"maxHeightPrevoted":17392738 irina18
 2021-12-31T07:41:30 "height":17392824,"maxHeightPreviouslyForged":17392781,"maxHeightPrevoted":17392740 raoul
 --
 2021-12-31T07:58:10 "height":17392924,"maxHeightPreviouslyForged":17392848,"maxHeightPrevoted":17392834 helpinghand
 2021-12-31T07:58:20 "height":17392925,"maxHeightPreviouslyForged":17392839,"maxHeightPrevoted":17392834 savetheworld
-2021-12-31T07:58:24 "height":17392869,"maxHeightPreviouslyForged":17392869,"maxHeightPrevoted":17392799 dakk
 2021-12-31T07:58:30 "height":17392926,"maxHeightPreviouslyForged":17392880,"maxHeightPrevoted":17392834 liskroad
 2021-12-31T07:58:40 "height":17392927,"maxHeightPreviouslyForged":17392860,"maxHeightPrevoted":17392834 liskearn
 --
 2021-12-31T08:43:50 "height":17393196,"maxHeightPreviouslyForged":17393042,"maxHeightPrevoted":17393128 hirish
 2021-12-31T08:44:00 "height":17393197,"maxHeightPreviouslyForged":17393029,"maxHeightPrevoted":17393129 endro
-2021-12-31T08:44:10 "height":17393198,"maxHeightPreviouslyForged":17392822,"maxHeightPrevoted":17393130 dakk
 2021-12-31T08:44:20 "height":17393199,"maxHeightPreviouslyForged":17393148,"maxHeightPrevoted":17393131 grumlin
 2021-12-31T08:44:30 "height":17393200,"maxHeightPreviouslyForged":17393192,"maxHeightPrevoted":17393131 oscarandjames
 --

the times are in GMT, the block that is in the middle wasn't registered on blockchain (invalidated due to height, probably a forked block) the third block has "maxHeightPreviouslyForged":17392822," which is lower then the forked block height (reason of poming) it corresponds to the last block that was registered on the blockchain. So if you really didn't do anything it looks like the forked block data wasn't saved in the forging info db.

dakk commented 2 years ago

@przemerr So it may be a file corruction problem or something like that?

przemerr commented 2 years ago

I'm really not sure, you forged 3 blocks after the PoM, and there was no issue with forging info.

dakk commented 2 years ago

after the pom it continued to forge using wrong values

ishantiw commented 2 years ago

@dakk We tried to reproduce the problem by creating the same conditions as described in the issue, I tested it 3 times with misbehaviour plugin enabled and with several forging nodes simulating mainnet. Our findings,

Offline,

{"address":"0ada6a2f6c8f891769366fc9aa6fd9f1facb36cf","forging":true,"height":2288,"maxHeightPrevoted":1395,"maxHeightPreviouslyForged":2287}

{"address":"0ada6a2f6c8f891769366fc9aa6fd9f1facb36cf","forging":true,"height":2293,"maxHeightPrevoted":1395,"maxHeightPreviouslyForged":2292}

{"address":"0ada6a2f6c8f891769366fc9aa6fd9f1facb36cf","forging":true,"height":2294,"maxHeightPrevoted":1395,"maxHeightPreviouslyForged":2293}

Online again,

{"address":"0ada6a2f6c8f891769366fc9aa6fd9f1facb36cf","forging":true,"height":2781,"maxHeightPrevoted":1395,"maxHeightPreviouslyForged":2294

{"address":"0ada6a2f6c8f891769366fc9aa6fd9f1facb36cf","forging":true,"height":2841,"maxHeightPrevoted":1395,"maxHeightPreviouslyForged":2781

{"address":"0ada6a2f6c8f891769366fc9aa6fd9f1facb36cf","forging":true,"height":2939,"maxHeightPrevoted":1395,"maxHeightPreviouslyForged":2841}

It seems like it doesn’t affect forger’s info when the node comes back online and doesn’t get PoMed. Since the logs are missing from your end, its really hard for us to know what really happened on your node when it was offline. We haven't received any other issues similar to yours and we also tried to reproduce it internally in a network setting as described by you so I assume something malfunctioned on your node for some reason unrelated to Lisk core. In case if we receive any similar problem again we will reopen this issue.

dakk commented 2 years ago

Yep, after further investigation I detected a disk failure with consequent data corruption. I'm pretty sure that was the true problem that generated the Pom