cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0

x/upgrade: no check that the state and app height match the block height #11952

Open JayT106 opened 2 years ago

JayT106 commented 2 years ago

Summary of Bug

We observed an issue when cosmovisor was running the previous binary (SDK v0.44.3) and an error occurred while executing the upgrade plan at height H. In our case it was a file permission problem (an operational issue on our side, not an SDK bug), so the block at H could not be committed completely and the app/consensus state stayed at H - 1. We saw this error:

CONSENSUS FAILURE!!!  due to unable to write upgrade info to filesystem: open /chain/.cronosd/data/upgrade-info.json: permission denied
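
Because the failure above came from a directory the node could not write to, an operator could run a small pre-flight check before the upgrade height. The following is only a sketch: `checkWritable` is a hypothetical helper, not part of the SDK or cosmovisor, and it must run as the same OS user as the node for the result to mean anything.

```go
package main

import (
	"fmt"
	"os"
)

// checkWritable is a hypothetical helper (not an SDK or cosmovisor API).
// It tries to create and then remove a temporary file in dir, which is a
// direct way to find out whether the current user can write there.
// Note: an empty dir makes os.CreateTemp fall back to the system temp dir.
func checkWritable(dir string) error {
	f, err := os.CreateTemp(dir, ".write-check-*")
	if err != nil {
		return fmt.Errorf("directory %q is not writable: %w", dir, err)
	}
	name := f.Name()
	if err := f.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	// The directory to check is passed as the first argument,
	// for example the node's data directory.
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	if err := checkWritable(dir); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ok:", dir, "is writable")
}
```

Running it against the node's data directory before the plan height would surface a permission problem while the old binary is still healthy.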

We checked the WAL log, and it had already recorded the end of block height H.

Later, after the node restarts, cosmovisor tries to use the new binary (SDK v0.45.4) to replay block H, and the store complains that it cannot load version H.

Version

v0.44.3 and v0.45.4

Solution

The upgrade module may need to check that the app/state height matches the block height. If it does not, block H should be replayed with the original release binary before proceeding with the upgrade plan.

Another workaround would be to let the node roll back the pending block H and restart with the original binary, but that does not look feasible with the current Tendermint rollback implementation.
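
To make the first proposal concrete, here is a minimal sketch of the height comparison. It is not SDK code: `Action`, `decide`, and the two height parameters are hypothetical names, and it assumes the caller can already obtain the last committed app height and the block height recorded by consensus.

```go
package main

import "fmt"

// Action is a hypothetical recovery decision; it is not an SDK type.
type Action int

const (
	ProceedWithUpgrade  Action = iota // heights match, safe to run the plan
	ReplayWithOldBinary               // app is one block behind, replay first
	Inconsistent                      // any other gap needs manual inspection
)

func (a Action) String() string {
	switch a {
	case ProceedWithUpgrade:
		return "proceed with upgrade plan"
	case ReplayWithOldBinary:
		return "replay block with original binary"
	default:
		return "inconsistent heights"
	}
}

// decide compares the last committed app/state height with the block
// height recorded by consensus (both supplied by the caller; how they are
// read from the node is outside this sketch).
func decide(appHeight, blockHeight int64) Action {
	switch {
	case appHeight == blockHeight:
		return ProceedWithUpgrade
	case appHeight == blockHeight-1:
		// The situation from this report: block H was recorded by
		// consensus, but the app state is still at H - 1.
		return ReplayWithOldBinary
	default:
		return Inconsistent
	}
}

func main() {
	const h int64 = 100 // example upgrade height, chosen arbitrarily
	fmt.Println(decide(h-1, h))
	fmt.Println(decide(h, h))
}
```

The point of the third case is to avoid guessing: only the exact H - 1 gap described here is treated as recoverable by replay.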


alexanderbez commented 2 years ago

Shouldn't H be committed with the old binary?

JayT106 commented 2 years ago

Shouldn't H be committed with the old binary?

Yes, it was committed, but the commit appears to have failed because of the error I posted (again, this was an operational issue on our side). So the app/consensus state was not updated.