cosmos / cosmos-sdk

A Framework for Building High Value Public Blockchains
https://cosmos.network/
Apache License 2.0

statesync: app version is not correctly set #10791

Closed cmwaters closed 1 month ago

cmwaters commented 2 years ago

Summary of Bug

At the end of state sync, Tendermint makes the Info ABCI call to verify that the app hash it holds matches the one the application has (thus verifying that the correct state was synced).

https://github.com/tendermint/tendermint/blob/20c547a901eb003912bd1b34ff45d34ac118b5f8/internal/statesync/syncer.go#L352-L356

In doing so, the application also returns its app version, which Tendermint uses to set its own internal state (which it doesn't really need to). Currently it seems this is not being set correctly and the application always returns an app version of 0. When the actual version is not 0, this causes the node to halt with the error message:

wrong Block.Header.Version. Expected {11 0}, got {11 1}

This means that any application with a non-zero app version can't state sync at the moment. One example I experienced was Osmosis. I am not sure of others.
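
The mismatch described above can be reproduced in miniature. The following is a minimal Go sketch, not Tendermint's actual code: the `Consensus` type and the `checkHeaderVersion` function are hypothetical stand-ins for the version pair stored in the node's state and the check applied to each incoming block header.

```go
package main

import "fmt"

// Consensus is a simplified, hypothetical stand-in for the
// (block protocol version, app version) pair carried in a block header.
type Consensus struct {
	Block uint64
	App   uint64
}

// checkHeaderVersion compares the version the node expects (taken from its
// own state) with the version found in an incoming block header.
func checkHeaderVersion(expected, got Consensus) error {
	if expected != got {
		return fmt.Errorf("wrong Block.Header.Version. Expected %v, got %v", expected, got)
	}
	return nil
}

func main() {
	// After state sync the node's state holds app version 0 (the bug),
	// while blocks produced by the chain carry app version 1.
	nodeState := Consensus{Block: 11, App: 0}
	header := Consensus{Block: 11, App: 1}

	if err := checkHeaderVersion(nodeState, header); err != nil {
		fmt.Println(err)
	}
}
```

With the node's state at app version 0 and the header at app version 1, the check fails on the first block processed after the snapshot is restored, which is consistent with the halt reported here.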

Version

This is on v0.44.3, but I assume that it affects earlier versions as well.

Steps to Reproduce

Try to state sync a node on Osmosis or any application that has a non-zero app version. Cosmos Hub has 0 as its app version.

Workarounds

The node that has just state synced will have everything correct except the app version. As a workaround, all that is needed is to modify the app version within the State struct that Tendermint keeps. I have written up a command on a branch that can perform this operation for node operators that are stuck:

git clone https://github.com/tendermint/tendermint
cd tendermint
git checkout callum/app-version
make install
tendermint set-app-version 1 --home ~/.osmosisd

NOTE: This is not an official solution; it is a custom patch for the v0.34 line. Make sure you set the correct --home flag.



tac0turtle commented 2 years ago

Looking at state sync, we should be calling abci.info afterwards to update any versions, but it doesn't seem that we do.

czarcas7ic commented 2 years ago

Is there an app version mismatch fix for nodes using a different db backend such as rocksdb?

tac0turtle commented 2 years ago

this isn't a db issue, but a software issue.

tac0turtle commented 2 years ago

I haven't had a chance to dive in-depth, but there is a chance that it may need to be.

anilcse commented 2 years ago

I haven't had a chance to dive in-depth, but there is a chance that it may need to be.

Oops, my bad, I deleted the comment mistakenly.

For reference it was:

@marbar3778 do we need to fix this on the tendermint side? Can you confirm if you know the details?

jhernandezb commented 2 years ago

We are seeing the same issue now after the first upgrade (v0.44+)

liangping commented 2 years ago
1:54PM INF Applied snapshot chunk to ABCI app chunk=22 format=1 height=2206500 module=statesync total=23
1:54PM INF Verified ABCI app appHash="�`\x12Lg�\x04o�@b\x1eHRQ\\g\\�`s��J�j\r��f�x" height=2206500 module=statesync
1:54PM INF Snapshot restored format=1 hash=";ܺ1��\x1cr��v\x1b����5WiI+賑�\x18\x1eѸ;K)" height=2206500 module=statesync
1:54PM INF Starting BlockPool service impl=BlockPool module=blockchain
panic: Failed to process committed block (2206501:3EE209C7760C161AD2087837810564EB4E742D468467800A22F8A0CC0BA7EC74): wrong Block.Header.Version. Expected {11 0}, got {11 1}

goroutine 1289 [running]:
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc000c21180, 0x1)
    github.com/tendermint/tendermint@v0.34.15/blockchain/v0/reactor.go:401 +0x123a
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).SwitchToFastSync
    github.com/tendermint/tendermint@v0.34.15/blockchain/v0/reactor.go:125 +0xe5
ping@stargaze2:~$ ~/go/bin/tendermint set-app-version 1 --home ~/.starsd
Set app version
ping@stargaze2:~$ starsd start
1:54PM INF starting ABCI with Tendermint
failed initialize pinned codes Error calling the VM: Cache error: Error opening Wasm file for reading: No such file or directory (os error 2): pinning contract failed

It did not work.

glebiller commented 2 years ago

We are seeing a similar error using Tendermint v0.34.21. This is coming from the verifyApp method (https://github.com/tendermint/tendermint/blob/v0.34.21/statesync/syncer.go#L486-L491).

It calls the ABCI InfoSync method, but the AppVersion in the response is 0, which triggers the error while trying to restore the snapshot.
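
To illustrate why the verification step itself can pass while the version still ends up wrong, here is a rough Go sketch. It assumes simplified, hypothetical types (`InfoResponse`, `Snapshot`) and a hypothetical `verifyApp` function. It is not the code linked above, only an outline of the checks described in this thread.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// InfoResponse is a simplified, hypothetical stand-in for what the
// application returns from the ABCI Info call.
type InfoResponse struct {
	AppVersion       uint64
	LastBlockHeight  int64
	LastBlockAppHash []byte
}

// Snapshot is a hypothetical stand-in for the snapshot the node restored,
// together with the app hash it trusts for that height.
type Snapshot struct {
	Height  int64
	AppHash []byte
}

// verifyApp checks that the restored application matches the snapshot and
// returns whatever app version the application reported.
func verifyApp(info InfoResponse, snap Snapshot) (uint64, error) {
	if !bytes.Equal(info.LastBlockAppHash, snap.AppHash) {
		return 0, errors.New("app hash mismatch after snapshot restore")
	}
	if info.LastBlockHeight != snap.Height {
		return 0, fmt.Errorf("height mismatch: app at %d, snapshot at %d",
			info.LastBlockHeight, snap.Height)
	}
	// Hash and height agree, so the version is accepted as reported.
	return info.AppVersion, nil
}

func main() {
	snap := Snapshot{Height: 2206500, AppHash: []byte{0xAB, 0xCD}}
	// The application reports AppVersion 0 even though the chain uses 1.
	info := InfoResponse{AppVersion: 0, LastBlockHeight: 2206500, LastBlockAppHash: []byte{0xAB, 0xCD}}

	version, err := verifyApp(info, snap)
	fmt.Println(version, err)
}
```

In this sketch the hash and height checks succeed, so a wrongly reported version of 0 is carried into the node's state unnoticed and only surfaces later, when a block header with a different app version arrives.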

tac0turtle commented 1 month ago

this has been fixed now