cosmos / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Apache License 2.0
6.17k stars 3.57k forks source link

panic: cannot delete latest saved version #14625

Closed k-yang closed 1 year ago

k-yang commented 1 year ago

Summary of Bug

We performed a chain upgrade. The chain upgrade added a new module. We registered the UpgradeHandler but forgot to add the module to the StoreUpgrades (see https://github.com/NibiruChain/nibiru/commit/b986c494d22288b6efaef60783cc338421dbee11).

When we started the new binary, it tried to Prune previous versions of the store, and we ended up with the error panic: cannot delete latest saved version (718). We knew something was wrong because we our upgrade height is 2460000. It means the store was loaded with an initialVersion less than the upgradeHeight.

Here is the full stack trace. ``` 4:39PM INF executed block height=2460000 module=consensus num_invalid_txs=0 num_valid_txs=0 panic: cannot delete latest saved version (718) goroutine 1 [running]: github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).PruneStores(0xc00116ed00, 0x1, {0x0?, 0x0?, 0xc010d7c700?}) /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/store/rootmulti/store.go:460 +0x285 github.com/cosmos/cosmos-sdk/store/rootmulti.(*Store).Commit(0xc00116ed00) /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/store/rootmulti/store.go:429 +0x1ab github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).Commit(0xc0002c1880) /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/baseapp/abci.go:314 +0x1e9 github.com/tendermint/tendermint/abci/client.(*localClient).CommitSync(0xc00107db60) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/abci/client/local_client.go:264 +0xb6 github.com/tendermint/tendermint/proxy.(*appConnConsensus).CommitSync(0xc0028d46b0?) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/proxy/app_conn.go:93 +0x22 github.com/tendermint/tendermint/state.(*BlockExecutor).Commit(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258974, {{0xc006ca2000, ...}, ...}, ...}, ...) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/state/execution.go:228 +0x269 github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258974, {{0xc006ca2000, ...}, ...}, ...}, ...) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/state/execution.go:180 +0x6ee github.com/tendermint/tendermint/consensus.(*Handshaker).replayBlock(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258973, {{0xc005692da0, ...}, ...}, ...}, ...) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/consensus/replay.go:503 +0x23c github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(_, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258973, {{0xc005692da0, ...}, ...}, ...}, ...) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/consensus/replay.go:416 +0x7ae github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc0028d5d90, {0x2f95740, 0xc00055dd40}) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/consensus/replay.go:268 +0x3d4 github.com/tendermint/tendermint/node.doHandshake({_, _}, {{{0xb, 0x0}, {0xc001f08fb8, 0x8}}, {0xc001f08ff0, 0xf}, 0x1, 0x258973, ...}, ...) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/node/node.go:330 +0x1b8 github.com/tendermint/tendermint/node.NewNode(0xc0010e6640, {0x2f7e780, 0xc00125bcc0}, 0xc001289000, {0x2f63120, 0xc010676648}, 0x0?, 0x0?, 0xc001289250, {0x2f83970, ...}, ...) /home/runner/go/pkg/mod/github.com/tendermint/tendermint@v0.34.22/node/node.go:778 +0x597 github.com/cosmos/cosmos-sdk/server.startInProcess(_, {{0x0, 0x0, 0x0}, {0x2fa1bb8, 0xc00117a3f0}, {0xc000303b30, 0xf}, {0x2f89120, 0xc0010bf810}, ...}, ...) /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/server/start.go:280 +0x7db github.com/cosmos/cosmos-sdk/server.StartCmd.func2(0xc001115b00?, {0x41aaa68?, 0x0?, 0x0?}) /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/server/start.go:128 +0x169 github.com/spf13/cobra.(*Command).execute(0xc001115b00, {0x41aaa68, 0x0, 0x0}) /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916 +0x862 github.com/spf13/cobra.(*Command).ExecuteC(0xc000f76600) /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3bd github.com/spf13/cobra.(*Command).Execute(...) /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968 github.com/spf13/cobra.(*Command).ExecuteContext(...) /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:961 github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x0?, {0xc0010c6258, 0x12}) /home/runner/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.45.10/server/cmd/execute.go:36 +0x1eb main.main() /home/runner/work/nibiru/nibiru/cmd/nibid/main.go:16 +0x2c ```

We read through https://github.com/cosmos/ibc-go/issues/1088 and https://github.com/cosmos/cosmos-sdk/issues/8265 and added the module to StoreUpgrades (see https://github.com/NibiruChain/nibiru/commit/0c1bd2ec558c0a9ddac7e93670e49d2385eceb46). We ran this binary and obtained the following error message:

failed to load latest version: failed to load store: initial version set to 2460000, but found earlier version 1

which indicates that the new store we're trying to add was already persisted disk by the earlier binary run.

We can continue our chain by setting pruning = "nothing" but this is undesirable because it effectively turns our node into a full archive node.

We are at an impasse because the earlier binary run persisted the store to disk with initialVersion=0, preventing us from pruning from height=2460000, and we are no longer able to add the module to StoreUpgrades.

In my opinion, the Cosmos SDK shouldn't persist stores to disk unless the StoreUpgrades array is explicitly given.

Version

We use v0.45.10 of the Cosmos SDK.

Steps to Reproduce

# old binary
nibid start

nibid tx gov submit-proposal software-upgrade "v0.17.0" \
--title "v0.17.0" \
--description "Upgrade to v0.17.0" \
--upgrade-height 2460000 \
--upgrade-info '' \
--from val \
--output json \
--yes \
--gas-prices 0.025unibi \
--deposit 10000000unibi

nibid tx gov vote 1 yes \
--from val \
--output json \
--yes \
--gas-prices 0.025unibi

# wait for chain halt and download new binary
nibid start
tac0turtle commented 1 year ago

this is usually done by an incorrect upgrade of adding a store where the store version is not the same as the others. We should really fix that

yihuang commented 1 year ago

I feel you, we probably shouldn't separate the stores into multiple trees in the first place.

k-yang commented 1 year ago

this is usually done by an incorrect upgrade of adding a store where the store version is not the same as the others. We should really fix that

I think if we follow the invariant that store versions are always equal to the block height, then adding a store could set the initialVersion = blockHeight automatically and we wouldn't need the following check in loadVersion.

if upgrades.IsAdded(key.Name()) {
  storeParams.initialVersion = uint64(ver) + 1
}

That could simplify the StoreUpgrades for added modules.

julienrbrt commented 1 year ago

related: https://github.com/cosmos/cosmos-sdk/issues/13933 / https://github.com/cosmos/cosmos-sdk/pull/14410

alexanderbez commented 1 year ago

I feel you, we probably shouldn't separate the stores into multiple trees in the first place.

Yes! I've been advocating for a single logical DB/tree for ages now. It also gives us atomicity. However, this is technically orthogonal to the issue described here. We will be discussing this in the storage WG though!


@tac0turtle we should combine #13933 and this issue into a single issue (I have no preference on which we close or even just create a new one).

As pointed out by @k-yang and others, I believe the simple solution here to have the root MS ensure that whenever a new module store is discovered, it is loaded/saved with the current block height and not 0.

I think (?) https://github.com/cosmos/cosmos-sdk/pull/14410 is going to do this but I'm not sure.

cc @catShaark

catShaark commented 1 year ago

@alexanderbez, so my PR will add validation in the loadVersion so that the version of every module stores matches version of root store. This validation will prevent chain from getting this uneven store heights problem, not actually solving it.