crypto-org-chain / cronos

Cronos is the first Ethereum-compatible blockchain network built on Cosmos SDK technology. Cronos aims to massively scale the DeFi, GameFi, and overall Web3 user community by providing builders with the ability to instantly port apps and crypto assets from other chains while benefiting from low transaction fees, high throughput, and fast finality.

Panic when restarting node #1672

Open alienc0der opened 3 weeks ago

alienc0der commented 3 weeks ago

Describe the bug

The node cannot restart when the pruning setting is either pruning = default or pruning = everything. It fails with this error: "panic: Value missing for key [...] corresponding to nodeKey ...". If you set pruning = nothing, the node restarts successfully without any errors. Tested with both the goleveldb and rocksdb database backends.

This greatly impacts the operation of full nodes, since running without pruning requires massive resources (disk space in particular).

Version

Build based on this commit

To Reproduce

Steps to reproduce the behavior:

  1. Stop node and wait for graceful termination
  2. Start node
  3. Check the logs/console for this error: panic: Value missing for key ...

Expected behavior

The node should restart without errors.

Additional context

I see that the IAVL developers removed pruning in this pull request from 2020. However, they provided a fix in this pull request, which is included in their most recent release, v1.3.0.

I also see that you have an indirect dependency on iavl v1.2.0.

Stacktrace:

...
INF Application gracefully shutdown module=server
panic: Value missing for key [...] corresponding to nodeKey ...

goroutine 1 [running]:
cosmossdk.io/store/iavl.(*Store).Get(0x?, {0x?, 0x?, 0x1?})
    cosmossdk.io/store/iavl/store.go:195 +0x
...

@yihuang @mmsqe feel free to ask any questions regarding this issue.

yihuang commented 3 weeks ago

Thanks for the reminder; we should update the iavl dependency then.

yihuang commented 3 weeks ago

I did a few restarts in a local devnet; it's not easy to reproduce, though.

alienc0der commented 3 weeks ago

The issue still persists.

> I did a few restarts in a local devnet; it's not easy to reproduce, though.

It's simple to reproduce. Set pruning = everything or pruning = default in config.json, sync a full node from scratch, and stop it once it has finished syncing. Then try to start the node again. It panics at cosmossdk.io/store/iavl/store.go:195, in the function cosmossdk.io/store/iavl.(*Store).Get(...).

yihuang commented 3 weeks ago

> sync up a full node from scratch

Did you reproduce it in your local devnet?

mmsqe commented 3 weeks ago

Hi @alienc0der, did you reproduce it by syncing from testnet data with the main-branch binary?

alienc0der commented 3 weeks ago

> sync up a full node from scratch
>
> Did you reproduce it in your local devnet?

Yes. Multiple times.

> Hi @alienc0der, did you reproduce it by syncing from testnet data with the main-branch binary?

I've tried to sync the Cronos mainnet with no success. I've downloaded over 250 GB of data and tried to start the node multiple times with no luck. I've tried the latest binary release and older ones too. I've given up, sorry.

I've used the latest cronos-1.4.0-rc3 release, and this is the error I get when starting with the home directory pointing to the cronosmainnet_25-1-rocksdb-pruned directory:

failed to load latest version: version of store icahost mismatch root store's version; expected 16608700 got 0; new stores should be added using StoreUpgrades

If you could provide clear instructions on how to sync the testnet, or point to one place with the necessary info (binary, genesis.json, pruned data, etc.), it would be very helpful, not only for me but for anyone who wants to contribute in the future.
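The "version of store icahost mismatch" error above reads as a consistency check: every store is expected to be at the same version as the root store unless the binary declares it as newly added. The following is a minimal sketch of that idea in plain Go. The function checkStoreVersions and its parameters are hypothetical names for illustration, and this is a simplified model, not the Cosmos SDK implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// checkStoreVersions is a hypothetical, simplified model of the check that
// produces the error quoted above. stores maps a store name to the version
// found on disk; added lists stores declared as new (the role that
// StoreUpgrades plays according to the error message).
func checkStoreVersions(rootVersion int64, stores map[string]int64, added map[string]bool) error {
	names := make([]string, 0, len(stores))
	for name := range stores {
		names = append(names, name)
	}
	sort.Strings(names) // iterate in a deterministic order

	for _, name := range names {
		if added[name] {
			continue // a declared new store may start empty
		}
		if v := stores[name]; v != rootVersion {
			return fmt.Errorf("version of store %s mismatch root store's version; expected %d got %d; new stores should be added using StoreUpgrades",
				name, rootVersion, v)
		}
	}
	return nil
}

func main() {
	// Values taken from the error message; "evm" is only an example name.
	stores := map[string]int64{"evm": 16608700, "icahost": 0}

	fmt.Println(checkStoreVersions(16608700, stores, nil))
	fmt.Println(checkStoreVersions(16608700, stores, map[string]bool{"icahost": true}))
}
```

Under this reading, a store at version 0 next to a root at 16608700 suggests that the binary knows about a store (icahost) that the downloaded data never contained, so the binary and the data directory may not belong to the same release line.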

alienc0der commented 2 weeks ago

@yihuang @mmsqe here is the full stack trace of the panic:

panic: Value missing for key [0 0 0 0 0 0 0 1 0 0 0 1] corresponding to nodeKey 73000000000000000100000001

goroutine 1 [running]:
cosmossdk.io/store/iavl.(*Store).Get(0x140021701c0?, {0x10bf7e3f7?, 0x140021a2b20?, 0x109bb1140?})
    cosmossdk.io/store/iavl/store.go:195 +0xbc
cosmossdk.io/store/cache.(*CommitKVStoreCache).Get(0x140015394b8, {0x10bf7e3f7, 0x1, 0x1})
    cosmossdk.io/store/cache/cache.go:111 +0xc4
cosmossdk.io/store/cachekv.(*GStore[...]).Get(0x1058210cc, {0x10bf7e3f7?, 0x1, 0x1})
    cosmossdk.io/store/cachekv/store.go:79 +0x84
cosmossdk.io/store/gaskv.(*GStore[...]).Get(0x7d0, {0x10bf7e3f7?, 0x1, 0x1})
    cosmossdk.io/store/gaskv/store.go:65 +0x6c
github.com/evmos/ethermint/x/evm/keeper.Keeper.GetParams({{0x109e19138, 0x140015f3790}, {0x109da26a8, 0x140014f5870}, {0x109da2720, 0x140014f5900}, {0x14001079720, 0x14, 0x20}, {0x109deeb50, ...}, ...}, ...)
    github.com/evmos/ethermint/x/evm/keeper/params.go:26 +0x9c
github.com/evmos/ethermint/x/evm/keeper.(*Keeper).EVMBlockConfig(_, {{0x109dd0468, 0x10c287268}, {0x109dee020, 0x140021af260}, {{0x0, 0x0}, {0x1400242da70, 0x11}, 0xbc1a7, ...}, ...}, ...)
    github.com/evmos/ethermint/x/evm/keeper/config.go:70 +0x124
github.com/evmos/ethermint/x/evm/keeper.(*Keeper).BeginBlock(_, {{0x109dd0468, 0x10c287268}, {0x109dee020, 0x140021af260}, {{0x0, 0x0}, {0x1400242da70, 0x11}, 0xbc1a7, ...}, ...})
    github.com/evmos/ethermint/x/evm/keeper/abci.go:28 +0xa8
github.com/evmos/ethermint/x/evm.AppModule.BeginBlock({{}, 0x14002570360, {0x109deeb50, 0x140000f88c0}, {0x109d8f520, 0x1400165f260}}, {0x109dd04a0?, 0x1400200bc08?})
    github.com/evmos/ethermint/x/evm/module.go:164 +0x98
github.com/cosmos/cosmos-sdk/types/module.(*Manager).BeginBlock(_, {{0x109dd0468, 0x10c287268}, {0x109dee020, 0x140021af260}, {{0x0, 0x0}, {0x1400242da70, 0x11}, 0xbc1a7, ...}, ...})
    github.com/cosmos/cosmos-sdk/types/module/module.go:778 +0x14c
github.com/crypto-org-chain/cronos/v2/app.(*App).BeginBlocker(_, {{0x109dd0468, 0x10c287268}, {0x109dee020, 0x140021af260}, {{0x0, 0x0}, {0x1400242da70, 0x11}, 0xbc1a7, ...}, ...})
    github.com/crypto-org-chain/cronos/v2/app/app.go:1171 +0x80
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).beginBlock(0x14002562908, 0x14000af3c80?)
    github.com/cosmos/cosmos-sdk/baseapp/baseapp.go:754 +0xa4
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).internalFinalizeBlock(0x14002562908, {0x109dd0468, 0x10c287268}, 0x14000af3c80)
    github.com/cosmos/cosmos-sdk/baseapp/abci.go:771 +0x988
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x14002562908, 0x14000af3c80)
    github.com/cosmos/cosmos-sdk/baseapp/abci.go:934 +0x10c
github.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock(...)
    github.com/cosmos/cosmos-sdk/server/cmt_abci.go:44
github.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x137804af8?, {0x109dd0858?, 0x10c287268?}, 0x10faf0108?)
    github.com/cometbft/cometbft/abci/client/local_client.go:189 +0xe4
github.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x1400125f998, {0x109dd0858, 0x10c287268}, 0x14000af3c80)
    github.com/cometbft/cometbft/proxy/app_conn.go:104 +0x124
github.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x14002182fc0, 0x7}}, {0x14000fc9998, 0x11}, 0x1, 0xbc1a6, {{0x14001524b60, ...}, ...}, ...}, ...)
    github.com/cometbft/cometbft/state/execution.go:236 +0x410
github.com/cometbft/cometbft/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0x14002182fc0, 0x7}}, {0x14000fc9998, 0x11}, 0x1, 0xbc1a6, {{0x14001524b60, ...}, ...}, ...}, ...)
    github.com/cometbft/cometbft/state/execution.go:231 +0x140
github.com/cometbft/cometbft/consensus.(*Handshaker).replayBlock(_, {{{0xb, 0x0}, {0x14002182fc0, 0x7}}, {0x14000fc9998, 0x11}, 0x1, 0xbc1a6, {{0x14001524b60, ...}, ...}, ...}, ...)
    github.com/cometbft/cometbft/consensus/replay.go:534 +0x1a8
github.com/cometbft/cometbft/consensus.(*Handshaker).ReplayBlocksWithContext(_, {_, _}, {{{0xb, 0x0}, {0x14002182fc0, 0x7}}, {0x14000fc9998, 0x11}, 0x1, ...}, ...)
    github.com/cometbft/cometbft/consensus/replay.go:433 +0x5d4
github.com/cometbft/cometbft/consensus.(*Handshaker).HandshakeWithContext(0x140021d56c0, {0x109dd07b0, 0x14001695860}, {0x109df7a98, 0x14001e4f960})
    github.com/cometbft/cometbft/consensus/replay.go:274 +0x370
github.com/cometbft/cometbft/node.doHandshake({_, _}, {_, _}, {{{0xb, 0x0}, {0x14002182fc0, 0x7}}, {0x14000fc9998, 0x11}, ...}, ...)
    github.com/cometbft/cometbft/node/setup.go:186 +0x12c
github.com/cometbft/cometbft/node.NewNodeWithContext({0x109dd07b0, 0x14001695860}, 0x14001f96640, {0x109dade70, 0x14001f8d5e0}, 0x1400217a0c0, {0x109dd0900, 0x1400125f158}, 0x14000b4ef70, 0x109d71c60, ...)
    github.com/cometbft/cometbft/node/node.go:360 +0x498
github.com/evmos/ethermint/server.startInProcess(_, {{0x0, 0x0, 0x0}, {0x109dfbc38, 0x1400212d470}, 0x0, {0x0, 0x0}, {0x109e19138, ...}, ...}, ...)
    github.com/evmos/ethermint/server/start.go:371 +0x13b4
github.com/evmos/ethermint/server.StartCmd.func2.2()
    github.com/evmos/ethermint/server/start.go:167 +0x54
github.com/evmos/ethermint/server.wrapCPUProfile(0x14001aeede0, 0x14000b4f9e0)
    github.com/evmos/ethermint/server/start.go:538 +0x238
github.com/evmos/ethermint/server.StartCmd.func2(0x14002022f08, {0x10c287268?, 0x0?, 0x0?})
    github.com/evmos/ethermint/server/start.go:166 +0x2cc
github.com/spf13/cobra.(*Command).execute(0x14002022f08, {0x10c287268, 0x0, 0x0})
    github.com/spf13/cobra/command.go:985 +0x834
github.com/spf13/cobra.(*Command).ExecuteC(0x14002016008)
    github.com/spf13/cobra/command.go:1117 +0x344
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra/command.go:1041
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    github.com/spf13/cobra/command.go:1034
github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x14002016008, {0x107579094, 0x6}, {0x14001352d80, 0x16})
    github.com/cosmos/cosmos-sdk/server/cmd/execute.go:34 +0x154
main.main()
    github.com/crypto-org-chain/cronos/v2/cmd/cronosd/main.go:13 +0x3c

As you can see, the failing call originates in ethermint (x/evm/keeper.GetParams, called from BeginBlock). I will open an issue over there.
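To see which tree node the panic message refers to, the nodeKey can be decoded by hand. The sketch below assumes the iavl v1 key layout of one prefix byte, an 8-byte big-endian version, and a 4-byte big-endian nonce. That layout is an assumption here and is not confirmed anywhere in this thread. The helper decodeNodeKey is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// decodeNodeKey splits a hex-encoded node database key into its parts.
// ASSUMED layout (iavl v1): 1 prefix byte, 8-byte big-endian version,
// 4-byte big-endian nonce, 13 bytes in total.
func decodeNodeKey(hexKey string) (prefix byte, version int64, nonce uint32, err error) {
	raw, err := hex.DecodeString(hexKey)
	if err != nil {
		return 0, 0, 0, err
	}
	if len(raw) != 13 {
		return 0, 0, 0, fmt.Errorf("expected 13 bytes, got %d", len(raw))
	}
	prefix = raw[0]
	version = int64(binary.BigEndian.Uint64(raw[1:9]))
	nonce = binary.BigEndian.Uint32(raw[9:13])
	return prefix, version, nonce, nil
}

func main() {
	// nodeKey copied from the panic message above.
	prefix, version, nonce, err := decodeNodeKey("73000000000000000100000001")
	if err != nil {
		panic(err)
	}
	fmt.Printf("prefix=%q version=%d nonce=%d\n", prefix, version, nonce)
}
```

With that layout, the key decodes to prefix 's', version 1, nonce 1, which matches the 12 bytes [0 0 0 0 0 0 0 1 0 0 0 1] printed in the panic. If the assumption holds, the missing node was written at version 1, which would be consistent with a node that pruning removed while the tree still references it.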

yihuang commented 2 weeks ago

I guess you can work around it by setting pruning = nothing for now; there seem to be some issues with iavl v1's async pruning.
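For operators who want to check whether a node is exposed before restarting it, here is a small sketch that reads the pruning strategy from configuration text and flags anything other than nothing. It assumes the setting is written as a TOML-style line, pruning = "<strategy>", as in a Cosmos SDK app.toml. The thread itself mentions config.json, in which case the parsing would differ. pruningStrategy is a hypothetical helper, not part of cronosd, and it is not a full TOML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// pruningStrategy returns the value of the first `pruning = "..."` line in
// TOML-style text. Hypothetical helper: it ignores tables, inline comments,
// and other TOML features.
func pruningStrategy(config string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			continue // skip comment lines
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || strings.TrimSpace(key) != "pruning" {
			continue // also skips keys such as pruning-interval
		}
		return strings.Trim(strings.TrimSpace(value), `"`), true
	}
	return "", false
}

func main() {
	// Sample fragment for illustration only.
	sample := "# sample fragment\npruning = \"default\"\npruning-interval = \"10\"\n"

	strategy, ok := pruningStrategy(sample)
	switch {
	case !ok:
		fmt.Println("no pruning setting found")
	case strategy != "nothing":
		fmt.Printf("pruning = %q: may hit the restart panic reported in this thread\n", strategy)
	default:
		fmt.Println("pruning = \"nothing\": workaround already applied")
	}
}
```

The trade-off noted at the top of the issue still applies: pruning = nothing avoids the panic but keeps all historical state on disk.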

yihuang commented 2 weeks ago

Another suggestion is to try memiavl, @alienc0der.