Closed ToasterTheBrave closed 1 year ago
thanks for opening the issue, could you link me to your code in order to test out the changes
When upgrading from cosmos-sdk@v0.47.1 to cosmos-sdk@v0.47.2 and starting a new testnet, the testnet fails to start and produce blocks.
im running a testnet and upgraded things with no issue, i tried a few times if i missed something with no luck.
I started a testnet with 0.47.1 ran it a few blocks submitted txs then stopped it and upgraded to 0.47.2 and submitted more txs with success.
I tested with 0.47.0 then upgrade to 0.47.2 as well with blocks and transactions passing.
Could you walk me through what you did here?
Hi @tac0turtle! Thanks for looking into this.
I am starting a fresh paloma testnet with 0.47.2 and seeing these errors. The bootstrapping of the nodes is in an ansible runbook, but I will write up a gist to reproduce this.
Okay. So this seems to be a setup issue on the Paloma side of our private testnet. @ToasterTheBrave will close.
I've narrowed down exactly what's causing this. It appears to be a specific set of config around logging when starting up a new testnet. Specifically, I'm seeing that log_level="debug"
and log_format="json"
is breaking. I've got steps to reproduce using simapp.
Environment:
Reproduction steps:
# all hosts
KEY=simapp-$(hostname)
MONIKER=simapp-priv-testnet-$(hostname)
CHAIN=private-simapp-810226
simd init $MONIKER --chain-id $CHAIN
# This is done so we don't have to set a passphrase in the example. This has been confirmed
# not to change the example case
sed -i 's/keyring-backend = ".*"/keyring-backend = "test"/g' /root/.simapp/config/client.toml
# Here is the breaking config
sed -i 's/^log_format = ".*"/log_format = "json"/g' /root/.simapp/config/config.toml
sed -i 's/^log_level = ".*"/log_level = "debug"/g' /root/.simapp/config/config.toml
simd keys add $KEY
mkdir /root/.simapp/config/gentx
# m1
simd genesis add-genesis-account $KEY 5000000000000stake
simd genesis gentx $KEY 10000000000stake --chain-id $CHAIN
scp .simapp/config/genesis.json root@192.168.56.22:/root/.simapp/config/
scp .simapp/config/gentx/* root@192.168.56.22:/root/.simapp/config/gentx/
#m2
simd genesis add-genesis-account $KEY 5000000000000stake
simd genesis gentx $KEY 10000000000stake --chain-id $CHAIN
scp .simapp/config/genesis.json root@192.168.56.23:/root/.simapp/config/
scp .simapp/config/gentx/* root@192.168.56.23:/root/.simapp/config/gentx/
# m3
simd genesis add-genesis-account $KEY 5000000000000stake
simd genesis gentx $KEY 10000000000stake --chain-id $CHAIN
simd genesis collect-gentxs
scp .simapp/config/genesis.json root@192.168.56.21:/root/.simapp/config/
scp .simapp/config/genesis.json root@192.168.56.22:/root/.simapp/config/
# All hosts: get node id
simd tendermint show-node-id
# All hosts: Update the config.toml persistent_peers
# example values:
# m1: de66d98c8126f3853142be5c6917a0fc59dd8802@192.168.56.22:26656,5e71fddeb6e8b1e980f97a28ccd80a2cc893f18c@192.168.56.23:26656
# m2: 90f986726e6e9992c04c1b8715b6391928893453@192.168.56.21:26656,5e71fddeb6e8b1e980f97a28ccd80a2cc893f18c@192.168.56.23:26656
# m3: 90f986726e6e9992c04c1b8715b6391928893453@192.168.56.21:26656,de66d98c8126f3853142be5c6917a0fc59dd8802@192.168.56.22:26656
# All hosts: start the app
simd start
Now, if we check the status of simapp, it's not producing blocks. Ex:
$ simd status | jq
{
"NodeInfo": {
"protocol_version": {
"p2p": "8",
"block": "11",
"app": "0"
},
"id": "38d0ac703c2de98a4783a3b1d8a06c9678bf41de",
"listen_addr": "tcp://0.0.0.0:26656",
"network": "private-simapp-810226",
"version": "0.37.1",
"channels": "40202122233038606100",
"moniker": "simapp-priv-testnet-m1",
"other": {
"tx_index": "on",
"rpc_address": "tcp://127.0.0.1:26657"
}
},
"SyncInfo": {
"latest_block_hash": "",
"latest_app_hash": "",
"latest_block_height": "0",
"latest_block_time": "1970-01-01T00:00:00Z",
"earliest_block_hash": "",
"earliest_app_hash": "",
"earliest_block_height": "0",
"earliest_block_time": "1970-01-01T00:00:00Z",
"catching_up": false
},
"ValidatorInfo": {
"Address": "3D04865B31075A120B22C5B1911930BA4B538469",
"PubKey": {
"type": "tendermint/PubKeyEd25519",
"value": "03JnSP9fy2y/cJ7d8ScklYXUuaE8hZZzYOUkrrpOFf8="
},
"VotingPower": "10000"
}
}
@alexanderbez Should this be reopened give the above ^?
@julienrbrt are you able to double check this?
@julienrbrt and i were unable to reproduce this issue with a variety of configurations. closing this
What we can do is tagging a v0.47.3-rc.0 for you to try again. I still do not see how it can be logger related, but we have actually changed the logger in v0.47.3. My trials were indeed working.
@julienrbrt I'll test out the new version. In your trials, were you using three nodes? I'm curious what could possibly be different between our example setups.
@julienrbrt I tested in v0.47.3. This is still a problem. I need more info about how you ran your tests that didn't produce the error so I can give more context if needed.
@mmsqe Thank you for the comment! Nice to know I'm not going insane and others have seen the issue :slightly_smiling_face:. It looks like you've already opened a ticket with cometbft but didn't get any traction?
@mmsqe Thank you for the comment! Nice to know I'm not going insane and others have seen the issue :slightly_smiling_face:. It looks like you've already opened a ticket with cometbft but didn't get any traction?
I think they see the issue, but need more test.
In your trials, were you using three nodes?
I was using in-process nodes with a tweaked simd testnet
with a debug logger, so possibly this was that.
Possibly, the new cometbft release resolves your issue when used with v0.47.3: https://github.com/cometbft/cometbft/blob/v0.37.2/CHANGELOG.md#security-fixes
@julienrbrt Yes indeed, this is fixed for us now with cometbft v0.37.2 :slightly_smiling_face:
Summary of Bug
When upgrading from cosmos-sdk@v0.47.1 to cosmos-sdk@v0.47.2 and starting a new testnet, the testnet fails to start and produce blocks.
The same steps worked previously in cosmos-sdk@v0.47.1. After the upgrade, no blocks are being created and no obvious error messages are being produced. The best I have is the output of the status.
There are occasionally error messages, but they don't seem overly concerning as it's a host trying to add itself to the address book.
Version
v0.47.2
Steps to Reproduce
This was found in palomachain. To reproduce, upgrade cosmos-sdk and cometbtf there and start up a new testnet. I suspect it may be reproducible in other cosmos apps, but I have not tried.
To further show issues, you can try submitting a proposal. We are submitting as such:
And it returns