Closed byte-bandit closed 9 months ago
@byte-bandit on BP2 the keystore is set to test
in ~/.paloma/config/client.toml
but to os
in ~/.pigeon/config/config.yaml
. Just wanted to point that out in case this wasn't intentional and part of your testing/trouble shooting.
@byte-bandit on BP2 the keystore is set to
test
in~/.paloma/config/client.toml
but toos
in~/.pigeon/config/config.yaml
. Just wanted to point that out in case this wasn't intentional and part of your testing/trouble shooting.
@verabehr just saved the day. Yes, Pigeon obviously won't find the keys in the os
store if they're created in the test
store on Paloma. I adapted the configuration and it's working as expected with both apps running against the test
store.
I'm tempted to try running against the os
store, but will likely have to regenerate all keys (including the validator key?)
Next steps:
@byte-bandit let's open up a ticket with comet BFT and close as we will update when they respond, but punt this over to Upgrade for Cosmos-SDK.
After switching to a fresh set of signing keys, you can see that each transaction signed by the keys in question fails to get included in the block within the timeout. Once all 3 keys have been used, the account sequence mismatch starts again. That makes sense, as now there's 3 more transactions stuck in the mempool.
There is no indication as to WHY they're not being included. The fact that I was able to reproduce this behaviour so easily is interesting, as the same cannot be said for either node on private test net or BirdPoop2, who has used its signing keys over 400 times by now, without issue.
The underlaying issue must have something to do with the node configuration I guess.
root@testnet-birdlady:~# journalctl -u pigeond --since "5 minutes ago" | grep clblr3ul8ot9jaq5cclg
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"Heart.Beat","level":"debug","msg":"Running heartbeat","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","estimatedBlockSpeed":0,"lastAliveUntil":0,"lastBlockHeight":0,"lastRefresh":"0001-01-01T00:00:00Z","level":"debug","msg":"cache is stale","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","level":"debug","msg":"refreshing cache","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"level":"info","msg":"Client: Status","status":{"node_info":{"protocol_version":{"p2p":8,"block":11,"app":0},"id":"aff8933a73729946b38729979093c0fc013b8cc8","listen_addr":"tcp://0.0.0.0:26656","network":"paloma-testnet-15","version":"0.37.2","channels":"40202122233038606100","moniker":"birdlady","other":{"tx_index":"on","rpc_address":"tcp://0.0.0.0:26657"}},"sync_info":{"latest_block_hash":"3304F5EA54FFED101BAF5797079BA37F0957510F0F0526B60AEC5A8EEA35E8F1","latest_app_hash":"13E0AD336EC26E5D7CE15B849561D7A8A1F591FB89362D84600738194078EC50","latest_block_height":8042736,"latest_block_time":"2023-11-17T12:33:16.342763397Z","earliest_block_hash":"91FCB5D26EA2BFF314C180A5B95BF1CF29C6063D2E18004025EFAE70146D69A0","earliest_app_hash":"5E7B1E879FB83FD4C1C614CC633EB570096577150A2A8BAD43B87A391EF1C0B6","earliest_block_height":1144001,"earliest_block_time":"2023-04-05T02:33:00.569763082Z","catching_up":false},"validator_info":{"address":"27479561598973D3A71FAFEC0AD9CB263E46F813","pub_key":"VLDjjeisPKInQm9m+Az3yRDBZqMVHIgg6/Uer+9jpck=","voting_power":0}},"time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","level":"debug","msg":"done refreshing cache","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","error":null,"estimatedBlockSpeed":1620000000,"lastAliveUntil":7865519,"lastBlockHeight":8042736,"lastRefresh":"2023-11-17T12:33:19.119616974Z","level":"debug","msg":"cache refreshed","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"alive-until-bh":7865519,"btl":-177217,"component":"Heart.Beat","current-bh":8042736,"level":"debug","msg":"checking keep alive","should-send-keep-alive":true,"time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"message-sender","creator":"paloma18d4cfq63h4a9cz49d86pr84hpang7yvqq9ucwf","level":"debug","msg":"Injecting metadata","signer":"paloma1nt5pfp2mjmeju75zpdeph5tuetdvn8crxnk2y3","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"send-msgs","key":"pigeon-operator-echo","level":"info","msg":"signing transaction","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:19 testnet-birdlady pigeon[2322371]: {"component":"try-send-keep-alive","error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-17T12:34:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:19 testnet-birdlady pigeon[2322371]: {"error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"warning","msg":"keep alive: Attempt [1/3] failed to send keep alive call. Will retry in 1s","time":"2023-11-17T12:34:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:20 testnet-birdlady pigeon[2322371]: {"component":"message-sender","creator":"paloma18d4cfq63h4a9cz49d86pr84hpang7yvqq9ucwf","level":"debug","msg":"Injecting metadata","signer":"paloma1s8tqc0nlcn9wp8tc862awkg6szsvprcs4y2wrz","time":"2023-11-17T12:34:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:20 testnet-birdlady pigeon[2322371]: {"component":"send-msgs","key":"pigeon-operator-foxtrott","level":"info","msg":"signing transaction","time":"2023-11-17T12:34:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:20 testnet-birdlady pigeon[2322371]: {"component":"try-send-keep-alive","error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-17T12:35:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:20 testnet-birdlady pigeon[2322371]: {"error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"warning","msg":"keep alive: Attempt [2/3] failed to send keep alive call. Will retry in 2s","time":"2023-11-17T12:35:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:22 testnet-birdlady pigeon[2322371]: {"component":"message-sender","creator":"paloma18d4cfq63h4a9cz49d86pr84hpang7yvqq9ucwf","level":"debug","msg":"Injecting metadata","signer":"paloma1vd5amwccdhsh7nfuyaefnxqjxt4mv8v4h4q456","time":"2023-11-17T12:35:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:22 testnet-birdlady pigeon[2322371]: {"component":"send-msgs","key":"pigeon-operator-delta","level":"info","msg":"signing transaction","time":"2023-11-17T12:35:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:36:22 testnet-birdlady pigeon[2322371]: {"component":"try-send-keep-alive","error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-17T12:36:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:36:22 testnet-birdlady pigeon[2322371]: {"error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"warning","msg":"keep alive: Attempt [3/3] failed to send keep alive call. Will retry in 4s","time":"2023-11-17T12:36:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Observation from @verabehr:
Both nodes have different configurations in
gas adjustment
multipliers
Still waiting to hear back from anyone over at CometBFT who might know more https://github.com/cometbft/cometbft/discussions/1623
closing we have #1013 and #1014 for tracking
Collection of ongoing issues in test net
Since moving on from private test net to public test net, some new issues have appeared that we weren't aware of before.
[birdlady] Account Sequence Mismatch
It appears that birdlady is experiencing account sequence mismatch errors continuously:
This is odd, especially as all operator keys have been used before:
At some point, this got out of sync. Why doesn't it recover?
Investigating the implementation
[BirdPoop2] Keys not found
BP2 isn't able to read in the validator keys, although the setup was performed identical to birdlady, where keys can be read in without issues. I suspect this might be caused by the difference in keystore backends (
test
vsos
), although judging from the code, there should be absolutely zero difference between reading in the validator key (which works) and the operator keys (which remain as not found).I'm not yet sure whether this is an issue with the setup or a bug in the code.