VolumeFi / paloma

The fast blockchain messenger protocol
Apache License 2.0
3 stars 2 forks source link

[BUG] Operator keys #991

Closed byte-bandit closed 9 months ago

byte-bandit commented 9 months ago

Collection of ongoing issues in test net

Since moving on from private test net to public test net, some new issues have appeared that we weren't aware of before.

[birdlady] Account Sequence Mismatch

It appears that birdlady is experiencing account sequence mismatch errors continuously:

Nov 13 16:21:57 testnet-birdlady pigeon[2255698]: {"component":"try-send-keep-alive","error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 7, got 6: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-13T16:21:57Z","x-correlation-id":"cl94q3ul8ot9qeig1jag"}
Nov 13 16:21:57 testnet-birdlady pigeon[2255698]: {"error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 7, got 6: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"warning","msg":"keep alive: Attempt [3/3] failed to send keep alive call. Will retry in 4s","time":"2023-11-13T16:21:57Z","x-correlation-id":"cl94q3ul8ot9qeig1jag"}
Nov 13 16:22:01 testnet-birdlady pigeon[2255698]: {"component":"procmon","error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 7, got 6: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"error","msg":"Failed to execute process: rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 7, got 6: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","process":"Keep alive","time":"2023-11-13T16:22:01Z","x-correlation-id":"cl94q3ul8ot9qeig1jag"}
Nov 13 16:22:01 testnet-birdlady pigeon[2255698]: {"component":"cache","error":null,"estimatedBlockSpeed":1957000000,"lastAliveUntil":7865519,"lastBlockHeight":7866470,"lastRefresh":"2023-11-13T16:22:01.642873115Z","level":"debug","msg":"cache refreshed","time":"2023-11-13T16:22:01Z","x-correlation-id":"cl94qael8ot9qeig1jdg"}
Nov 13 16:22:07 testnet-birdlady pigeon[2255698]: {"component":"try-send-keep-alive","error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 10, got 9: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-13T16:22:07Z","x-correlation-id":"cl94qael8ot9qeig1jdg"}
Nov 13 16:22:07 testnet-birdlady pigeon[2255698]: {"error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 10, got 9: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"warning","msg":"keep alive: Attempt [1/3] failed to send keep alive call. Will retry in 1s","time":"2023-11-13T16:22:07Z","x-correlation-id":"cl94qael8ot9qeig1jdg"}
Nov 13 16:22:15 testnet-birdlady pigeon[2255698]: {"component":"try-send-keep-alive","error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 7, got 6: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-13T16:22:15Z","x-correlation-id":"cl94qael8ot9qeig1jdg"}
Nov 13 16:22:15 testnet-birdlady pigeon[2255698]: {"error":"rpc error: code = Unknown desc = rpc error: code = Unknown desc = account sequence mismatch, expected 7, got 6: incorrect account sequence [cosmos/cosmos-sdk@v0.47.4/x/auth/ante/sigverify.go:269] With gas wanted: '18446744073709551615' and gas used: '15991' : unknown request","level":"warning","msg":"keep alive: Attempt [2/3] failed to send keep alive call. Will retry in 2s","time":"2023-11-13T16:22:15Z","x-correlation-id":"cl94qael8ot9qeig1jdg"}

This is odd, especially as all operator keys have been used before:

root@testnet-birdlady:~# palomad q account paloma1vv847fzh7l89gkfgcact26hqvrpqqs3vlpfwet
'@type': /cosmos.auth.v1beta1.BaseAccount
account_number: "534"
address: paloma1vv847fzh7l89gkfgcact26hqvrpqqs3vlpfwet
pub_key:
  '@type': /cosmos.crypto.secp256k1.PubKey
  key: AnYMS6+eZ4UHHJK+4Uj84sih9e+MqCVIj01tBsJ5CKYq
sequence: "6"
root@testnet-birdlady:~# palomad q account paloma1zjugeq7sm3gd3fyt5el2f26c0hhh5xkaq5jll8
'@type': /cosmos.auth.v1beta1.BaseAccount
account_number: "535"
address: paloma1zjugeq7sm3gd3fyt5el2f26c0hhh5xkaq5jll8
pub_key:
  '@type': /cosmos.crypto.secp256k1.PubKey
  key: A4LNMsZuEDxjMoKVr2BGj2Oh7ZzPxKDKWp4aBQSpwb6E
sequence: "9"
root@testnet-birdlady:~# palomad q account paloma1u2wqk0lskn52h365yp65w0sn5n3wahfw0ze5qy
'@type': /cosmos.auth.v1beta1.BaseAccount
account_number: "536"
address: paloma1u2wqk0lskn52h365yp65w0sn5n3wahfw0ze5qy
pub_key:
  '@type': /cosmos.crypto.secp256k1.PubKey
  key: Ai0gmZJ1zM6+qYOM/ByFcDYMt1dwgl7zCEc5mDPAaHi8
sequence: "6"

At some point, this got out of sync. Why doesn't it recover?

Investigating the implementation

[BirdPoop2] Keys not found

[!NOTE]
This has been resolved. The issue was a difference in keyring backend configurations for Pigeon and Paloma.

BP2 isn't able to read in the validator keys, although the setup was performed identical to birdlady, where keys can be read in without issues. I suspect this might be caused by the difference in keystore backends (test vs os), although judging from the code, there should be absolutely zero difference between reading in the validator key (which works) and the operator keys (which remain as not found).

I'm not yet sure whether this is an issue with the setup or a bug in the code.

verabehr commented 9 months ago

@byte-bandit on BP2 the keystore is set to test in ~/.paloma/config/client.toml but to os in ~/.pigeon/config/config.yaml. Just wanted to point that out in case this wasn't intentional and part of your testing/trouble shooting.

byte-bandit commented 9 months ago

@byte-bandit on BP2 the keystore is set to test in ~/.paloma/config/client.toml but to os in ~/.pigeon/config/config.yaml. Just wanted to point that out in case this wasn't intentional and part of your testing/trouble shooting.

@verabehr just saved the day. Yes, Pigeon obviously won't find the keys in the os store if they're created in the test store on Paloma. I adapted the configuration and it's working as expected with both apps running against the test store.

I'm tempted to try running against the os store, but will likely have to regenerate all keys (including the validator key?)

verabehr commented 9 months ago

Next steps:

  1. Reach out to cometbft with a ticket
  2. Vitwit to start upgrade of cosmos and comet
taariq commented 9 months ago

@byte-bandit let's open up a ticket with comet BFT and close as we will update when they respond, but punt this over to Upgrade for Cosmos-SDK.

byte-bandit commented 9 months ago

Log output from BL after restarting with fresh keys

After switching to a fresh set of signing keys, you can see that each transaction signed by the keys in question fails to get included in the block within the timeout. Once all 3 keys have been used, the account sequence mismatch starts again. That makes sense, as now there's 3 more transactions stuck in the mempool.

There is no indication as to WHY they're not being included. The fact that I was able to reproduce this behaviour so easily is interesting, as the same cannot be said for either node on private test net or BirdPoop2, who has used its signing keys over 400 times by now, without issue.

The underlaying issue must have something to do with the node configuration I guess.

root@testnet-birdlady:~# journalctl -u pigeond --since "5 minutes ago" | grep clblr3ul8ot9jaq5cclg
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"Heart.Beat","level":"debug","msg":"Running heartbeat","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","estimatedBlockSpeed":0,"lastAliveUntil":0,"lastBlockHeight":0,"lastRefresh":"0001-01-01T00:00:00Z","level":"debug","msg":"cache is stale","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","level":"debug","msg":"refreshing cache","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"level":"info","msg":"Client: Status","status":{"node_info":{"protocol_version":{"p2p":8,"block":11,"app":0},"id":"aff8933a73729946b38729979093c0fc013b8cc8","listen_addr":"tcp://0.0.0.0:26656","network":"paloma-testnet-15","version":"0.37.2","channels":"40202122233038606100","moniker":"birdlady","other":{"tx_index":"on","rpc_address":"tcp://0.0.0.0:26657"}},"sync_info":{"latest_block_hash":"3304F5EA54FFED101BAF5797079BA37F0957510F0F0526B60AEC5A8EEA35E8F1","latest_app_hash":"13E0AD336EC26E5D7CE15B849561D7A8A1F591FB89362D84600738194078EC50","latest_block_height":8042736,"latest_block_time":"2023-11-17T12:33:16.342763397Z","earliest_block_hash":"91FCB5D26EA2BFF314C180A5B95BF1CF29C6063D2E18004025EFAE70146D69A0","earliest_app_hash":"5E7B1E879FB83FD4C1C614CC633EB570096577150A2A8BAD43B87A391EF1C0B6","earliest_block_height":1144001,"earliest_block_time":"2023-04-05T02:33:00.569763082Z","catching_up":false},"validator_info":{"address":"27479561598973D3A71FAFEC0AD9CB263E46F813","pub_key":"VLDjjeisPKInQm9m+Az3yRDBZqMVHIgg6/Uer+9jpck=","voting_power":0}},"time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","level":"debug","msg":"done refreshing cache","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"cache","error":null,"estimatedBlockSpeed":1620000000,"lastAliveUntil":7865519,"lastBlockHeight":8042736,"lastRefresh":"2023-11-17T12:33:19.119616974Z","level":"debug","msg":"cache refreshed","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"alive-until-bh":7865519,"btl":-177217,"component":"Heart.Beat","current-bh":8042736,"level":"debug","msg":"checking keep alive","should-send-keep-alive":true,"time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"message-sender","creator":"paloma18d4cfq63h4a9cz49d86pr84hpang7yvqq9ucwf","level":"debug","msg":"Injecting metadata","signer":"paloma1nt5pfp2mjmeju75zpdeph5tuetdvn8crxnk2y3","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:33:19 testnet-birdlady pigeon[2322371]: {"component":"send-msgs","key":"pigeon-operator-echo","level":"info","msg":"signing transaction","time":"2023-11-17T12:33:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:19 testnet-birdlady pigeon[2322371]: {"component":"try-send-keep-alive","error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-17T12:34:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:19 testnet-birdlady pigeon[2322371]: {"error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"warning","msg":"keep alive: Attempt [1/3] failed to send keep alive call. Will retry in 1s","time":"2023-11-17T12:34:19Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:20 testnet-birdlady pigeon[2322371]: {"component":"message-sender","creator":"paloma18d4cfq63h4a9cz49d86pr84hpang7yvqq9ucwf","level":"debug","msg":"Injecting metadata","signer":"paloma1s8tqc0nlcn9wp8tc862awkg6szsvprcs4y2wrz","time":"2023-11-17T12:34:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:34:20 testnet-birdlady pigeon[2322371]: {"component":"send-msgs","key":"pigeon-operator-foxtrott","level":"info","msg":"signing transaction","time":"2023-11-17T12:34:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:20 testnet-birdlady pigeon[2322371]: {"component":"try-send-keep-alive","error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-17T12:35:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:20 testnet-birdlady pigeon[2322371]: {"error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"warning","msg":"keep alive: Attempt [2/3] failed to send keep alive call. Will retry in 2s","time":"2023-11-17T12:35:20Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:22 testnet-birdlady pigeon[2322371]: {"component":"message-sender","creator":"paloma18d4cfq63h4a9cz49d86pr84hpang7yvqq9ucwf","level":"debug","msg":"Injecting metadata","signer":"paloma1vd5amwccdhsh7nfuyaefnxqjxt4mv8v4h4q456","time":"2023-11-17T12:35:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:35:22 testnet-birdlady pigeon[2322371]: {"component":"send-msgs","key":"pigeon-operator-delta","level":"info","msg":"signing transaction","time":"2023-11-17T12:35:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:36:22 testnet-birdlady pigeon[2322371]: {"component":"try-send-keep-alive","error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"error","msg":"Error while trying to keep pigeon alive.","time":"2023-11-17T12:36:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
Nov 17 12:36:22 testnet-birdlady pigeon[2322371]: {"error":"timed out after: 60000000000; timed out after waiting for tx to get included in the block","level":"warning","msg":"keep alive: Attempt [3/3] failed to send keep alive call. Will retry in 4s","time":"2023-11-17T12:36:22Z","x-correlation-id":"clblr3ul8ot9jaq5cclg"}
image
byte-bandit commented 9 months ago

Observation from @verabehr:

Both nodes have different configurations in gas adjustment multipliers

byte-bandit commented 9 months ago

Still waiting to hear back from anyone over at CometBFT who might know more https://github.com/cometbft/cometbft/discussions/1623

verabehr commented 9 months ago

closing we have #1013 and #1014 for tracking