celestiaorg / celestia-app

Celestia consensus node
https://celestiaorg.github.io/celestia-app/
Apache License 2.0

querying empty signal tally causes index panic #4007

Open cmwaters opened 4 weeks ago

cmwaters commented 4 weeks ago

Found by using the following command on arabica:

❯ celestia-appd query signal tally 3 --node https://rpc.celestia-arabica-11.com:443
Error: rpc error: code = Unknown desc = runtime error: index out of range [7] with length 7: panic

rootulp commented 3 days ago

hmm, I can't repro with that exact command or a few variants of that command:

celestia-appd query signal tally 3 --node https://celestia-mainnet-rpc.itrocket.net:443
celestia-appd query signal tally 4 --node https://rpc.celestia-arabica-11.com:443
celestia-appd query signal tally 5 --node https://rpc.celestia-arabica-11.com:443

I think we'll need to write unit tests against the RPC / gRPC endpoint to try and repro

rootulp commented 3 days ago

Oh it's possible that this node was the Arabica node that DevOps accidentally downgraded after upgrading to v3.x and submitting the signal message. See Slack thread for context.

rootulp commented 3 hours ago

I reproduced this by running single-node.sh from https://github.com/celestiaorg/celestia-app/pull/4041 and then ./scripts/upgrade-to-v3.sh:

# Query during app version 1
$ celestia-appd query signal tally 3
Error: rpc error: code = Unknown desc = kv store with key KVStoreKey{0x14001209a90, signal} has not been registered in stores: panic

# Query during app version 2
$ celestia-appd query signal tally 3
threshold_power: "4167"
total_voting_power: "5000"
voting_power: "0"

# Query after validator signals for v3
$ celestia-appd query signal tally 3
threshold_power: "4167"
total_voting_power: "5000"
voting_power: "5000"

# Query after the try upgrade but before v3 activation height
$ celestia-appd query signal tally 3
Error: rpc error: code = Unknown desc = runtime error: index out of range [7] with length 6: panic

rootulp commented 2 hours ago

The upgrade key is getting iterated over when calculating the tally. I added a unit test and fix.
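
For anyone following along, here is a minimal Go sketch of how this class of bug can produce a panic like the one above. It is an illustration under assumptions, not the celestia-app code or the actual fix: the `entry` type, `upgradeKey`, `demoStore`, `tally`, and both value encodings (8-byte big-endian for signals, a 6-byte record for the pending upgrade) are hypothetical. The only part it relies on from the standard library is that `binary.BigEndian.Uint64` panics when given fewer than 8 bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// upgradeKey is a hypothetical reserved key that shares a store with the
// per-validator signal entries.
var upgradeKey = []byte{0x00}

// entry is a hypothetical stand-in for one key/value pair in a KV store.
type entry struct {
	key, value []byte
}

// demoStore returns one pending-upgrade record (6 bytes, an assumed encoding)
// followed by one validator signal for version 3 (8-byte big-endian).
func demoStore() []entry {
	signal := make([]byte, 8)
	binary.BigEndian.PutUint64(signal, 3)
	return []entry{
		{key: upgradeKey, value: []byte{0x08, 0x03, 0x10, 0xe8, 0x07, 0x01}},
		{key: []byte("validator-1"), value: signal},
	}
}

// tally counts the entries that signal the given version. Any panic is
// recovered and returned as an error so both behaviours can be compared.
func tally(store []entry, version uint64, skipUpgradeKey bool) (count int, err error) {
	defer func() {
		if r := recover(); r != nil {
			count, err = 0, fmt.Errorf("%v: panic", r)
		}
	}()
	for _, e := range store {
		// Sketch of a fix: do not treat the upgrade record as a signal.
		if skipUpgradeKey && bytes.Equal(e.key, upgradeKey) {
			continue
		}
		// Uint64 needs 8 bytes and panics on a shorter slice.
		if binary.BigEndian.Uint64(e.value) == version {
			count++
		}
	}
	return count, nil
}

func main() {
	store := demoStore()

	_, err := tally(store, 3, false)
	fmt.Println("iterating every key:", err)

	n, err := tally(store, 3, true)
	fmt.Println("skipping the upgrade key:", n, err)
}
```

In this sketch the first call fails with an index-out-of-range error because the 6-byte upgrade record is decoded as if it were a signal, while the second call counts the single validator signal. That would also be consistent with the panic only appearing after the try upgrade, once an upgrade record exists in the store.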