We encountered an issue where cometbft crashes with a panic caught in fendermint. This issue occurs because, in BeginBlock, we attempt to resolve the CometBFT validator ID to a public key. However, when fendermint’s data folder is deleted and fendermint is restarted, cometbft starts replaying blocks before its RPC API is accepting connections, and fendermint needs that RPC connection to perform the lookup.
2024-11-06T14:13:51.762219Z ERROR fendermint/abci/src/application.rs:212: failed to execute ABCI request: Error { msg: "HTTP error", source: "error trying to connect: tcp connect error: Connection refused (os error 61)", }
thread 'tokio-runtime-worker' panicked at /Users/alexei/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-abci-0.7.0/src/v037/server.rs:145:70:
called Result::unwrap() on an Err value: HTTP error
Caused by: error trying to connect: tcp connect error: Connection refused (os error 61)
Location: /Users/alexei/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/tracer_impl/eyre.rs:10:9
Caused by:
0: HTTP error
1: error trying to connect: tcp connect error: Connection refused (os error 61)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
2024-11-06T14:13:51.995565Z ERROR fendermint/app/src/main.rs:24: panicking stacktrace=" 0: std::backtrace_rs::backtrace::libunwind::trace\n
Cause:
The issue seems to be due to this line in validators.rs, where fendermint tries to resolve the validator ID to a public key by connecting to the cometbft RPC API during BeginBlock. If cometbft is not fully ready (due to replay or a fresh start with deleted data), this connection is refused. The resulting error reaches a Result::unwrap() in tower-abci (src/v037/server.rs:145), causing fendermint to panic and terminate.
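One possible mitigation, sketched here only as an illustration and not as fendermint's actual code or the fix the maintainers chose, is to retry the lookup with exponential backoff instead of failing on the first refused connection. The names `retry_with_backoff` and `simulated_lookup` are hypothetical; `simulated_lookup` stands in for the real RPC call and simply fails a fixed number of times before succeeding.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry `op` up to `max_attempts` times, doubling the delay after each failure.
/// Returns the first success, or the last error if every attempt fails.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}

/// Hypothetical stand-in for the RPC lookup (an assumption, not a real API):
/// it fails with "Connection refused" for the first `ready_after` calls.
pub fn simulated_lookup(calls: &mut u32, ready_after: u32) -> Result<&'static str, String> {
    *calls += 1;
    if *calls <= ready_after {
        Err("Connection refused".to_string())
    } else {
        Ok("validator-public-key")
    }
}
```

A caller would wrap the lookup in a closure, for example `retry_with_backoff(5, Duration::from_millis(200), || simulated_lookup(&mut calls, 2))`. The sketch blocks the current thread with `sleep`; in an async ABCI handler a non-blocking timer would be needed instead, and whether delaying BeginBlock is acceptable at all is a separate design question.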
Steps to Reproduce:
1. Start cometbft and fendermint.
2. Stop fendermint and delete its data folder.
3. Restart fendermint.