anoma / namada

Rust implementation of Namada, a Proof-of-Stake L1 for interchain asset-agnostic privacy
https://namada.net
GNU General Public License v3.0

display actionable error message when sync does not start #4042

Open egasimus opened 1 week ago

egasimus commented 1 week ago

Recently, our local "resync" (pseudo-archival) nodes have frequently ended up in a state where they successfully credit initial balances but then never begin syncing from the provided persistent_peers.

INFO namada_node::shell::init_chain: Crediting X nam tokens to Y
INFO namada_node::shell::init_chain: Crediting A nam tokens to Z
# ... repeat for thousands of lines ...
# and then crickets 🦗🦗🦗

These failures have different root causes, yet in every case namadan gives no feedback about what is wrong. That makes it hard to determine and take the appropriate next step in a timely manner, which puts unreasonable strain on our DevOps resources.

It would be immensely helpful if a failure to begin syncing produced an explanatory message at INFO, WARNING, or ERROR level.

Looking at the way run_aux launches multiple sub-tasks asynchronously, I'd venture a guess that the message would also need to be repeated periodically, so that it doesn't get lost in the scrollback from the token-crediting messages.

sug0 commented 1 week ago

All network code is handled by CometBFT. Namada hides its output by default, but you can export NAMADA_CMT_STDOUT=true and CMT_LOG_LEVEL=info or CMT_LOG_LEVEL=debug to see what's going on at the P2P level. Be warned that setting CometBFT's log level to debug generates incredibly noisy output.
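
As a minimal sketch of the suggestion above, the two variables can be exported in the shell that launches the node. The variable names and values are the ones given in this comment; the start command in the trailing comment is an assumption about how the node is normally run and should be replaced with your own invocation.

```shell
# Let CometBFT's output through (Namada hides it by default).
export NAMADA_CMT_STDOUT=true

# "info" shows P2P-level activity; "debug" is far noisier.
export CMT_LOG_LEVEL=info

# Then start the node as usual from this same shell, for example
# (assumed invocation, adjust to your setup):
#   namadan ledger run
```

Starting at info and switching to debug only if peer connection attempts still are not visible keeps the output readable next to the token-crediting lines.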