Concordium / Testnet4-Challenges

Unclean shutdown of node during SIGINT handling #717

Closed: mjptree closed this issue 3 years ago

mjptree commented 3 years ago

Closing the node manually with Ctrl-C led to a panic in `main`, caused by unwrapping a `None` value. This may be intended behavior, or it may hint at a real problem. The node's logs were:

2021-03-12 19:09:29,474 WARN received SIGINT indicating exit request
2021-03-12 19:09:29,490 INFO waiting for collector, concordium-node to die
2021-03-12T19:09:29.498537800Z: INFO: Signal received attempting to shutdown node cleanly
2021-03-12 19:09:32,503 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:35,507 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:38,511 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:41,515 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:44,519 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:47,523 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:50,528 INFO waiting for collector, concordium-node to die
2021-03-12 19:09:53,532 INFO waiting for collector, concordium-node to die
2021-03-12T19:09:56.190876600Z: INFO: Shutting down
concordium-node: getConsensusStatus: interrupted
2021-03-12 19:09:56,850 INFO waiting for collector, concordium-node to die
2021-03-12T19:09:56.933289500Z: INFO: P2PNode gracefully closed.
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/thread/mod.rs:1361:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2021-03-12 19:09:56,993 INFO stopped: concordium-node (exit status 0)
2021-03-12 19:09:58,004 INFO stopped: collector (terminated by SIGTERM)
concordium-node exiting

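The panic line in the log is the standard message Rust emits when `Option::unwrap()` is called on `None`. The log alone does not show which call site was responsible, because the reported location is inside the standard library's thread module.

The sketch below reproduces that message and shows a non-panicking alternative. `take_handle` and `panic_message` are hypothetical helpers for illustration only; they are not part of the Concordium node.

```rust
use std::any::Any;
use std::panic;

/// Hypothetical helper: remove a value from a slot that may already be empty
/// (for example, because another part of shutdown took it first).
fn take_handle(slot: &mut Option<u32>) -> Result<u32, &'static str> {
    // `take()` leaves `None` behind, so a second call finds the slot empty.
    slot.take().ok_or("handle already taken")
}

/// Extract the text of a panic payload, whichever string type it carries.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        String::new()
    }
}

fn main() {
    // Unwrapping `None` panics; catch_unwind lets us inspect that panic here.
    let result = panic::catch_unwind(|| {
        let empty: Option<u32> = None;
        empty.unwrap()
    });
    let payload = result.expect_err("unwrap on None should panic");
    assert!(panic_message(&*payload).contains("on a `None` value"));

    // The non-panicking alternative reports the missing value as an error.
    let mut slot = Some(7);
    assert_eq!(take_handle(&mut slot), Ok(7));
    assert_eq!(take_handle(&mut slot), Err("handle already taken"));
}
```

Returning a `Result` lets shutdown code log the missing value and carry on, instead of aborting `main` as happened in the log.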
abizjak commented 3 years ago

Thanks for the report. We have seen similar issues before.

The testnet 4 node has some race conditions in its shutdown sequence that occasionally lead to panics like this one. We believe these have been fixed since the testnet 4 node was released, so they should no longer occur with the current (as yet unreleased) node.
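The sketch below shows one general way to avoid this class of shutdown race. It is not Concordium's actual fix. All workers share a stop flag, which is set before any thread is joined. Each `join()` result is then inspected rather than unwrapped, so a single failed worker cannot take down `main`.

`spawn_worker` and `shutdown` are hypothetical names. Real SIGINT handling is omitted; the example sets the flag directly to stand in for the signal handler.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical worker: loops until `stop` is set, then returns its tick count.
fn spawn_worker(stop: Arc<AtomicBool>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut ticks = 0;
        while !stop.load(Ordering::Acquire) {
            ticks += 1;
            thread::sleep(Duration::from_millis(1));
        }
        ticks
    })
}

/// Signal every worker to stop, then join them in order.
/// Returns how many workers exited cleanly; a panicked worker is reported
/// instead of being unwrapped, so one failure does not abort the shutdown.
fn shutdown(stop: &AtomicBool, workers: Vec<JoinHandle<u64>>) -> usize {
    // Set the flag before joining, so no join waits on a worker that was never told to stop.
    stop.store(true, Ordering::Release);
    let mut clean = 0;
    for (i, handle) in workers.into_iter().enumerate() {
        match handle.join() {
            Ok(_) => clean += 1,
            Err(_) => eprintln!("worker {i} panicked during shutdown"),
        }
    }
    clean
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let workers: Vec<_> = (0..3).map(|_| spawn_worker(Arc::clone(&stop))).collect();
    // Stand-in for receiving SIGINT: let the workers run briefly, then shut down.
    thread::sleep(Duration::from_millis(10));
    let clean = shutdown(&stop, workers);
    println!("{clean} of 3 workers stopped cleanly");
}
```

`JoinHandle::join` returns `Err` when the joined thread panicked. Matching on the result therefore keeps that failure local, whereas calling `.unwrap()` on it would propagate the panic into `main`.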