Bug Description
If the node receives the signal to shut down (SIGINT, to be precise) while queries are active, it generally terminates with a segmentation fault.
2021-10-04T19:02:58.078279751Z: DEBUG: Drained the Consensus outbound low priority queue for 1 element(s)
2021-10-04T19:02:58.078294841Z: DEBUG: Drained the Consensus outbound high priority queue for 0 element(s)
2021-10-04T19:02:58.078299221Z: DEBUG: Drained the Consensus inbound low priority queue for 1 element(s)
2021-10-04T19:02:58.078302781Z: DEBUG: Drained the Consensus inbound high priority queue for 0 element(s)
concordium-node: concordium-node: getBlockSummary: interruptedconcordium-node: getBlockSummary: interrupted
getBlockSummary: interrupted
concordium-node: getBlockSummary: interrupted
concordium-node: getBlockSummary: interrupted
FATAL: exception not rethrown
FATAL: exception not rethrown
FATAL: exception not rethrown
FATAL: exception not rethrown
FATAL: exception not rethrown
[1] 467136 abort (core dumped) cargo run --release -- --bootstrap-node bootstrap.testnet.concordium.com:8888
The "interrupted" messages appear because the Haskell runtime is shut down (via hs_exit) while Haskell computations (the queries) are still running. This is fine, albeit not pretty.
The segmentation fault seems to happen because some Haskell functions (some queries) are called after hs_exit has been called, which is not allowed.
To fix this, we need to delay shutting down the Haskell runtime until after the RPC server has been shut down (if it was started in the first place).
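The proposed ordering can be sketched as follows. This is only an illustration using the Rust standard library, not the node's actual code: Runtime, RpcServer, and shutdown_in_order are hypothetical stand-ins, Runtime::exit plays the role of hs_exit, and a query made after exit is merely counted, where the real node would crash.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the Haskell runtime (not the node's real FFI layer).
pub struct Runtime {
    alive: AtomicBool,
    late_calls: AtomicUsize,
}

impl Runtime {
    pub fn start() -> Arc<Self> {
        Arc::new(Runtime {
            alive: AtomicBool::new(true),
            late_calls: AtomicUsize::new(0),
        })
    }

    /// Stand-in for a query such as getBlockSummary.
    /// A call after `exit` is recorded instead of crashing.
    pub fn query(&self) -> bool {
        if self.alive.load(Ordering::SeqCst) {
            true
        } else {
            self.late_calls.fetch_add(1, Ordering::SeqCst);
            false
        }
    }

    /// Stand-in for hs_exit: no queries may be made after this returns.
    pub fn exit(&self) {
        self.alive.store(false, Ordering::SeqCst);
    }

    /// Number of queries that arrived after `exit`.
    pub fn late_calls(&self) -> usize {
        self.late_calls.load(Ordering::SeqCst)
    }
}

/// Hypothetical stand-in for the RPC server: worker threads that keep querying.
pub struct RpcServer {
    stop: Arc<AtomicBool>,
    workers: Vec<JoinHandle<()>>,
}

impl RpcServer {
    pub fn start(runtime: Arc<Runtime>, worker_count: usize) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let workers = (0..worker_count)
            .map(|_| {
                let runtime = Arc::clone(&runtime);
                let stop = Arc::clone(&stop);
                thread::spawn(move || {
                    while !stop.load(Ordering::SeqCst) {
                        runtime.query();
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        RpcServer { stop, workers }
    }

    /// Ask the workers to stop and wait until every one of them has returned.
    pub fn shutdown(self) {
        self.stop.store(true, Ordering::SeqCst);
        for worker in self.workers {
            worker.join().expect("RPC worker panicked");
        }
    }
}

/// Shut down in the proposed order and return the number of queries
/// that reached the runtime after it had exited.
pub fn shutdown_in_order(rpc_enabled: bool, worker_count: usize) -> usize {
    let runtime = Runtime::start();
    let server = if rpc_enabled {
        Some(RpcServer::start(Arc::clone(&runtime), worker_count))
    } else {
        None
    };
    thread::sleep(Duration::from_millis(20)); // let some queries run

    // 1. Stop the RPC server first, if it was started at all.
    if let Some(server) = server {
        server.shutdown();
    }
    // 2. Only now exit the runtime: nothing is left that could call into it.
    runtime.exit();
    runtime.late_calls()
}
```

Because shutdown joins every worker before exit is called, shutdown_in_order(true, 4) returns 0 in this sketch. Swapping steps 1 and 2 would let workers reach the runtime after it has exited, which corresponds to the failure described above.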
Steps to Reproduce
Run a node.
Make queries against the node, e.g., via concordium client or similar.
Shut down the node while a query is in progress. This is easiest to achieve with a slow query such as block summary.
Expected Result
Queries are cancelled or completed, and the node shuts down normally.
Actual Result
The node triggers a segmentation fault in most cases.
Versions