Root cause: client_ctx::alive handling and BoxFort timeout conflict
To indicate that a client is dead, a death message has to be sent to the runner, and that message needs to be the very last message. This is done correctly.
The problem is handle_birth(), which sets the alive flag to true. Under rare circumstances (for example, on an overloaded system), handle_birth() may never run because the test timeout fires before the birth message is sent. This leaves the alive flag set to false, causing a deadlock:
The death callback contains a cr_send_to_runner() invocation, which waits for the main message loop to send an ack, which will never happen.
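The sequence described above can be illustrated with a small, self-contained model. This is a sketch only: `client_model`, `loop_dispatch()` and `death_is_acked()` are hypothetical names that do not exist in the code base, and the model assumes the message loop sends no ack for a client whose alive flag is false. A blocking send is represented by a `false` return value instead of an actual hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the runner's per-client state;
 * this is not the real client_ctx. */
struct client_model {
    bool alive;     /* set to true only by the birth handler */
    int acks_sent;  /* acks the modelled message loop has sent */
};

enum msg_kind { MSG_BIRTH, MSG_DEATH };

/* One step of the modelled message loop.
 * ASSUMPTION: a message from a client whose alive flag is false is not
 * acked. Returns true if an ack was sent for this message. */
static bool loop_dispatch(struct client_model *c, enum msg_kind kind)
{
    assert(c != NULL);
    if (kind == MSG_BIRTH)
        c->alive = true;   /* the role handle_birth() plays in the text */
    if (!c->alive)
        return false;      /* no ack: a blocking sender would wait forever */
    c->acks_sent++;
    if (kind == MSG_DEATH)
        c->alive = false;  /* death is the very last message */
    return true;
}

/* Runs one client lifetime and reports whether the death message is acked.
 * birth_sent_before_timeout == false models the timeout firing first. */
static bool death_is_acked(bool birth_sent_before_timeout)
{
    struct client_model c = { .alive = false, .acks_sent = 0 };
    if (birth_sent_before_timeout)
        loop_dispatch(&c, MSG_BIRTH);
    return loop_dispatch(&c, MSG_DEATH);
}
```

In this model, `death_is_acked(true)` returns true and `death_is_acked(false)` returns false. The second case corresponds to the situation in which cr_send_to_runner() would block indefinitely inside the death callback.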