Whenever proxy or handler want to hand off a session, they send a handoff-start message to connmgr, and connmgr replies with handoff-proceed. After this, connmgr is supposed to stay silent until it receives another message, at which point it assumes whoever sent the message is the new owner of the session and communication can resume as normal. However, in rare cases connmgr might send keep-alive messages to the old owner while it is waiting to hear from the new owner, leading to broken sessions and "received message out of sequence" warnings. This PR fixes connmgr to not send keep-alives during handoff.
There are two fixes:
In accept_handoff(), for both server and client mode, unset the to_addr (i.e. the known peer address) of the session. Keep-alives are only sent for sessions with a known peer address. This is an easy one line fix per mode.
In keep_alives_task(), for both server and client mode, account for the possibility that to_addr might get unset for sessions during batch preparation and processing. This is a bit more complicated, see below.
The keep_alives_task() tries to address keep-alive messages to multiple sessions at once, to reduce the number of messages sent. It does this by processing sessions in batches. It first decides which sessions it will be sending keep-alives for, and then it proceeds to generate and send the minimal number of messages with optimal packing. Notably, it waits for message sendability before generating and sending each message, and we need to tolerate to_addr getting unset for any sessions during this waiting. We do this by making it possible for Batch::take_group() to skip sessions via its get_id closure argument and updating the closure provided by next_batch_message() to skip sessions that don't have a to_addr.
The Batch type, including tests, is redundantly implemented twice (once in connmgr::client and once in connmgr::server). This PR updates both copies of the code. In the future we should consider sharing a single implementation of Batch to avoid having to make redundant updates.
Whenever proxy or handler want to hand off a session, they send a
handoff-start
message to connmgr, and connmgr replies withhandoff-proceed
. After this, connmgr is supposed to stay silent until it receives another message, at which point it assumes whoever sent the message is the new owner of the session and communication can resume as normal. However, in rare cases connmgr might send keep-alive messages to the old owner while it is waiting to hear from the new owner, leading to broken sessions and "received message out of sequence" warnings. This PR fixes connmgr to not send keep-alives during handoff.There are two fixes:
accept_handoff()
, for both server and client mode, unset theto_addr
(i.e. the known peer address) of the session. Keep-alives are only sent for sessions with a known peer address. This is an easy one line fix per mode.keep_alives_task()
, for both server and client mode, account for the possibility thatto_addr
might get unset for sessions during batch preparation and processing. This is a bit more complicated, see below.The
keep_alives_task()
tries to address keep-alive messages to multiple sessions at once, to reduce the number of messages sent. It does this by processing sessions in batches. It first decides which sessions it will be sending keep-alives for, and then it proceeds to generate and send the minimal number of messages with optimal packing. Notably, it waits for message sendability before generating and sending each message, and we need to tolerateto_addr
getting unset for any sessions during this waiting. We do this by making it possible forBatch::take_group()
to skip sessions via itsget_id
closure argument and updating the closure provided bynext_batch_message()
to skip sessions that don't have ato_addr
.The
Batch
type, including tests, is redundantly implemented twice (once inconnmgr::client
and once inconnmgr::server
). This PR updates both copies of the code. In the future we should consider sharing a single implementation ofBatch
to avoid having to make redundant updates.