Closed bhyzy closed 2 years ago
Just for some context, we have already rolled out this fix to our users and have been seeing this bug occur roughly 15,000 times/day across our entire user base. We have also been able to consistently reproduce it by enabling Network Link Conditioner on iOS and setting it to Very Bad Network. Our hypothesis is that substantial packet dropping on poor connections is the key ingredient here.
@bhyzy Glad this worked out for you! Code looks good so far. I'll update here when I get a chance to test this. Thanks
Problem
`Socket` does not correctly reconnect if the server fails to respond to a heartbeat in time but acknowledges the subsequent client-initiated disconnection request.
Root cause analysis
Every `heartbeatInterval`, the socket tries to push a heartbeat message to the server (`sendHeartbeat`). If a response to the previous heartbeat has not been received yet (`pendingHeartbeatRef` is not nil), a timeout is declared. This in turn triggers an abnormal closure of the socket through `abnormalClose("heartbeat timeout")`.
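The heartbeat check described above can be sketched roughly as follows. This is a simplified, hypothetical model rather than the library's code: the `HeartbeatMonitor` class, its `receiveReply(ref:)` method, and the `abnormalCloseReasons` array are names invented for illustration; only `sendHeartbeat`, `pendingHeartbeatRef`, and `abnormalClose` come from the description.

```swift
// Hypothetical, simplified model of the heartbeat check; not the library's code.
final class HeartbeatMonitor {
    // Ref of the heartbeat still awaiting a reply, or nil if none is outstanding.
    private(set) var pendingHeartbeatRef: String?
    // Reasons passed to abnormalClose, recorded so the behavior can be inspected.
    private(set) var abnormalCloseReasons: [String] = []
    private var nextRef = 0

    // Called once per heartbeat interval.
    func sendHeartbeat() {
        if pendingHeartbeatRef != nil {
            // The previous heartbeat was never acknowledged: treat it as a timeout.
            pendingHeartbeatRef = nil
            abnormalClose("heartbeat timeout")
            return
        }
        nextRef += 1
        pendingHeartbeatRef = String(nextRef)
        // A real client would push the heartbeat message to the server here.
    }

    // Called when the server replies to a heartbeat.
    func receiveReply(ref: String) {
        if ref == pendingHeartbeatRef { pendingHeartbeatRef = nil }
    }

    private func abnormalClose(_ reason: String) {
        abnormalCloseReasons.append(reason)
    }
}

let monitor = HeartbeatMonitor()
monitor.sendHeartbeat()  // first tick: heartbeat sent, reply pending
monitor.sendHeartbeat()  // second tick without a reply: timeout
```

In this sketch, two consecutive ticks with no reply in between are enough to trigger the abnormal close, which matches the behavior seen on connections with heavy packet loss.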
The socket then closes the connection to the server with `CloseCode.normal` (1000). The transport layer (`self.connection`) simply drops the connection (as implemented in `URLSessionTransport`).
Two things can happen now:
1. Server is unreachable: the disconnection request is not processed, and the transport layer reports an abnormal closure to the socket (`self.delegate`).
2. Server is reachable: the disconnection request is processed, and the transport layer reports a closure event back to the socket (`self.delegate`), passing in the close code that was sent to the server (1000 = `CloseCode.normal`).
Both of these scenarios end up calling the `Socket.onClose` method. In scenario 1, `self.closeWasClean` is set to `false`. In scenario 2, though, it is set to `true`, overriding the `false` value set earlier in `abnormalClose`. This in turn causes the reconnection logic to be skipped.
Solution
The proposed solution introduces an enum, `CloseStatus`, to be used instead of the boolean `closeWasClean`. This makes the abnormal-closure information sticky, so that a subsequent clean close reported by the transport can no longer overwrite it.
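To illustrate the difference, here is a minimal sketch contrasting the boolean flag with a sticky enum. It is not the PR's actual code: the case names, the `update(closeCode:)` and `shouldReconnect` members, and the two `...SocketModel` structs are assumptions made for illustration; only `CloseStatus`, `closeWasClean`, `abnormalClose`, `onClose`, and the close code 1000 come from the description.

```swift
// Hypothetical sketch; case names and members are assumptions, not the PR's code.
enum CloseStatus {
    case unknown   // no close has been observed yet
    case clean     // the connection closed with a normal close code
    case abnormal  // closed because of an error or a heartbeat timeout

    // Fold in a close code reported by the transport. An abnormal status is
    // sticky: a later normal close code (1000) cannot overwrite it.
    mutating func update(closeCode: Int) {
        if self == .abnormal { return }
        self = (closeCode == 1000) ? .clean : .abnormal
    }

    var shouldReconnect: Bool { self == .abnormal }
}

// Boolean version, mirroring the bug described above.
struct BooleanSocketModel {
    var closeWasClean = false
    mutating func abnormalClose() { closeWasClean = false }
    // Returns whether a reconnection would be attempted.
    mutating func onClose(code: Int) -> Bool {
        closeWasClean = (code == 1000)  // overwrites the earlier false
        return !closeWasClean
    }
}

// Enum version, where the abnormal status survives the server's acknowledgement.
struct EnumSocketModel {
    var closeStatus = CloseStatus.unknown
    mutating func abnormalClose() { closeStatus = .abnormal }
    // Returns whether a reconnection would be attempted.
    mutating func onClose(code: Int) -> Bool {
        closeStatus.update(closeCode: code)
        return closeStatus.shouldReconnect
    }
}

var before = BooleanSocketModel()
before.abnormalClose()                             // heartbeat timeout
let reconnectsBefore = before.onClose(code: 1000)  // server acknowledges the close

var after = EnumSocketModel()
after.abnormalClose()                              // heartbeat timeout
let reconnectsAfter = after.onClose(code: 1000)    // server acknowledges the close

print(reconnectsBefore, reconnectsAfter)
```

In this model the boolean version skips reconnection after the server acknowledges the close, while the enum version still reconnects. A real client would presumably also reset the status once a new connection is established; that step is omitted here.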