In practice, connection instantiation can block indefinitely while waiting for an identify response, so we introduce a timeout here. This manifested in a couple of ways, but most notably we saw cases where all connections had died and the connection checker was blocked on such a connection. This effectively kills the consumer, which is left busy-waiting.
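To illustrate the idea of bounding the wait for an identify response, here is a minimal sketch rather than the change itself. The helper name `read_identify_response`, the newline delimiter, and the default timeout are all assumptions made for the example; the real wire format and the library's own method names may differ.

```python
import socket
import time


def read_identify_response(sock, timeout=1.0, delimiter=b"\n"):
    """Read one delimited message from sock, giving up after `timeout` seconds.

    Hypothetical helper: the delimiter-based framing is an assumption
    made to keep the sketch short.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while delimiter not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout("no identify response within %.2fs" % timeout)
        # Bound each recv by the time left, so the total wait never
        # exceeds the overall deadline.
        sock.settimeout(remaining)
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before identify response")
        buf += chunk
    message, _, _rest = buf.partition(delimiter)
    return message
```

A caller such as a connection checker can catch `socket.timeout`, treat the connection as dead, and move on to the next one instead of blocking on it.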
The second issue is that we weren't resetting all stateful variables when re-establishing a connection. Of particular note was Connection._buffer, the connection's read buffer. As a result, a re-established connection would sometimes still hold unconsumed responses from the server in its read buffer, and these would get popped off ahead of the legitimate identify response. In practice, this meant we misidentified servers as not supporting feature negotiation, because stale OK responses were interpreted as identify responses.
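The reset-on-reconnect idea can be sketched as follows. `SketchConnection` is a hypothetical stand-in, not the library's Connection class: only the `_buffer` attribute name comes from the description above, and the `feed`, `pop_response`, and `reconnect` methods, the `_identified` flag, and the newline framing are assumptions for illustration.

```python
class SketchConnection:
    """Hypothetical stand-in for a connection with per-session state."""

    def __init__(self):
        self._reset()

    def _reset(self):
        # Every per-session field is initialised in one place, so a
        # reconnect cannot inherit anything from the previous session.
        self._buffer = b""         # unread bytes from the server
        self._identified = False   # hypothetical example of another field

    def feed(self, data):
        # Simulates bytes arriving from the server.
        self._buffer += data

    def pop_response(self):
        # Return the next newline-delimited response, or None if the
        # buffer does not yet hold a complete one.
        if b"\n" not in self._buffer:
            return None
        response, _, self._buffer = self._buffer.partition(b"\n")
        return response

    def reconnect(self):
        # Discard stale state before the new session's identify exchange.
        self._reset()
```

Without the `_reset()` call in `reconnect`, a leftover `OK` from the old session would be the first response popped and could be mistaken for the identify response.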