davidMcneil / rants

An async NATS client library for the Rust programming language.
Apache License 2.0

Formally disconnect when the decoder stream encounters an IO error #24

Closed: christophermaier closed this 4 years ago

christophermaier commented 4 years ago

When a Windows client is connected to a NATS server and that connection unceremoniously disappears (e.g., somebody yanks the network cable out of the machine), rants would go into an infinite loop, maxing out the CPU while repeatedly logging the following:

TCP socket error, err: An existing connection was forcibly closed by the remote host. (os error 10054)

When processing the call to reader.next() in the main message handling loop, an error of this kind would be treated as recoverable, and we would continue into the loop again. reader.next() would immediately return the same error again, leading to an infinite loop.

Interestingly, this does not appear to affect Linux clients; monitoring the socket with netstat shows the connection as still ESTABLISHED. On Windows, on the other hand, the socket quickly disappears from netstat's output once the server's network connection drops.

The core of this fix is changing that continue to a break, so that such an error is treated as a disconnection. With this change, we avoid the infinite loop.
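
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the code from rants: `BrokenReader` and `run_until_disconnect` are hypothetical names, and a plain synchronous iterator stands in for the async decoder stream. The reader yields a few good items and then returns the same IO error on every call, which mimics what was observed on Windows.

```rust
use std::io;

/// Hypothetical stand-in for the decoder stream: yields `healthy` good
/// items, then the same IO error on every subsequent call.
struct BrokenReader {
    healthy: usize,
}

impl Iterator for BrokenReader {
    type Item = Result<&'static str, io::Error>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.healthy > 0 {
            self.healthy -= 1;
            Some(Ok("MSG"))
        } else {
            // The error never clears, so retrying cannot make progress.
            Some(Err(io::Error::new(
                io::ErrorKind::ConnectionReset,
                "connection forcibly closed",
            )))
        }
    }
}

/// Drains the reader and returns (messages handled, errors seen).
fn run_until_disconnect(reader: &mut BrokenReader) -> (usize, usize) {
    let mut handled = 0;
    let mut errors = 0;
    while let Some(item) = reader.next() {
        match item {
            Ok(_msg) => handled += 1,
            Err(_err) => {
                errors += 1;
                // A `continue` here would spin forever on the same error.
                // Treating the error as a disconnection ends the loop.
                break;
            }
        }
    }
    (handled, errors)
}
```

With `break`, the loop exits after the first error; with `continue` in its place, the same call would never return for this reader.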

Since the value returned from the decoder stream is a deeply nested type with multiple Result layers, and since the deconstruction of that type was repeated in multiple places, I encoded the logic in a separate "disposition" enum, which should hopefully make reasoning about the various cases a bit easier.
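
A rough sketch of what such a "disposition" enum could look like follows. The names (`Disposition`, `Frame`, `classify`), the variants, and the exact shape of the nested type are assumptions made for illustration; the actual types in rants may differ.

```rust
use std::io;

/// Hypothetical shape of one item from the decoder stream:
/// end of stream (None), an IO error (outer Err), a decoding
/// error (inner Err), or a decoded message (inner Ok).
type Frame = Option<Result<Result<String, String>, io::Error>>;

/// What the message loop should do with a given frame.
#[derive(Debug, PartialEq)]
enum Disposition {
    /// A well-formed message: handle it and keep looping.
    Message(String),
    /// The bytes could not be decoded: log and keep looping.
    DecodingError(String),
    /// IO error or end of stream: leave the loop and disconnect.
    Disconnected,
}

/// Collapses the nested Option/Result layers into a single decision,
/// so the match on the raw type lives in exactly one place.
fn classify(frame: Frame) -> Disposition {
    match frame {
        Some(Ok(Ok(msg))) => Disposition::Message(msg),
        Some(Ok(Err(reason))) => Disposition::DecodingError(reason),
        Some(Err(_)) | None => Disposition::Disconnected,
    }
}
```

Callers can then match on `Disposition` alone and `break` on `Disconnected`, instead of repeating the nested pattern at every call site.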

Signed-off-by: Christopher Maier <christopher.maier@gmail.com>

ericcalabretta commented 4 years ago

@christophermaier Will the supervisor still be able to reconnect after the connection to automate is restored? I'm assuming it'll still attempt to connect for each health-check.