This resolves an issue in which a `channel.recv()` operation can hang when the host's `sshd` is alive but `netconfd` has hung or fallen over. The TCP socket and Paramiko Transport do not time out because data is still flowing, but the NETCONF connection hangs. We have observed a simple NETCONF HELLO exchange hang for over 2.5 hours in the wild despite the socket timeout being set.
Paramiko channels, it turns out, have their own timeout that is independent of the socket/transport timeout. So when we call `sock.recv(1024)` in `parse_messages()`, `sock` is a Paramiko channel, not a `socket.socket` as one might suppose, and it is not governed by the timeouts set in `connect_ssh()`.
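For illustration, here is a minimal sketch of the idea, not the actual diff in this PR. It assumes the channel exposes the socket-style `settimeout()` and `recv()` methods that `paramiko.Channel` provides, and that a timed-out `recv()` raises `socket.timeout`. The names `recv_with_timeout` and `ChannelTimeout` are hypothetical, as is the 30-second default.

```python
import socket


class ChannelTimeout(Exception):
    """Hypothetical error: the channel produced no data within the timeout."""


def recv_with_timeout(chan, nbytes=1024, timeout=30.0):
    """Read up to nbytes from chan, giving up after timeout seconds.

    chan is assumed to be a paramiko.Channel, but any object with
    socket-style settimeout() and recv() methods will work.
    """
    # The channel keeps its own timeout, separate from the one set on the
    # underlying socket/transport, so it has to be set here explicitly.
    chan.settimeout(timeout)
    try:
        return chan.recv(nbytes)
    except socket.timeout as exc:
        # Surface the stall to the caller instead of blocking indefinitely.
        raise ChannelTimeout(f"no data received within {timeout}s") from exc
```

Something along these lines could wrap the `sock.recv(1024)` call in `parse_messages()`, so that a stalled `netconfd` surfaces as an error rather than an indefinite hang.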
Cross-references:
Paramiko docs, Paramiko source, related issue
(Turns out I considered doing this in #28, but it didn't fix what I was chasing at the time.)