Azure / azure-iot-protocol-gateway

Azure IoT protocol gateway enables protocol translation for Azure IoT Hub

Protocol gateway throwing Socket errors and TLS handshake failures subsequently even before a Device Connect. #61

Closed whizlingo closed 8 years ago

whizlingo commented 8 years ago

While running the protocol gateway in the simulator over TLS, we have noticed that as soon as the channel is active, the ExceptionCaught method gets called with the exception set to {"An existing connection was forcibly closed by the remote host"} and the stack trace pointing at the following:

at DotNetty.Transport.Channels.Sockets.SocketChannelAsyncOperation.Validate()
at DotNetty.Transport.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)

The ExceptionCaught implementation then calls ShutDownOnError in the protocol gateway, which in turn calls the self shutdown (the current state flag value is "Waiting for connect"). This keeps happening in a cycle and logs tons of errors, along with TLS handshake warnings caused by "UserEventTriggered" being triggered with a failed status.

This all happens even before a device could connect. In a normal device connect session this does not happen, as the state quickly moves to processing connect and then connected.

We are currently running a version of the protocol gateway dated 2nd of June, which does not have the latest bits and points to the DotNetty version from that time.

whizlingo commented 8 years ago

Is this related to the fixes checked in with the following DotNetty commit? https://github.com/Azure/DotNetty/commit/b2f10c28cb334318a3165976baa3ef2864c02503

nayato commented 8 years ago

@whizlingo, the fixes in DotNetty are for extraneous errors that follow actual channel closure; that's only about cleaning up noise. If the connection gets closed by the remote party as described in the error, you should check the remote party (the simulator you've mentioned) for reasons to close the connection. If that's not possible, at least checking the exchange on the wire with Wireshark might shed some light on this.

whizlingo commented 8 years ago

@nayato, we are getting the calls to ExceptionCaught as soon as we run the service, basically right after the channel active event is fired. There is no device trying to connect using the device simulator.

nayato commented 8 years ago

Are you saying it is TcpServerSocketChannel that throws and not TcpSocketChannel?

whizlingo commented 8 years ago

Yeah, it seems so. The stack trace of the exception is:

at DotNetty.Transport.Channels.Sockets.SocketChannelAsyncOperation.Validate()
at DotNetty.Transport.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)

The socket error code is "Connection Reset", if that helps.

whizlingo commented 8 years ago

I can still run Wireshark to see if anything shows up, but this is my VM in Azure hosting the simulator. This only occurs when listening on port 8883, though. If I remove the TLS handler, it runs fine, so I guess something is getting raised from there. So far I have not added the DotNetty code to enable debugging.

nayato commented 8 years ago

The stack does not tell anything here. The question is basically what type of channel ExceptionCaught occurs on. From the error it seems it must be TcpSocketChannel, but you're saying no one's trying to connect at this point. If it is TcpSocketChannel, please find out what happens on the other side of the connection, because it must have been initiated from outside and, again, according to the error, the other party closed the connection, hence the error. If it is the other party, then it should have more data as to what triggered it to close the connection.
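For readers following along, the reasoning above (a reset means a remote peer opened the connection and then dropped it) can be reproduced outside of DotNetty. The following is only a minimal Python sketch, not gateway code: the function name `observe_abrupt_close` is made up for illustration, and the `SO_LINGER` packing assumes a POSIX-style `struct linger` of two ints. It shows that when a peer aborts a connection, the accepting side sees a connection reset on its next read, and that the accepted socket already carries the peer's address.

```python
import socket
import struct
import time


def observe_abrupt_close():
    """Return (peer_ip, outcome) for a server-side read after the peer aborts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, peer = listener.accept()  # peer is the remote (ip, port)
    server_side.settimeout(2)

    # SO_LINGER with a zero timeout makes close() send a TCP RST instead of a FIN.
    # Assumption: POSIX layout of struct linger (two ints).
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()
    time.sleep(0.1)  # give the RST time to arrive before reading

    try:
        data = server_side.recv(1024)
        outcome = "closed cleanly" if data == b"" else "data"
    except ConnectionResetError:
        outcome = "reset"
    finally:
        server_side.close()
        listener.close()
    return peer[0], outcome


if __name__ == "__main__":
    print(observe_abrupt_close())
```

The point of the sketch is that the error is raised on the accepted (per-connection) socket, never on the listening socket, so the remote address on that connection is the first thing worth logging.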

whizlingo commented 8 years ago

OK, I see that it indeed is a TcpSocketChannel, and I see the remote IP in the context as well. Will run Wireshark now to debug it further. Thanks, will update shortly.

whizlingo commented 8 years ago

The issue was with unknown TCP re-transmit packets arriving on the MQTT ports on my Azure VM. I figured out that they were actually coming from Azure monitoring.
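When chasing this kind of stray traffic in a capture, it can help to check whether the first bytes a peer sends to the TLS port look like a TLS handshake at all. Below is a rough Python sketch of such a check. The helper name `looks_like_tls_client_hello` is hypothetical and is not part of the protocol gateway or DotNetty; it relies only on the TLS record layout (content type 22 for handshake, major version 3, then handshake type 1 for ClientHello).

```python
def looks_like_tls_client_hello(first_bytes: bytes) -> bool:
    """Heuristic check on the first bytes received from a peer.

    TLS record header: content type (1 byte), version (2 bytes), length (2 bytes),
    followed by the handshake message, whose first byte is the handshake type.
    """
    if len(first_bytes) < 6:
        return False  # too short to contain a record header plus handshake type
    content_type = first_bytes[0]
    major_version = first_bytes[1]
    handshake_type = first_bytes[5]
    return content_type == 0x16 and major_version == 0x03 and handshake_type == 0x01


if __name__ == "__main__":
    tls_start = bytes([0x16, 0x03, 0x01, 0x00, 0x05, 0x01, 0x00, 0x00, 0x01, 0x00])
    mqtt_connect_start = bytes([0x10, 0x0C, 0x00, 0x04]) + b"MQTT"
    print(looks_like_tls_client_hello(tls_start))
    print(looks_like_tls_client_hello(mqtt_connect_start))
    print(looks_like_tls_client_hello(b""))
```

A peer that connects and resets without sending anything, or that sends bytes failing this check, is unlikely to be a real device attempting a TLS session. This is a diagnostic aid under those assumptions, not a fix for the re-transmits themselves.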

robichaud commented 8 years ago

@whizlingo Were you able to find a solution to stop errors/re-transmits?