Closed gkumar78 closed 8 years ago
logs_race_condition_niotcpmessagechannel_creation.txt
Uploaded logs for classes NioTcpMessageChannel and NioTcpMessageProcessor and NIOHandler
Thanks we will look into it as time permits. Let us know if you would like to contribute a fix/ pull request for that with non regression test, that would help a lot in taking care of it.
Submitted pull request https://github.com/RestComm/jain-sip/pull/69 with simple fix.
Used Double checked locking to avoid thread blocking for normal flow i.e. when there is already a NioTcpMessageChannel present for destination tcp target
Merged
The NioTcpMessageProcessor.createMessageChannel() method has unguarded Critical section which can cause problem when multiple MSS Executor threads try to create new NioTcpMessageChannel instance at same moment.
This method internally calls NioTcpMessageChannel constructor which further creates a new SocketChannel through NIOHandler.createOrReuseSocket() method. The socket creation logic in NIOHandler.createOrReuseSocket() is threadsafe; so when multiple threads calls NioTcpMessageProcessor.createMessageChannel() at same instant, only 1 socketChannel is created which gets mapped to multiple instances of NioTcpMessageChannel.
It has been observed that this simultaneous attempt to create NioTcpMessageChannel instance by multiple threads can cause different in-compatible mappings to be added to ConnectionOrientedMessageProcessor.messageChannels and NioTcpMessageChannel.channelMap structures (which has been observed to happen 1 out of every 7-8 times with multiple parallel threads), which further on leads to packet drops destined to that target once the SocketChannel gets cleaned up on receipt of TCP FIN packet (leading to IOException on read operation). In such a scenario, on cleanup of SocketChannel, the socket is closed and removed from NioHandler/NioTcpMessageChannel structures as well but its entry in ConnectionOrientedMessageProcessor.messageChannels structure is not cleaned up due to following additional check in ConnectionOrientedMessageProcessor.remove() method
if (messageChannels.get(key) == messageChannel) this.messageChannels.remove(key);
This causes a invalid NioTcpMessageChannel instance to stay in messageChannels map with its SocketChannel instance in close state. When next connection attempt is made to same target (with same key used to lookup messageChannels), the dangling NioTcpMessageChannel instance is used for NIO operation and causes a Dead socket operation and cleanedup. Then a new socket is created for the target but it somehow fails with null return value on call to change.socket.keyFor(selector) in ConnectionOrientedMessageProcessor.ProcessorTask.run() method. This causes failure to send any further packets to that tcp target and no error reported at all.
I am using MSS 2.0.0 with tomcat 7 wherein this issue was observed, but this issue would be present in master as well as the relevant code blocks have no apparant fix added.
I am also attaching the logfile with filtered statements for these 3 main classes where this issue was observed. There are 2 MSS threads trying to create new NioTcpMessageChannel instances: