RestComm / jain-sip

Disclaimer: This repository is a git-svn mirror of the project found at http://java.net/projects/jsip whose original repository is developed collaboratively by the Advanced Networking Technologies Division at the National Institute of Standards and Technology (NIST) - an agency of the United States Department of Commerce and by a community of individual and enterprise contributors. TeleStax, Inc. will perform some productization work, new features experimentation branches, etc for its TelScale jSIP product that doesn't concern the community from the main repository hence this git repository.
http://www.restcomm.com/
142 stars 151 forks source link

Race condition while instantiating new NioTcpMessageChannel causing silent Packet drops subsequently #67

Closed gkumar78 closed 8 years ago

gkumar78 commented 8 years ago

The NioTcpMessageProcessor.createMessageChannel() method has unguarded Critical section which can cause problem when multiple MSS Executor threads try to create new NioTcpMessageChannel instance at same moment.

This method internally calls NioTcpMessageChannel constructor which further creates a new SocketChannel through NIOHandler.createOrReuseSocket() method. The socket creation logic in NIOHandler.createOrReuseSocket() is threadsafe; so when multiple threads calls NioTcpMessageProcessor.createMessageChannel() at same instant, only 1 socketChannel is created which gets mapped to multiple instances of NioTcpMessageChannel.

It has been observed that this simultaneous attempt to create NioTcpMessageChannel instance by multiple threads can cause different in-compatible mappings to be added to ConnectionOrientedMessageProcessor.messageChannels and NioTcpMessageChannel.channelMap structures (which has been observed to happen 1 out of every 7-8 times with multiple parallel threads), which further on leads to packet drops destined to that target once the SocketChannel gets cleaned up on receipt of TCP FIN packet (leading to IOException on read operation). In such a scenario, on cleanup of SocketChannel, the socket is closed and removed from NioHandler/NioTcpMessageChannel structures as well but its entry in ConnectionOrientedMessageProcessor.messageChannels structure is not cleaned up due to following additional check in ConnectionOrientedMessageProcessor.remove() method

if (messageChannels.get(key) == messageChannel) this.messageChannels.remove(key);

This causes a invalid NioTcpMessageChannel instance to stay in messageChannels map with its SocketChannel instance in close state. When next connection attempt is made to same target (with same key used to lookup messageChannels), the dangling NioTcpMessageChannel instance is used for NIO operation and causes a Dead socket operation and cleanedup. Then a new socket is created for the target but it somehow fails with null return value on call to change.socket.keyFor(selector) in ConnectionOrientedMessageProcessor.ProcessorTask.run() method. This causes failure to send any further packets to that tcp target and no error reported at all.

I am using MSS 2.0.0 with tomcat 7 wherein this issue was observed, but this issue would be present in master as well as the relevant code blocks have no apparant fix added.

I am also attaching the logfile with filtered statements for these 3 main classes where this issue was observed. There are 2 MSS threads trying to create new NioTcpMessageChannel instances:

  1. Instance gov.nist.javax.sip.stack.NioTcpMessageChannel@591cac4b is created by MSS-Executor-Thread-28 and saved in ConnectionOrientedMessageProcessor.messageChannels with key tcp:10.152.151.162:55080
  2. Instance gov.nist.javax.sip.stack.NioTcpMessageChannel@69739da1 created by MSS-Executor-Thread-29 and saved in NioTcpMessageChannel.channelMap against the key of created socketChannel. Hence, this instance is used for every read & write on the socket.
gkumar78 commented 8 years ago

logs_race_condition_niotcpmessagechannel_creation.txt

Uploaded logs for classes NioTcpMessageChannel and NioTcpMessageProcessor and NIOHandler

deruelle commented 8 years ago

Thanks we will look into it as time permits. Let us know if you would like to contribute a fix/ pull request for that with non regression test, that would help a lot in taking care of it.

gkumar78 commented 8 years ago

Submitted pull request https://github.com/RestComm/jain-sip/pull/69 with simple fix.

Used Double checked locking to avoid thread blocking for normal flow i.e. when there is already a NioTcpMessageChannel present for destination tcp target

deruelle commented 8 years ago

Merged