RestComm / jain-sip

Disclaimer: This repository is a git-svn mirror of the project found at http://java.net/projects/jsip whose original repository is developed collaboratively by the Advanced Networking Technologies Division at the National Institute of Standards and Technology (NIST) - an agency of the United States Department of Commerce and by a community of individual and enterprise contributors. TeleStax, Inc. will perform some productization work, new features experimentation branches, etc for its TelScale jSIP product that doesn't concern the community from the main repository hence this git repository.
http://www.restcomm.com/
141 stars 151 forks source link

long delay creating a channel #143

Closed jaimecasero closed 7 years ago

jaimecasero commented 7 years ago

[Entry id=5529, date=2017-02-09 10:13:32,153, level=DEBUG, thread=pool-AffinityJAIN-thread-16, emitter=gov.nist.core.CommonLoggerLog4j.logDebug(CommonLoggerLog4j.java:129), message=NioTcpMessageProcessor::createMessageChannel: 84.73.89.30:49060] [Entry id=5530, date=2017-02-09 10:14:32,182, level=DEBUG, thread=pool-AffinityJAIN-thread-16, emitter=gov.nist.core.CommonLoggerLog4j.logDebug(CommonLoggerLog4j.java:129), message=NioTcpMessageChannel::NioTcpMessageChannel: 84.73.89.30:49060]

seems to be 1 minute between these two subsequent traces

jaimecasero commented 7 years ago

it seems all is caused by remote node delivering conn timeout. Lets make timeout configurable

yarel79 commented 7 years ago

I had a closer look at that and sharing my thoughts.

First log message seems to be coming from here: https://github.com/RestComm/jain-sip/blob/master/src/gov/nist/javax/sip/stack/NioTcpMessageProcessor.java#L448-L451

...while the second log message seems to be coming from here: https://github.com/RestComm/jain-sip/blob/master/src/gov/nist/javax/sip/stack/NioTcpMessageChannel.java#L182-L189

In between the following happens:

1) String key = MessageChannel.getKey(targetHostPort, transport);

This constructs key on basis of hostport and transport and looks like:

public static String getKey(HostPort hostPort, String transport) {
        // http://java.net/jira/browse/JSIP-413 Concurrency issue within MessageChannel.java when using IPv6 addresses
        String ipAddress = hostPort.getHost().getIpAddress();
        if (ipAddress == null) {
            ipAddress = hostPort.getHost().getHostname();
        }
        return (transport + ":" + ipAddress.replaceAll("[\\[\\]]", "") + ":" + hostPort.getPort())
                .toLowerCase();
    }

Maybe hostname to IP resolution takes place and it takes some time?

2) retval = messageChannels.get(key);

messageChannels is a concurrent hash map and it may block in the final phase of fetching element from a hash map segment. HashMap is divided into multiple segments and lock is performed per segment.

Internally, first fetch is non blocking, but if it is null then second fetch is performed this time with lock acquired to be absoluteley sure an element is not present in the map.

3)

if (retval == null) {
    retval = createMessageChannel(key, targetHostPort.getInetAddress(), targetHostPort.getPort());  
}

Step 3 is actually:

    private synchronized MessageChannel createMessageChannel(String key, InetAddress targetHost, int port)  throws IOException {
        ConnectionOrientedMessageChannel retval = messageChannels.get(key);
        //once locked, we need to check condition again
        if( retval == null ) {
                retval = constructMessageChannel(targetHost, port);
                this.messageChannels.put(key, retval);
                retval.isCached = true;
                if (logger.isLoggingEnabled(LogWriter.TRACE_DEBUG)) {
                        logger.logDebug("key " + key);
                        logger.logDebug("Creating " + retval);
                }
                selector.wakeup();
        }       
        return retval;      
    }  

Based on above, following part generates second log message:

ConnectionOrientedMessageChannel constructMessageChannel (InetAddress targetHost, int port) throws IOException {
        return new NioTcpMessageChannel(targetHost,port, sipStack, this);
 }

which leads to call of:

public NioTcpMessageChannel(InetAddress inetAddress, int port,
            SIPTransactionStack sipStack,
            NioTcpMessageProcessor nioTcpMessageProcessor) throws IOException {
        super(sipStack);
        if (logger.isLoggingEnabled(LogWriter.TRACE_DEBUG)) {
            logger.logDebug("NioTcpMessageChannel::NioTcpMessageChannel: "
                + inetAddress.getHostAddress() + ":" + port);
        }
....

From my perspective: 1) resolving hostname into IP address may takes some time and should be confirmed/excluded in the first place 2) contention on messageChannels is possible, but less likely (1 minute lock acquisition?) 3) createMessageChannel() is synchronized, therefore acquiring monitor to the object may also take some time, but up to 1 minute?

jaimecasero commented 7 years ago

@yarel79 be aware all this logic is coupled to the actual connect attempt,which is happening at BLOCKING mode unfortunately. The conn timeout is hardcoded to 10 seconds, and then a second reattempt will be performed if first fail (you have 20 seconds there). If different threads are trying to connect to same ip/port, they will queued in the lock ,and each will go throuhg the 10+10 seconds attempts (in case the target peer is not connecting).

We have already implemented the configurable conn timeout, that may help reducing this. 10 seconds is a bit to high for a good network.