RestComm / media-core

RMS - Restcomm Media Server for Real Time Cloud Communications
http://www.restcomm.com/
GNU Affero General Public License v3.0
161 stars 124 forks source link

ERROR in JainMgcpStackProviderImpl <The TransactionHandler not found for TransactionHandle> #301

Open maria-farooq opened 7 years ago

maria-farooq commented 7 years ago

@hrosa @ghjansen ,

Please Letme know what more we want on this issue to eliminate this error from appearing

Error in RC - JainMgcpStackProviderImpl

02:04:04,983 ERROR [org.mobicents.protocols.mgcp.stack.JainMgcpStackProviderImpl] (Thread-555) The TransactionHandler not found for TransactionHandle 398 May be the Tx timed out. Event = 400 0The transaction could not be executed due to a transient error.

At the same time against same tx=398 we can see following happening in RMS logs

RMS logs of same TX

2016-11-17 02:02:32,319 INFO  [Server] (scheduler-worker-10) Global hearbeat is still alive
2016-11-17 02:03:32,319 INFO  [Server] (scheduler-worker-13) Global hearbeat is still alive
2016-11-17 02:03:59,940 DEBUG [ResourcesPool] (scheduler-worker-6) Released local connection,pool size:51,free:49
2016-11-17 02:03:59,940 DEBUG [ResourcesPool] (scheduler-worker-1) Released local connection,pool size:51,free:49
2016-11-17 02:03:59,940 DEBUG [ResourcesPool] (scheduler-worker-6) Released player,pool size:14,free:13
2016-11-17 02:03:59,940 DEBUG [ResourcesPool] (scheduler-worker-6) Released dtmf detector,pool size:17,free:16
2016-11-17 02:03:59,979 DEBUG [RtcpHandler] (scheduler-worker-6)
SENDING RECEIVER REPORT:
version=2, padding=false, packet type=201, length=1, ssrc=2501137600
SDES:
version= 2, padding= false, source count= 1, packet type= 202, length= 6,
SDES CHUNK:
ssrc= 2501137600, item count= 1
SDES ITEM:
type= CNAME, value= u2Lx1tnejkN3U/0M, length= 16
BYE:
version= 2, padding= false, source count=1, packet type=203, length=1, ssrc= 2501137600, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0, ssrc= 0

2016-11-17 02:03:59,980 DEBUG [MediaChannel] (scheduler-worker-6) audio channel 2501137600 is closed
2016-11-17 02:03:59,980 DEBUG [ResourcesPool] (scheduler-worker-6) Released rtp connection,pool size:15,free:14
2016-11-17 02:03:59,980 INFO  [MgcpTransaction] (scheduler-worker-6) tx=398 Started, message= DLCX mobicents/bridge/12@10.0.0.5:2427, call agent = ip-10-0-0-5.ec2.internal/10.0.0.5:2727
2016-11-17 02:03:59,981 INFO  [MgcpTransaction] (scheduler-worker-15) tx=398 was executed normaly
2016-11-17 02:04:32,319 INFO  [Server] (scheduler-worker-7) Global hearbeat is still alive
2016-11-17 02:05:32,319 INFO  [Server] (scheduler-worker-6) Global hearbeat is still alive
ghjansen commented 7 years ago

Hi @maria-farooq !

Evaluating this issue i reached org.mobicents.media.server.mgcp.tx.cmd.DeleteConnectionCmd.Delete.perform() and remembered that some time ago we discussed something around this same area. At that time, IIRC, we investigated the callID object, that is provided as a parameter for the execution of the delete command. The method perform() then extracts the value from callID object through a conversion from hexadecimal to integer, as you can see here: https://github.com/RestComm/mediaserver/blob/master/controls/mgcp/src/main/java/org/mobicents/media/server/mgcp/tx/cmd/DeleteConnectionCmd.java#L166 and once the call is found using the resultant integer, the execution of the command proceeds deleting the connection.

So, i was wondering if you managed to conclude the investigation at that time, because AFAIR the delete connection command was interrupted since no call was found for that particular callID. I ask this because is possible that, if RC sends the delete command using an invalid callID - or maybe a callID in a non-hexadecimal format, RMS won't be able to perform such action at the time, only 4 hours later, which is the hard coded timeout for call durations (see https://github.com/RestComm/mediaserver/blob/master/spi/src/main/java/org/mobicents/media/server/spi/ConnectionState.java#L38 and https://github.com/RestComm/mediaserver/issues/114).

I'm thinking out loud here. This may be a hypothesis that fits with your details. So, let me know your impressions about that. :) Thanks!

maria-farooq commented 7 years ago

Hi @ghjansen , taking forward your hypothesis, in case callID is invalid, call will be null and in that case we should be getting MgcpCommandException in RMS. https://github.com/RestComm/mediaserver/blob/master/controls/mgcp/src/main/java/org/mobicents/media/server/mgcp/tx/cmd/DeleteConnectionCmd.java#L168 But we see no such exception.

hrosa commented 7 years ago

Hi @maria-farooq

This happens when, for some reason, a connection is created but never deleted on Media Server. After 4h the Media Server will kill the connection.

To do so, Media Server is sending a DLCX to itself (which makes no sense, but ignore that for now). Upon receiving this DLCX from itself, RMS deleted the connection and sends a 200 OK to call agent (RestComm). Since it was not RestComm who sent the DLCX request, then it's normal that it has no registry of the MGCP transaction. This results in the error trace you see in the logs.

Let me be clear in one thing: this should not have any bad effect in RestComm. If MGCP client does not find the transaction, then it executes no action besides raising a warning.

And yes, RMS behaviour is wrong. It makes little sense for MGCP to send a request to itself. The correct way to deal with this specific use case would be something like:

  1. RC requests RMS to create connection
  2. RC registers event listening for that connection
  3. RMS sends NTFY warning that connectio id=xyz timed out
  4. RC sends DLCX connection id=xyz
  5. RMS sends 200 OK
  6. RC hangs up call

Since this issue has no impact on RestComm besides polluting the logs, I would like to ask you to live with this for a little longer until the new MGCP stack comes out.