RestComm / Restcomm-Connect

The Open Source Cloud Communications Platform
http://www.restcomm.com/
GNU Affero General Public License v3.0

TransitionNotFoundException in call actor #1613

Open maria-farooq opened 7 years ago

maria-farooq commented 7 years ago

@gvagenas we found a TransitionNotFoundException in the Call actor. Please review.

According to brief search in logs:

Solution Suggestion:

Exception trace

19:30:39,133 ERROR [org.restcomm.connect.telephony.Call] (RestComm-akka.actor.default-dispatcher-2806) No transition could be found from a(n) stopping state to a(n) stopping state.: org.restcomm.connect.commons.fsm.TransitionNotFoundException: No transition could be found from a(n) stopping state to a(n) stopping state.
    at org.restcomm.connect.commons.fsm.FiniteStateMachine.transition(FiniteStateMachine.java:60) [restcomm-connect.commons-8.0.0.1084.jar:8.0.0.1084]
    at org.restcomm.connect.telephony.Call.onSipServletRequest(Call.java:1505) [restcomm-connect.telephony-8.0.0.1084.jar:8.0.0.1084]
    at org.restcomm.connect.telephony.Call.onReceive(Call.java:453) [restcomm-connect.telephony-8.0.0.1084.jar:8.0.0.1084]
    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:159) [akka-actor_2.10-2.1.2.jar:]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:425) [akka-actor_2.10-2.1.2.jar:]
    at akka.actor.ActorCell.invoke(ActorCell.scala:386) [akka-actor_2.10-2.1.2.jar:]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:230) [akka-actor_2.10-2.1.2.jar:]
    at akka.dispatch.Mailbox.run(Mailbox.scala:212) [akka-actor_2.10-2.1.2.jar:]
    at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:506) [akka-actor_2.10-2.1.2.jar:]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:262) [scala-library-2.10.1.jar:]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [scala-library-2.10.1.jar:]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1478) [scala-library-2.10.1.jar:]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [scala-library-2.10.1.jar:]

Complete Logs

maria-farooq commented 7 years ago

@deruelle @gvagenas I quickly checked the impact of this bug: the corresponding CDR stays in the in-progress state because of this issue.

deruelle commented 7 years ago

Let's fix it then as it will become an issue when coupled to throttling.

gvagenas commented 7 years ago

@maria-farooq to me, the problem here is not simply that we need to check the state. VoiceInterpreter reached the end of the RCML and thus asked Call to hang up:

19:30:39,113 INFO  [org.restcomm.connect.interpreter.VoiceInterpreter] (RestComm-akka.actor.default-dispatcher-2812) End tag received will move to hangup the call, VI state: finish recording
19:30:39,113 INFO  [org.restcomm.connect.telephony.Call] (RestComm-akka.actor.default-dispatcher-2812) ********** Call's akka://RestComm/user/$Hd Current State: "in progress direction: inbound
19:30:39,113 INFO  [org.restcomm.connect.telephony.Call] (RestComm-akka.actor.default-dispatcher-2812) ********** Call akka://RestComm/user/$Hd Processing Message: "org.restcomm.connect.telephony.api.Hangup sender : akka://RestComm/user/$Gd
19:30:39,113 DEBUG [org.restcomm.connect.telephony.Call] (RestComm-akka.actor.default-dispatcher-2812) Got Hangup for Call, from: sip:+16123922393@sip.nexmo.com to: sip:16508251450@sip.nexmo.com state: in progress conferencing: false conference: null

Because of the Hangup message, the Call creates and sends a BYE, but then Nexmo also sends a BYE, and that causes the exception.
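To illustrate the failure mode described above, here is a minimal sketch of a transition-table state machine. It is not Restcomm's `FiniteStateMachine`; the names `MiniFsm`, `CallState`, and `NoTransitionException` are hypothetical, and the set of allowed transitions is an assumption made only for this example. The point is that once the local Hangup has moved the call to a stopping state, a remote BYE that requests the same state again has no matching transition.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical names throughout; this is not Restcomm's FiniteStateMachine.
enum CallState { IN_PROGRESS, STOPPING, COMPLETED }

// Unchecked here only to keep the sketch short.
class NoTransitionException extends RuntimeException {
    NoTransitionException(CallState from, CallState to) {
        super("No transition could be found from " + from + " to " + to);
    }
}

class MiniFsm {
    // Assumed transition table: each state maps to the states it may move to.
    private final Map<CallState, EnumSet<CallState>> allowed =
            new EnumMap<>(CallState.class);
    private CallState current;

    MiniFsm(CallState initial) {
        current = initial;
        allowed.put(CallState.IN_PROGRESS, EnumSet.of(CallState.STOPPING));
        allowed.put(CallState.STOPPING, EnumSet.of(CallState.COMPLETED));
        allowed.put(CallState.COMPLETED, EnumSet.noneOf(CallState.class));
    }

    CallState state() {
        return current;
    }

    // Moves to the target state, or throws if the table has no such transition.
    void transition(CallState target) {
        if (!allowed.get(current).contains(target)) {
            throw new NoTransitionException(current, target);
        }
        current = target;
    }
}
```

With this table, calling `transition(CallState.STOPPING)` once succeeds (local Hangup), and calling it a second time (remote BYE) throws, because STOPPING to STOPPING is not listed.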

Is it normal that Nexmo sends BYE? If yes, then your suggested patch is OK, but I believe that Nexmo shouldn't send BYE at this point, and you should investigate a possible problem with routing.

George

maria-farooq commented 7 years ago

Is it normal that Nexmo sends BYE?

@gvagenas it could be if the caller hangs up, no? Let me try to reproduce it and make sure I am not missing anything. I will update here.

gvagenas commented 7 years ago

@maria-farooq since the Call actor sends BYE, Nexmo should respond with 200 OK to that BYE. But we can see that Nexmo sends a BYE of its own. Something is wrong here, and the problem is not the transition. Agreed, please try to reproduce and check the call flow.

deruelle commented 7 years ago

@gvagenas there can be a race condition where Restcomm sends BYE and the caller sends BYE at the same time. We should be able to handle that at the Restcomm level without any errors.

gvagenas commented 7 years ago

@deruelle If that's the case, then the solution from @maria-farooq is the proper one. But since the call was in progress and there was a recording (RTP traffic), I think there was no reason for Nexmo to send BYE, so it is best to verify that there was no other problem.

deruelle commented 7 years ago

It may be the subscriber sending the BYE, which is only forwarded by Nexmo, or silence on the RTP stream, in which case Nexmo would cut the call. I'm just saying we need to be ready to process an incoming BYE at the same time we generate a BYE ourselves. It's a network race condition that can happen.
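A minimal sketch of handling that race, assuming the call tracks its own state and can inspect it before requesting a transition. This is not Restcomm code: `ByeRaceSketch` and its methods are hypothetical, and whether the call should stay in the stopping state until teardown finishes is an assumption of this example. An incoming BYE is always answered with 200 OK, but a transition to the stopping state is only requested when the call is not already stopping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names do not correspond to Restcomm classes.
class ByeRaceSketch {
    enum State { IN_PROGRESS, STOPPING, COMPLETED }

    private State state = State.IN_PROGRESS;

    // Records outgoing SIP messages so the behaviour can be inspected.
    final List<String> sent = new ArrayList<>();

    State state() {
        return state;
    }

    // Local hangup request, e.g. the interpreter reached the end of the RCML.
    void onHangup() {
        if (state == State.IN_PROGRESS) {
            sent.add("BYE");
            state = State.STOPPING;
        }
    }

    // BYE received from the remote side.
    void onRemoteBye() {
        // The BYE is answered in every state so the peer stops retransmitting.
        sent.add("200 OK");
        if (state == State.IN_PROGRESS) {
            state = State.STOPPING;
        }
        // Already STOPPING or COMPLETED: both sides hung up at once,
        // so no new transition is requested.
    }

    // Assumed hook: teardown (media cleanup and so on) has finished.
    void onTeardownFinished() {
        if (state == State.STOPPING) {
            state = State.COMPLETED;
        }
    }
}
```

In this sketch, `onHangup()` followed by `onRemoteBye()` leaves the call in STOPPING with a BYE and a 200 OK sent, and `onTeardownFinished()` then completes it, so the call does not stay stuck in an intermediate state.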

gvagenas commented 7 years ago

@deruelle I agree with your point, and we should protect the Call actor against that, similar to the race condition with Cancel (#949).

My point here is that we should first investigate the call flow for bugs before we patch it, because providing the workaround will hide the underlying problem, if there is one.

ocarriles commented 7 years ago

@deruelle I faced the same issue in 2 out of 3 calls. My RestComm-Connect machine had its primary DNS server misconfigured, so names were resolved via the secondary one only after a timeout. Fixing the DNS issue solved the problem (timeouts involved?). Hope it helps.