RestComm / Restcomm-Connect

The Open Source Cloud Communications Platform
http://www.restcomm.com/
GNU Affero General Public License v3.0
244 stars 215 forks source link

SIP BYE not answered when sent from USSD Gw after dialogtimeout occurs #2411

Closed FerUy closed 7 years ago

FerUy commented 7 years ago

A patch for solving https://github.com/RestComm/ussdgateway/issues/69, in order to ending a USSD session which has not been responded by SIP or HTTP to the USSD Gateway accordingly with the dialogtimeout value, was tested along with Restcomm-Connect. Same USSD RVD project as the one provided in issue #2410 was used. Likewise, same php script, on this occasion with sleep set to 20 seconds.

rvd.xml externalServiceTimeout parameter was left to default 5000 value (5 seconds), while dialogtimeout in USSD Gw was set to 3000 (3 seconds). Consequently, the expected result was USSD Gw timing out at precisely 3 seconds and therefore sending both MAP End message to SS7 core network and SIP BYE to Restcomm-Connect in the meantime the latter was still processing the external service (which was set to answer in 20 seconds, thus Restcomm-Connect would timeout 15 seconds before that, but 2 seconds after USSD Gw's timeout).

USSD Gw acts as expected, but SIP BYE inbound at Restcomm-Connect doesn't prevent it to keep processing the USSD session and it never answers the multiple SIP BYE messages sent from the USSD Gw after its dialogtimeout is reached (3 seconds). Evidence shown in Wireshark trace attached, Restcomm-Connect server.log and RVD RCML log below, which shows the On Timeout occurring at 5 seconds, 2 seconds after SIP BYE with correct SIP headers was sent and received from the USSD Gw to end that SIP dialog).

[Thu Aug 03 00:14:15 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc] "Running target: start"
[Thu Aug 03 00:14:15 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc][RCML] <Response>
  <UssdCollect action="controller?target=start.step2.handle">
    <UssdMessage>Enter 1, 2 or 3</UssdMessage>
  </UssdCollect>
</Response>
[Thu Aug 03 00:14:19 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc] "Running target: start.step2.handle"
[Thu Aug 03 00:14:19 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc] "Running target: module2"
[Thu Aug 03 00:14:19 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc][ES][REQUEST] "Requesting from url: http://127.0.0.1/sleep.php"
[Thu Aug 03 00:14:24 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc] "Running target: module4"
[Thu Aug 03 00:14:24 UYT 2017][app APaf6019a57db74ec1bb4bc11cb0fe69bc][RCML] <Response>
  <UssdMessage>Timeout while processing External Service.</UssdMessage>
</Response>

Other tests "playing" with all involved timers involved in the test were carried out with same outcome.

server.log.zip restcommUSSDgw7.1.62_dialogtimeout3_vs_externalServiceTimeout5.pcap.pcapng.zip

FerUy commented 7 years ago

Downloaded latest Restcomm-Connect binaries from, i.e. Restcomm-JBoss-AS7-8.2.0.1278 (which includes a new Timeout parameter for External Services modules). As commented in #2410, it works for the tests carried out there.

For this issue's concern however (as well as https://github.com/RestComm/ussdgateway/issues/69 concerns), dialogtimeout needed to be tested with a lower value than either the sleep value put in the external service script or the ES module Timeout value.

Then, as can be seen in attached Wireshark trace using Restcomm-JBoss-AS7-8.2.0.1278, something weird occurs when USSD Gateway sends the SIP BYE after dialogtimeout value is trespassed: the ES module sends an HTTP 200 OK (?) as it would have reached its timeout (albeit it was set much higher) and therefore Restcomm-Connect sends SIP BYEto USSD Gateway too, which is responded with SIP 200 OK by the USSD Gateway.

Again, initial SIP BYE sent by USSD Gateway is not yet responded by Restcomm-Connect. Weirdly, USSD Gateway sends previous SIP BYE again to Restcomm-Connect, which is answered by it with a now logical SIP 487 Call leg/Transaction does not exist.

So, although with newer Restcomm-JBoss-AS7-8.2.0.1278 release, SIP dialog is eventually terminated, it is done in an inefficient and crazy way. Either sides acts in an odd manner.

restcommUSSDgw7.1.62_Connect8.2.0.1278_dialogTimeout5_sleep20_rvdESTimeout10.pcap.pcapng.zip

FerUy commented 7 years ago

As requested by @deruelle, recreated the same scenario commented in previous comment here, but now attaching it with server logs, besides the trace, for complete understanding of the remaining problem for @vetss @gvagenas @abdulazizali77 @otsakir et. al. issuesRC2411-USSSD69.zip

FerUy commented 7 years ago

Hi @abdulazizali77 I tested your patch with provided restcomm-connect.ussd-8.2.0-SNAPSHOT.jar and the results didn't change anything from previous tests.

First I tested the same one as the previous one here, i.e. RVD ES Timeout = 10 secs USSD Gateway dialogtimeout = 5 secs rvd.xml externalServiceTimeout = not set External Service sleep time = 20 secs

The trace shows exactly the same behaviour: as can be seen in attached Wireshark trace, when USSD Gateway sends the SIP BYE after dialogtimeout value is trespassed (5 seconds) which is not responded by RC. The ES module sends an HTTP 200 OK at 5 seconds as it would have reached its default externalServiceTimeout (it was set much higher in ES) and therefore Restcomm-Connect sends SIP BYEto USSD Gateway too, which is responded with SIP 200 OK by the USSD Gateway. Again, initial SIP BYE sent by USSD Gateway is not yet responded by Restcomm-Connect. Weirdly, USSD Gateway sends previous SIP BYE again to Restcomm-Connect, which is answered by it with a now logical SIP 487 Call leg/Transaction does not exist.So, although this would eventually work in the sense of terminating in a not proper way the SIP dialog (and not according to the configured ES timeout -this is more issue #2410 related-), nothing changed from previous call flow.

pr2411_ESTimeout10s_externalServiceTimeoutNOTSET_USSDdialogTimeou5s_sleep20.pcap.pcapng.zip

Then, I tested this scenario: RVD ES Timeout = 10 secs USSD Gateway dialogtimeout = 25 secs rvd.xml externalServiceTimeout = 6 secs External Service sleep time = 20 secs This is when everything goes terribly wrong. Although RC times out at 5 seconds, no SIP BYE is sent back to USSD Gateway. USSD Gateway times out according to its configured dialog timeout (25 seconds later it sent the last SIP INFO), so it starts sending SIP BYE, which is never answered by RC. As you can see in the trace, USSD Gw keeps sending SIP BYE until it considers RC is "dead" and stops. At USSD Gw side, either MAP and SIP dialogs are terminated, but on RC, SIP dialog is kept forever. This is the worst case scenario, and it's still working wrong.

pr2411_ESTimeout10s_externalServiceTimeout6s_USSDdialogTimeou25s_sleep20.pcap.pcapng.zip server.log.zip

FerUy commented 7 years ago

Look at server.log error next for the problem described above:

15:22:11,157 ERROR [org.restcomm.connect.ussd.interpreter.UssdInterpreter] (RestComm-akka.actor.default-dispatcher-5) No transition could be found from a(n) Processing info request from client state to a(n) Disconnecting state.: org.restcomm.connect.commons.fsm.TransitionNotFoundException: No transition could be found from a(n) Processing info request from client state to a(n) Disconnecting state.
    at org.restcomm.connect.commons.fsm.FiniteStateMachine.transition(FiniteStateMachine.java:60) [restcomm-connect.commons-8.2.0.1278.jar:8.2.0.1278]
    at org.restcomm.connect.ussd.interpreter.UssdInterpreter.onReceive(UssdInterpreter.java:482) [restcomm-connect.ussd-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:159) [akka-actor_2.10-2.1.2.jar:]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:425) [akka-actor_2.10-2.1.2.jar:]
    at akka.actor.ActorCell.invoke(ActorCell.scala:386) [akka-actor_2.10-2.1.2.jar:]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:230) [akka-actor_2.10-2.1.2.jar:]
    at akka.dispatch.Mailbox.run(Mailbox.scala:212) [akka-actor_2.10-2.1.2.jar:]
    at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:506) [akka-actor_2.10-2.1.2.jar:]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:262) [scala-library-2.10.1.jar:]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [scala-library-2.10.1.jar:]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1478) [scala-library-2.10.1.jar:]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [scala-library-2.10.1.jar:]

15:22:11,163 ERROR [org.restcomm.connect.commons.faulttolerance.RestcommSupervisorStrategy] (RestComm-akka.actor.default-dispatcher-12) RestcommSupervisorStrategy, actor exception handling. Actor path akka://RestComm/user/$g/$a, exception cause org.restcomm.connect.commons.fsm.TransitionNotFoundException: No transition could be found from a(n) Processing info request from client state to a(n) Disconnecting state., default exception handling strategy Resume
FerUy commented 7 years ago

Hi @abdulazizali77, as commented in Slack, now I see it working as expected. I proceeded with the exact same tests as described above and everything looks good to me. Attached next are traces and RC server logs. patch2_issue2411.zip

Moreover, the CDRs as seen in following image taken from RC GUI look as expected, both with canceled status (first one due to external service timeout, second one due to USSD Gw timeout).

tests_cdrs

In summary, for me it's a go.

FerUy commented 7 years ago

Workaround done in PR #2438, closing.