RestComm / ussdgateway

RestComm USSD Gateway
http://www.restcomm.com/
GNU Affero General Public License v3.0

Dialog Timeout configuration is not working #69

Closed FerUy closed 6 years ago

FerUy commented 7 years ago

After experiencing a real-life situation with a customer, where the USSD Gw Dialog Timeout is set to 15000 (15 secs) but tests and traces showed it having no effect, a simulation was conducted with identical results. USSD Gw Dialog Timeout was set to 15 secs in the GUI, which is reflected in the corresponding UssdManagement_ussdproperties.xml configuration file (attached).

Then, the following test was conducted with both Restcomm Connect and USSD Gw running (the latter in simulator mode):

This behaviour is identical to that seen on the live system onsite with a customer. Attached are extracts of the server logs of both USSD Gw and Restcomm-Connect for the simulation test described, as well as the corresponding Wireshark trace.

This fix is critical: apart from the malfunction itself (the TCAP dialog configuration is useless and the value seems to be hardcoded), it impacts performance with high-traffic USSD sessions that need to be terminated quickly.

USSD-Restcomm_dialogTimeoutTest.pcap.pcapng.zip USSDgw_server.log.zip Restcomm_server.log.zip UssdManagement_ussdproperties.xml.zip

nhanth87 commented 7 years ago

Here it is @FerUy https://github.com/RestComm/jss7/issues/66

FerUy commented 7 years ago

Thanks @nhanth87. Good to know it's a known issue; actually, I initially hesitated over whether to raise it here or in jss7. Having said that, and as this issue is affecting a live implementation of USSD Gw, I'd rather keep it here too. Moreover, the product (USSD Gateway) is delivered to customers with this in place, so it must be fixed.

nhanth87 commented 7 years ago

@FerUy your volunteer is appreciated :+1:

vetss commented 7 years ago

@FerUy

1) dialogTimeout in UssdManagement_ussdproperties.xml means the case:

2) "i.e. both TCAP and SIP dialogues terminated" - this behaviour is covered by another parameter for both PULL and PUSH cases - TcapStack_management.xml - (it is in milliseconds). This parameter can be updated by JSS7 management console - TCAP - "Dialog Idle Timeout"

3) As for what you are trying to achieve, please explain: do you need the user to be faced with a timeout / dialog termination, or do you need the user NOT to be faced with it? Let's discuss what you need to achieve, and then we will know how to achieve it.

FerUy commented 7 years ago

Hi @vetss

As per our conversation, here are the results and conclusions of the agreed tests:

  1. Started Restcomm-Connect and USSD Gateway in simulator mode.
  2. Dialed a USSD shortcode and got the USSD RVD menu text in jSS7 simulator.
  3. Did nothing for the following 30 seconds while Wireshark was tracing.
  4. USSD Gateway ended the SIP transaction with Restcomm-Connect at precisely the time set in parameter dialogidletimeout, i.e. at 30 seconds.
  5. USSD Gateway did nothing with the TCAP established dialog.
  6. Sent an answer from the jSS7 simulator after those 30 seconds. The answer reaches the USSD Gateway which then sends a TCAP Abort back to the network.

Conclusion:

Please find a Wireshark trace of the aforementioned test attached here. USSD-Restcomm_TCdialogTimeoutTest.pcap.pcapng.zip

vetss commented 7 years ago

As for now we have two timers at USSD GW:

1) USSD dialogtimeout timer - it triggers only for the PULL case, when there is no response from the HTTP / SIP server for a long time
2) TCAP dialogtimeout timer - it triggers when we have an SS7 TCAP dialog timeout. It terminates SIP / HTTP dialogs (with proper peer announcements) and terminates the SS7 dialog (WITHOUT peer announcements). This timer covers the needed events, but the problem is that it does not announce the termination to the SS7 peer, so this option does not fit our needs.

We can introduce new USSD GW level timeout timer(s) that will have functionality like option 1) but cover all timeout cases:
a) httpSipDialogTimeout - timeout when waiting for a response from the HTTP / SIP application
b) ss7DialogTimeout - timeout when waiting for a response from the SS7 subscriber

The first one will replace timer 1) and cover its functionality.
Timer a) will be activated just after a message to the SS7 peer has been sent (and cancelled after a message from the SS7 peer has been received).
Timer b) will be activated just after a message from the SS7 peer has been received (and cancelled after a message to the SS7 peer has been sent).
These timers will terminate both TCAP and HTTP/SIP dialogs and send announcements to the peer. Timer b) must by default be < timer 2), which means that in common use timers a) / b) will normally trigger before timer 2).

PS: for timer a) we can try to reuse the TCAP dialog timeout timer, because it allows cancelling the timeout processing by:

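onDialogTimeout() {
  // keep the dialog alive instead of letting the timeout release it
  dialog.keepOnline();
  dialog.addUSSDMessage("... explaining a reason of timeout ...");
  // close the dialog, which announces the termination to the peer
  dialog.end();
}

FerUy commented 7 years ago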

Hi @vetss , I believe you meant this for timer b)

Timer b) (ss7DialogTimeout) will be activated just after a message to SS7 peer has been sent, and cancelled just after a message from SS7 peer has been received (then httpSipDialogTimeout timer is activated).

In other words, if ss7DialogTimeout is 30000 ms, after sending a TCAP message (MAP unstructuredSSRequest or processUnstructuredSSRequest), the USSD Gateway will wait for 30 seconds. If no answer is received back from SS7 peer, USSD Gateway will terminate the corresponding TCAP dialog.

If this is the case and/or I understood it right, I agree with your entire proposal.
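The timer hand-over agreed in this exchange can be sketched as a small state holder. This is only an illustration under assumptions: the class `UssdSessionTimers`, its methods and the explicit `nowMs` arguments are hypothetical and not USSD Gateway code; only the two timer names come from the proposal above.

```java
import java.util.Optional;

/** Hypothetical illustration of the timer hand-over; not USSD Gateway code. */
class UssdSessionTimers {
    enum Timer { SS7_DIALOG_TIMEOUT, HTTP_SIP_DIALOG_TIMEOUT }

    private final long ss7DialogTimeoutMs;
    private final long httpSipDialogTimeoutMs;
    private Timer armed;      // null while no timer is running
    private long deadlineMs;  // absolute expiry time of the armed timer

    UssdSessionTimers(long ss7DialogTimeoutMs, long httpSipDialogTimeoutMs) {
        this.ss7DialogTimeoutMs = ss7DialogTimeoutMs;
        this.httpSipDialogTimeoutMs = httpSipDialogTimeoutMs;
    }

    /** A message was sent to the SS7 peer: replace any running timer and wait for the subscriber. */
    void messageSentToSs7Peer(long nowMs) {
        armed = Timer.SS7_DIALOG_TIMEOUT;
        deadlineMs = nowMs + ss7DialogTimeoutMs;
    }

    /** A message came from the SS7 peer: replace any running timer and wait for the HTTP/SIP application. */
    void messageReceivedFromSs7Peer(long nowMs) {
        armed = Timer.HTTP_SIP_DIALOG_TIMEOUT;
        deadlineMs = nowMs + httpSipDialogTimeoutMs;
    }

    /** Returns the timer that has expired at nowMs, if any, and disarms it. */
    Optional<Timer> expired(long nowMs) {
        if (armed != null && nowMs >= deadlineMs) {
            Timer fired = armed;
            armed = null;
            return Optional.of(fired);
        }
        return Optional.empty();
    }
}
```

On expiry, the caller would terminate both the TCAP and the HTTP/SIP dialogs, as described above. Because arming one timer replaces the other, at most one of the two is ever running per session.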

vetss commented 7 years ago

@FerUy yes it will terminate TCAP dialog (and also SIP / HTTP parts)

FerUy commented 7 years ago

Great... then we agree 100% @vetss :)

abhayani commented 7 years ago

Sergey,

Instead of introducing a new timer, shall we have a new flag/parameter in USSD Gw that, if set to true, makes USSD Gw send a TCAP ABORT to the peer when the TCAP Dialog times out?

By introducing more timers we are adding unnecessary complexity.

vetss commented 7 years ago

Hello @abhayani

The functionality "USSD Gw will send TCAP ABORT to peer when TCAP Dialog times out" demands a modification of the TCAP stack (i.e. a change of the current stack behaviour). By the time the event reaches USSD GW, the TCAP dialog is already dead and there is no possibility of sending any message to the SS7 peer.

This demands introducing another behaviour for the TCAP dialog timeout, in which the stack sends TC-ABORT to the peer if a dialog has timed out. Now we just kill the dialog without announcing it to the peer. I checked the TCAP spec and have not found clear recommendations on whether we should send such a TC-ABORT to the peer or not. The spec describes only the INVOKE timer...

abhayani commented 7 years ago

Hi Sergey,

Thanks for the details. Yes, I think the spec is not clear about this. But the testing done at one of our LATAM customers shows that on the peer side the Dialog still remains open. I agree this requires TCAP stack level changes, and by default we can have this flag set to false.

IMHO it makes sense to clean resources on peer side too if possible.

vetss commented 7 years ago

Amit,

Also, at the TCAP level we can send only TC-ABORT. From the USSD GW level we can send TC-END with a USSD message to the subscriber describing why the USSD session is terminated.

FerUy commented 7 years ago

@vetss @abhayani @nhanth87, as mentioned in Slack, a TC END in the USSD situation we are talking about makes no sense to me. Imagine a user taking too long to answer a menu on his handset display (for whatever reason), and we then send a TC END with a USSD message like "Application timeout" or whatever. He will never get that message and, worse, we are violating the USSD session rules: we will receive a TC P-ABORT from the network immediately, with the user oblivious to all of this. Once he does something, the dialog is destroyed and nothing other than an MMI message will appear (which is the same as if we had sent a TC U-ABORT). In other words, with a TC END we are introducing more signalling and no added value for the user.

As for ITU-T Q.773 (TCAP):

    Abort ::= SEQUENCE {
        dtid    DestTransactionID,
        reason  CHOICE {
            p-abortCause  P-AbortCause,
            u-abortCause  DialoguePortion
        } OPTIONAL
    }

as for 3GPP TS 29.002 (MAP):

Table 7.3/6: Service-primitives for the MAP-U-ABORT service

| Parameters | Request | Indication |
|---|---|---|
| User reason | M | M(=) |
| Diagnostic information | U | C(=) |
| Specific information | U | C(=) |

User reason: This parameter can take the following values:

Diagnostic information: This parameter may be used to give additional information for some of the values of the user-reason parameter:

Table 7.3/7: User reason and diagnostic information User reason Diagnostic information

In conclusion, I'd rather go for a TCAP-U-ABORT with User reason application procedure cancellation

abhayani commented 7 years ago

"then we send a TC END with a USSD message like "Application timeout" or whatever, well, he will never get that message"

@FerUy Why will he never receive this message? I have tested on a live network sending a couple of USSD messages back-to-back, and they do appear on the phone. The 1st one hides the 2nd one. As soon as the 1st one is removed (by the user pressing OK or CANCEL), the second appears.

"TCAP-U-ABORT with User reason application procedure cancellation" does make sense technically. But it will not make any sense to the end user.

FerUy commented 7 years ago

@abhayani because most of the time the user will react after the TCAP dialog has been eliminated from the network due to timeout, especially for services like balance inquiry, where the user easily takes more than 30 seconds to read/understand the information sent. So, I strongly believe that statistically we will end up sending rubbish to the network.

IMO, if we want to send a notification to the user about a transaction (and I have also seen/experienced it, especially with mobile financial services), I would always send an SMS. Of course, that's not feasible today with our USSD Gw, but it will be soon (e.g. through SMPP), and it's only my opinion on the subject.

FerUy commented 7 years ago

Hi @vetss et. al.

As mentioned in other channels, I am attaching here a Wireshark trace of a test with the following parameters set to 30 seconds in the TcapStack_management.xml configuration file:

<dialogidletimeout value="30000"/>
<invoketimeout value="30000"/>

As can be seen in the trace, the TCAP abort is sent to the SS7 network practically simultaneously with (actually just before) the SIP BYE to Restcomm-Connect after exactly 30 seconds, which is what we were looking for, and with the proper type of Abort and user reason as discussed previously in this thread.

This doesn't solve the original reason this issue was brought up, but it does provide a workaround: setting the parameters in the TcapStack_management.xml configuration file to the appropriate values as mentioned earlier... which obviously is a giant leap ahead :+1:
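For anyone scripting a check of these two values, the following is a rough sketch of reading them with the JDK's DOM parser. It rests on assumptions: the helper `TcapTimeouts.readTimeoutMs` is hypothetical, and the fragment is wrapped in a made-up `<tcap>` root element so that a standard parser accepts it, since the full layout of TcapStack_management.xml is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper; not part of USSD Gateway or jSS7. */
class TcapTimeouts {

    /** Returns the "value" attribute (milliseconds) of the first element named tag. */
    static long readTimeoutMs(String xml, String tag) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing element <" + tag + ">");
        }
        // Long.parseLong rejects an absent (empty) or non-numeric value attribute
        return Long.parseLong(((Element) nodes.item(0)).getAttribute("value"));
    }

    public static void main(String[] args) {
        // The <tcap> wrapper is an assumption made for this sketch only
        String xml = "<tcap>"
                + "<dialogidletimeout value=\"30000\"/>"
                + "<invoketimeout value=\"30000\"/>"
                + "</tcap>";
        System.out.println("dialogidletimeout = " + readTimeoutMs(xml, "dialogidletimeout") + " ms");
        System.out.println("invoketimeout     = " + readTimeoutMs(xml, "invoketimeout") + " ms");
    }
}
```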

Great job @vetss !

USSD-Restcomm_dialogTimeoutTest_patch541forIssue69_params30sec.pcap.pcapng.zip

vetss commented 7 years ago

Hello,

it looks like the first fix, for the TCAP dialog only, https://github.com/RestComm/ussdgateway/commit/7aa38c12699d931b89141d23b1fe3aa9c42b64c1 works as expected.

Now we have:
1) PULL/PUSH - TCAP dialog timeout based - timeout when a mobile subscriber does not respond for a long time (default is 30 seconds) - configurable via http://localhost:8080/jss7-management-console/# - TCAP (that was added by the last fix)
2) PULL - timeout of waiting for a response from the HTTP application http://localhost:8080/ussd-management - http://localhost:8080/ussd-management/# - Server Settings - Dialog timeout error message

What we still need:
a) PUSH - timeout of waiting for a response from the HTTP application
b) PULL - timeout of waiting for a response from the SIP application
c) PUSH - timeout of waiting for a response from the SIP application

For cases a) - c) we can, for simplicity, reuse the timer that is configured for "2)" and use code templates from "2)". We need to establish timers at the USSD GW level for it.

d) update the manual with a clear explanation of which timer is responsible for what
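The resulting plan can be summarised as a lookup from "who are we waiting for" to "which timer applies". The sketch below only restates the lists in this comment; the class `TimeoutPlan` and its enum names are hypothetical and do not exist in the USSD Gateway code base.

```java
/** Hypothetical summary of the timeout plan in this comment; not USSD Gateway code. */
class TimeoutPlan {
    enum Mode { PULL, PUSH }
    enum WaitingOn { SS7_SUBSCRIBER, HTTP_APPLICATION, SIP_APPLICATION }
    enum Timer { TCAP_DIALOG_IDLE_TIMEOUT, USSD_DIALOG_TIMEOUT }

    /**
     * Timer intended to govern each case once a) - c) reuse the timer of "2)".
     * The mode does not change the answer; it is kept to mirror the case list.
     */
    static Timer governingTimer(Mode mode, WaitingOn waitingOn) {
        return waitingOn == WaitingOn.SS7_SUBSCRIBER
                ? Timer.TCAP_DIALOG_IDLE_TIMEOUT   // case 1): subscriber does not respond
                : Timer.USSD_DIALOG_TIMEOUT;       // case 2) plus a) - c): application does not respond
    }

    /** Whether the case was already handled at the time of this comment. */
    static boolean alreadyCovered(Mode mode, WaitingOn waitingOn) {
        switch (waitingOn) {
            case SS7_SUBSCRIBER:
                return true;               // 1) covers both PULL and PUSH
            case HTTP_APPLICATION:
                return mode == Mode.PULL;  // 2) covers PULL only; a) is still open
            default:
                return false;              // b) and c): SIP cases are still open
        }
    }
}
```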

vetss commented 7 years ago

Fixed by:

https://github.com/RestComm/ussdgateway/commit/7aa38c12699d931b89141d23b1fe3aa9c42b64c1 https://github.com/RestComm/ussdgateway/commit/ffc0479ee8a53c0f1a7710b7bf33c19128e9ae31 https://github.com/RestComm/ussdgateway/commit/d72ce051092e2c23787238d532fa28ae491dfdee

FerUy commented 7 years ago

Hi @vetss

As mentioned via chat, I tested the patch with a service between USSD Gw and Restcomm-Connect (RVD). The RVD project is very simple, you can deduce it from the following diagram:

image

The test consisted of going to module opt2 by sending "2" after the welcome module menu is presented to the user. The opt2 module has a dummy external service which answers after 20 seconds (via sleep). With dialogtimeout set to 15 seconds (while dialogidletimeout was set to 30 seconds), the USSD Gw sends the correct timeout message inside a MAP ProcessUnstructuredSSRequest operation within a returnResultLast component of a TCAP/End message at precisely 15 seconds = dialogtimeout. See attached trace. So far so good then.

The only thing that bothers me at this point is that no further SIP communication is exchanged between USSD Gw and Restcomm after that, unlike what happens when dialogidletimeout is reached (for example, when the user doesn't respond within that period): in that case, after the TCAP U-Abort is sent to the SS7 network, a SIP BYE is sent to Restcomm-Connect and therefore both TCAP and SIP dialogues are finished.

Shouldn't a SIP BYE then be sent to Restcomm-Connect when the dialogtimeout threshold is exceeded, as in the test carried out?

restcommUSSDgw7.1.61_dialogtimeout15_vs_dialogidletimeout30_test.pcap.pcapng.zip

vetss commented 7 years ago

Hello @FerUy

thanks for your testing, which allowed me to prepare a further patch. I added sending of SIP BYE for both PULL and PUSH and fixed some small bugs. I will prepare new binaries. @FerUy please retest them. Remember that we generally have 4 cases: PULL / PUSH and SS7 side timeout / RC (SIP) side timeout. It is better to test all cases. I tested it for the HTTP case.

I have one doubt about the following case: the PUSH case when USSD GW has sent an initial TC-BEGIN to the SS7 network (a first PUSH message has been sent to a mobile subscriber) but then we have a TCAP dialog timeout. The TCAP dialog is at that time in the "Initiation Sent" state, and if a TC User (say USSD GW) wants to terminate the TCAP dialog (because of the timeout), the SS7 stack sends no TC-USER-ABORT to the peer.

This is because the TCAP spec says: "When the transaction is in the "Initiation Sent" state, i.e. a Begin message has been sent but no backward message for this transaction has been received, the result of the TR-U-ABORT request primitive is purely local."

It is not a big update of the SS7 stack to send TC-USER-ABORT to a peer in the "Initiation Sent" state, but it is not clear what behaviour is correct. @FerUy do you have any experience with this case?

FerUy commented 7 years ago

Hi @vetss ... thanks, I will test it and revert asap.

Regarding your last question, I have no experience with such a scenario, surely because sending a TC-U-ABORT during the Initiation Sent state makes no sense for PUSH USSD. When the USSD Gw sends a TC-Begin it's because the application that triggered it is only expecting one thing: the answer from the USSD user. Otherwise, the application logic is incorrectly designed. Hence, only dialogidletimeout is important during the Initiation Sent state, while dialogtimeout will never be triggered (or its threshold reached/passed) for PUSH USSD. Agreed?

vetss commented 7 years ago

@FerUy

I was describing the case: PUSH - Initiation Sent - TCAP dialogidletimeout. In other words, USSD GW has sent a first PUSH message (inside a dialog) and there is no response from the subscriber for a configurable time. In this case we do not send TC-USER-ABORT to the SS7 peer (because the TCAP stack implementation follows the spec) and just terminate the TCAP and HTTP / SIP dialogs. It is this case I have doubts about.

FerUy commented 7 years ago

@vetss sorry for the confusion and thanks for the clarification.

Let's stick to the spec, it will eventually send its timeout Abort when it's due. From our side, we are good just terminating TCAP and HTTP/SIP dialogues in USSD Gw.
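The rule being kept here can be written down as a one-line decision. This is a sketch under assumptions: the class `AbortPolicy` is hypothetical, and the state names other than "Initiation Sent" are my reading of the TCAP transaction states rather than something quoted in this thread.

```java
/** Hypothetical illustration of the abort rule discussed above; not jSS7 code. */
class AbortPolicy {
    // "Initiation Sent" is quoted in the thread; the other names are assumed
    enum TransactionState { IDLE, INITIATION_SENT, INITIATION_RECEIVED, ACTIVE }

    /**
     * True when a user abort request results in an Abort message towards the peer.
     * In INITIATION_SENT (Begin sent, nothing received back) the abort is purely
     * local, so the dialog is only released on our side.
     */
    static boolean abortReachesPeer(TransactionState state) {
        return state == TransactionState.INITIATION_RECEIVED
                || state == TransactionState.ACTIVE;
    }
}
```

For the PUSH case described above, `abortReachesPeer(TransactionState.INITIATION_SENT)` is false, which matches the decision to just terminate the TCAP and HTTP/SIP dialogs locally.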

FerUy commented 7 years ago

Hi @vetss

Just attaching here the trace of the test commented on Slack...

restcommUSSDgw7.1.62_dialogtimeout15_vs_dialogidletimeout30_test.pcap.pcapng.zip

FerUy commented 7 years ago

Hi @vetss, please see my last two comments in https://github.com/RestComm/Restcomm-Connect/issues/2411 Attaching last test logs and trace here as well, as requested by @deruelle issuesRC2411-USSSD69.zip

FerUy commented 6 years ago

Apart from the race condition mentioned, which only happens if some special configuration is provided on the RC side, this issue is solved. I will create another one for that, just for perfection's sake ;)