Closed FerUy closed 7 years ago
@FerUy , thanks for the elaborate analysis. That's a really strange behavior. RVD timeout internals should be transparent for Restcomm/USSD GW once proper RCML is returned.
Please check/confirm the following scenario:
My point is:
Does the issue with "USSD Gateway never receives neither SIP BYE nor SIP INFO" have to do with the fact that an RVD timeout is reached OR the fact that a 2 second timeout is too big for restcomm/USSD GW to handle and something bad happens.
Trying to explain my point further:
In your working scenario, everything is fine when the
sleep()
waits for 1 second. What if something breaks for requests that get responses after 2 sec regardless of whether the response originated from an ES-timeout case or not.
If this point is wrong, and everything works for requests-responses with 2 sec duration, could you also include http/RCML in the logs ? As far as i udnerstand you copied excerpts from the application log. Let's try to make sure that everything in terms of RCML is also ok on the wire.
You are welcome, and thanks you back @otsakir !
Today I did several tests of all kinds (with latest TelScale release, 2.8.0.1560, the same one being used in customer's implementation), and latest Restcomm USSD Gateway with the patch provided by @vetss to solve https://github.com/RestComm/ussdgateway/issues/69 (when I refer to dialogtimeout
, I'm talking about USSD Gateway's timeout with SIP for this case).
Of the many tests I did, now taking traces including HTTP filter as you requested wisely, I'm attaching traces and logs here of four of them, and it's clear we have issues.
USSD Gateway dialogtimeout
= 10 secs
RVD externalServiceTimeout
= 4 secs
External Service sleep
time = 2 secs
Here, the expected result was receiving through USSD this final message in jSS7 simulator: Timeout while processing External Service.
. Strangely, USSD Gw didn't receive it through SIP (in previous tests it was working), then USSD Gw timeouts after 10 seconds and sends corresponding timeout message in SS7 MAP End to jSS7 simulator (received) and SIP BYE to Restcomm-Connect (which is never responded).
Test 1 logs and trace here: iss2410test1_sl4-USSD10-rvdES2.zip
USSD Gateway dialogtimeout
= 15 secs
RVD externalServiceTimeout
= 4 secs
External Service sleep
time = 3 secs
On this occasion, expected result was receiving through USSD this final message in jSS7 simulator: Sleep time: 3 seconds.
. Accordingly to what was perceived when issue was raised, nothing is received via SIP in USSD Gateway despite communication is received from the external service 1 second before the configured externalServiceTimeout
(notice that as in all tests, HTTP message from the external service is sent back as expected at the same time as recorded in rvd log), thus USSD Gateway timeouts at 15 seconds and sends corresponding timeout message in SS7 MAP End to jSS7 simulator (received) and SIP BYE to Restcomm-Connect (which is never responded).
Test 2 logs and trace here: iss2410tet2_sl3-USSD15-rvdES4.zip
USSD Gateway dialogtimeout
= 29 secs
RVD externalServiceTimeout
= 30 secs
External Service sleep
time = 10 secs
On this occasion, expected result was receiving through USSD this final message in jSS7 simulator: Sleep time: 10 seconds.
. Accordingly to what was when issue was raised, nothing is received via SIP in USSD Gateway despite communication is received from the external service 20 seconds before the configured externalServiceTimeout
, thus USSD Gateway timeouts at 29 seconds and sends corresponding timeout message in SS7 MAP End to jSS7 simulator (received) and SIP BYE to Restcomm-Connect (which is never responded).
Test 3 logs and trace here: iss2410_sl10-USSD29-rvdES30.zip
USSD Gateway dialogtimeout
= 25 secs
RVD externalServiceTimeout
= 25 secs
External Service sleep
time = 20 secs
On this occasion, expected result was receiving through USSD this final message in jSS7 simulator: Sleep time: 20 seconds.
. Accordingly to what was when issue was raised, nothing is received via SIP in USSD Gateway despite communication is received from the external service 5 seconds before the configured externalServiceTimeout
, thus USSD Gateway timeouts at 25 seconds and sends corresponding timeout message in SS7 MAP End to jSS7 simulator (received) and SIP BYE to Restcomm-Connect (which is never responded).
Test 4 logs and trace here: iss2410_sl20-USSD25-rvdES25.zip
Hope these tests help to clarify the situation involving RVD externalServiceTimeout
parameter.
I will carry out further tests with the new 'timeout' property in the specific RVD/ES element you mentioned and publish results here too (albeit this is another story to tell).
Hi @otsakir, as told in previous comment, I went through testing of new 'timeout' property in the specific RVD/ES element. For that, I downloaded latest Restcomm-Connect binaries, i.e. Restcomm-JBoss-AS7-8.2.0.1278. Good news!!
Please find attached some of the tests carried out. All of them derived in a SIP BYE sent to the USSD Gw and session is terminated as expected.
Traces of 3 of the tests conducted can be found in the traces attached. These tests had the USSD Gateway timeout set higher than either the sleep time or the RVD ES timeout. All passed.
issue2410_testsRC_8.2.0.1278.zip
Not closing the issue as still we have the problem of externalServiceTimeout
not working as expected in rvd.xml (leaving the decision to you), but definitely these tests over the new feature are great news.
Hi @otsakir
Thinking about why not closing this issue, I did another test: setting the externalServiceTimeout
to a value and then leaving it unset. The outcome is different and if you actually set the value of externalServiceTimeout
in rvd.xml, the outcome is the original undesired one (USSD Gw eventually timeouts and SIP BYE is never responded). In this test, although as seen in the rvd log, is that the timeout set apparently prevails over the one set for externalServiceTimeout
in rvd.xml, in reality nothing happens (no timeout is sent to trigger Restcomm to finish the SIP session).
On the contrary, if you leave the externalServiceTimeout
parameter unset in rvd.xml, with all the rest set equally, the result is OK and Restcomm works like is expected, thus ending the session after the ES module time outs.
Please have the attached traces and rvd logs from either described tests attached here: issue2410_tests_20170705.zip
Hi @otsakir @croufay
I carried out more tests and it was worthy, as previous comment could be a little confusing. Let me describe the tests separately.
RVD ES Timeout
= 3 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= not set (default)
External Service sleep
time = 20 secs
Everything works as expected. At precisely 3 seconds after the ES timeout was triggered, it timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides.
Traces and RVD log: ESTimeout3s_externalServiceTimeoutUNSET_USSD25s_sleep20.zip
RVD ES Timeout
= 6 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= not set (default)
External Service sleep
time = 20 secs
Here the rvd.xml externalServiceTimeout
(default value) prevails over RVD ES Timeout
. This was not expected (at least for me). At precisely the default value, i.e. 5 seconds, it timeouts (not at the ES timeout at 6 seconds). Fortunately, what is shown in the RVD log is consistent with what happens in the wire. Likewise, good news is that SIP BYE is sent to the USSD Gw at 5 seconds and session is properly closed on either sides. Although ES Timeout is not prevailing for the specific RVD project, at least it works with rvd.xml externalServiceTimeout
default value and nothing else breaks beyond there.
Traces and RVD log: ESTimeout6s_externalServiceTimeoutUNSET_USSD25s_sleep20.zip
RVD ES Timeout
= 3 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= 6 secs
External Service sleep
time = 20 secs
Everything works as expected. At precisely 3 seconds after the ES timeout was triggered, it timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides. It's important to note that when the ES Timeout
is lower than externalServiceTimeout
in rvd.xml, then everything works.
Traces and RVD log: ESTimeout3s_externalServiceTimeout6s_USSD25s_sleep20.zip
RVD ES Timeout
= 10 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= 6 secs
External Service sleep
time = 20 secs
Notice here that, opposite to what was set in test 3), RVD ES Timeout
is greater than rvd.xml externalServiceTimeout
. In this scenario, almost everything goes wrong. First, as seen in 2), rvd.xml externalServiceTimeout
(now set to 6000) prevails over RVD ES Timeout
. Second, nothing is sent back to Restcomm, so it doesn't really timeout (opposite of what is shown in rvd log). Third, as a consequence of the latter, no SIP BYE is sent to USSD Gw. Fourth, when USSD Gw timeouts, SIP BYE sent by it is never answered by Restcomm (this is tied with issue #2411), so the USSD service is held forever (apparently) in Restcomm (and USSD Gw spends quite some unnecessary time occupied in trying to stop the SIP session after it timed out on its side).
Traces and RVD log: ESTimeout10s_externalServiceTimeout6s_USSD25s_sleep20s.zip
So, RVD ES Timeout works as expected as long as it is set with a lower value than the default one (5 secs) or the one set manually in rvd.xml externalServiceTimeout
.
I am working on this issue and I discovered that the value in the rvd.xml is not taken into account. I already have a fix for that. There is another issue where the timeout doesn't follow the settings and it always times out at about 5 seconds no matter what value you have in the rvd.xml and the ES timeout in the APP. Investigation still in progress.
Thanks for feedback @charles!
@abdulazizali77, fyi, please be in sync with Charles.
@FerUy , some more backkground on internals:
externalServiceTimeout
value of 5000 and an ES elemet timeout
value of 7000, effectively your timeout should truncated to 5000.externalServiceTimeout
specified in rvd.xml it will default to 5000. In that case any value greater that 5000 in ES element timeout
will be truncated to 5000.I think these explain test(4) aand test(2)
cc @abdulazizali77 , @croufay
@otsakir right, we already reached an agreement on that. What I foresee, and that was the key of my statement of not expecting that behaviour, is that customers will not be aware of all that, so they will not understand the behaviour if they (incorrectly) set a timeout greater than 5000 and they perceive otherwise in tests. Having said that, I prefer it is like you that (and we all agreed on that), as external services with responses higher than 5 seconds would make a snowball effect that would bring the system down eventually under very high load.
In the weekend I did a lot of other tests and we can summarize the most important one as following (as it has the parameters set as we need them in production):
RVD ES Timeout
= not set
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= 2 secs
External Service sleep
time (delayed response) = 20 secs
Everything went perfectly well as you can see in attached logs/traces.
externalServiceTimeout2secs.zip
Important: This test was done with latest fixes merged for #2411 (merged PR #2438), in other words, with Restcomm-Connect release 8.2.0.1789, and latest USSD Gw public release (7.1.62), covering https://github.com/RestComm/ussdgateway/issues/69, tied to problems detected in #2411.
@otsakir @croufay @abdulazizali77 @gmlemus @vetss, as agreed today, I carried out new tests using Restcomm-JBoss-AS7-8.2.0.1288 and TelScale-ussd-7.1.1.541 (enhanced by replacing TelScale jars with public restcomm-slee-ra-map-du-7.3.72.jar
, services-DU-7.1.62.jar
and sip11-ra-DU-2.9.94.jar
as for USSD Gateway latest public version, 7.1.62). Nothing was changed in restcomm-rvd.war from what was downloaded from the RC release.... these were the results:
RVD ES Timeout
= 3 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= not set (default)
External Service sleep
time = 20 secs
Everything works as expected. At precisely 3 seconds after the ES timeout was triggered, it timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides.
RVD ES Timeout
= 6 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= not set (default)
External Service sleep
time = 20 secs
Everything works as expected. At precisely 5 seconds (default) RC timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides.
RVD ES Timeout
= 3 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= 6 secs
External Service sleep
time = 20 secs
Everything works as expected. At precisely 3 seconds after the ES timeout was triggered, it timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides.
RVD ES Timeout
= 10 secs
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= 6 secs
External Service sleep
time = 20 secs
Here there is a problem. Nothing is sent back to Restcomm, so it doesn't really timeout (opposite of what is shown in rvd application log). Third, as a consequence of the latter, no SIP BYE is sent to USSD Gw. Finally, USSD Gw times out and SIP BYE is received and the session is ended on both sides (see error logged in attached server.log and in Test4 folder (improvement here -session is eventually terminated when USSD Gw sends SIP BYE- is due to workaround made to #2411 by @abdulazizali77).
RVD ES Timeout
= UNSET
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= 2 secs
External Service sleep
time = 20 secs
This test is important because is the one we aim to set in a live system. Fortunately, everything goes as expected. At precisely 2 seconds RC timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides.
RVD ES Timeout
= UNSET
USSD Gateway dialogtimeout
= 25 secs
rvd.xml externalServiceTimeout
= UNSET
External Service sleep
time = 20 secs
Everything works as expected. At precisely 5 seconds (default) RC timeouts and what is shown in the RVD log is consistent with what happens in the wire. SIP BYE is sent to the USSD Gw and session is properly closed on either sides.
Find attached traces and logs of all tests described above next:
Same results as in previous comment with today's generated TelScale version (8.2.0.1580), find traces/logs attached here: TelScale-RC-1580_tests.zip
@FerUy,
RVD ES Timeout
will get ignored if rvd.xml externalServiceTimeout
has been defined too and the latter is smaller than the first. More on why this happens at the end of the comment. So, from the RVD side the effective timeout setting is 6 sec. Restcomm will also by default wait for 6 sec as defined in restcomm.xml for any http (and RCML request). Since there is also roundtrip time when restcomm contacts RVD (asks for RCML), by definition, rcomm will always timeout first and won't wait until RVD sends back the RCML response.
The bottom-line is that the http-client timeouts in restcomm.xml should ALWAYS be greater than RVD effective timeout . How much greater ? It depends on how much time it takes rcomm to reach RVD. I'd say that 1 sec is more than enough.
@FerUy , can you check the http-client/response-timeout value in restcomm.xml and make sure it's less than RVD timeout value (rvd.xml externalServiceTimeout) ? What was the case for test(4) ? Can you confirm my hypothesis ?
RVD ES Timeout
gets ignored if greater than rvd.xml/externalServiceTimeout
Allowing big timeouts in an application (i.e. leaving this up to the user defining it in ES element) is a bad idea for two reasons:
So, we need to have this controlled from a single place and let the administrator do it instead of allowing users play with the fire in their applications. That's why the rvd.xml/externalServiceTimeout
if defined will always truncate any user-supplied timeout value in an ES element. Of course, if rvd.xml/externalServiceTimeout
is not defined, STILL, the default 5 sec timeout will be used for it as described in the comments in rvd.xml.
We could provide more information in the comments both in rvd.xml and restcomm.xml/http-client/response-timeout so that the user has a hint that the restcomm timeout should be greater than the RVD timeout.
@FerUy ,
I had a conf call with @otsakir about this and I tested using the latest Telscale version of Restcomm-connect and everything worked fine taking into account the value of rvd.xml file.
Test 4)
USSD Gateway dialogtimeout = 25 secs RVD externalServiceTimeout = 25 secs External Service sleep time = 20 secs
The above test is slightly flawed for the reason that Restcomm also has a default http response-timeout that defaults to 6000ms. According to @otsakir, the value in the rvd.xml or in the rvd app must be lower than Restcomm-connect's http response-timeout value. In any production setting, there will be no reason to have a value higher than 5000ms in production so I think the decision to use 2000ms in the rvd.xml file should cover our usecase.
Hi @otsakir @croufay,
Thanks you both for detailed explanation. I completely agree with the approach, so test 4 makes no sense in the context of this issue, but it's good for the overall landscape. Absolutely, we don't want user's wrong use of the application to tear down the whole system, and this is the correct way to deal with :+1:
Oh, and I almost forgot your request @otsakir:
<http-client>
<response-timeout>6000</response-timeout>
<!-- Control peer certificate verification for https connections. Two modes are supported:
'allowall' : Disables peer certificate verification. Use it when testing.
'strict' : Fails requests if peer certificate is not trusted. Use it in production. -->
<ssl-mode>strict</ssl-mode>
<!-- Control whether relative URL should be resolved using hostname instead of IP Address.
If for any reason hostname resolution fails, IP Address will be used -->
<use-hostname-to-resolve-relative-url>true</use-hostname-to-resolve-relative-url>
<!-- Optionally provide the hostname to be used, otherwise Java will try to get the hostname of the machine JVM is running -->
<hostname>192.168.1.47</hostname>
</http-client>
So, yes, that confirms your theory ;)
ok @FerUy thanks. I've improved the comments for timeout both in RVD and restcomm:
https://github.com/RestComm/visual-designer/blob/5152023a63816f3e9bd61a76c41034e335440d38/designer/src/main/webapp/WEB-INF/rvd.xml#L42-L42 https://github.com/RestComm/RestComm-Connect/blob/cc37c8c8edcfe220db5fcf7b6929b0d90fa19d1b/restcomm/restcomm.application/src/main/webapp/WEB-INF/conf/restcomm.xml#L407-L407
Can we close this ?
Hi @otsakir, yes, just doing that :)
An issue was found when testing rvd.xml
externalServiceTimeout
parameter for a USSD Gw <-> Restcomm-Connect implementation for a customer who needs to set a low timeout value for On timeout within USSD RVD External Services modules, in order to preventing delayed external services to impact the performance by queuing USSD ongoing sessions.No matter which value is set, Restcomm-Connect does not proceed with the corresponding SIP message (neither SIP INFO nor SIP BYE depending on the module -USSD Collect or USSD Message respectively).
The test consisted on a simple USSD RVD project to simply test this (attached). The external service targets a simple php script with a sleep function at the beginning. If the sleep time (1 second) is lower than
externalServiceTimeout
value (2 seconds), the project proceeds to the final USSD Message. This can be easily seen on the USSD RVD project log:But, if the sleep time is set greater than
externalServiceTimeout
value (2 seconds), then the following happens: 1) As expected, On Timeout is triggered at precisely 2 seconds (externalServiceTimeout
value), as can be seen below in USSD RVD project log (this time going to the corresponding timeout message, either a USSD Message or a USSD Collect):2) Opposite to what the previous logs show, USSD Gateway never receives neither SIP BYE nor SIP INFO for neither USSD Message nor USSD Collect actions triggered/logged. Then, after the USSD Gateway timeouts by itself, it sends the corresponding MAP End message to the SS7 network (as its
dialogtimeout
parameter dictates -25 seconds in this example tests-).Both situations for On Timeout (being triggered or not) can be verified in the attached Wireshark traces. While when there's not a timeout, everything is processed normally, when the external service times out, no SIP messages is sent back to USSD Gateway.
Simple_USSD_project.zip USSD-Restcomm_RVD_ES-NOTimeoutTests.pcap.pcapng.zip USSD-Restcomm_RVD_ES-TimeoutTests.pcap.pcapng.zip