Sanjivkumarroshan / csipsimple

Automatically exported from code.google.com/p/csipsimple
0 stars 0 forks source link

ZRTP + ICE : deadlock and failure management #2379

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.setup two CSimpleSIP clients with create-ZRTP (STUN disabled) and ICE 
enabled. Register them with TURN server to relay all RTPs media.
2. Setup FS with inbound-zrtp-passthru and bypass_mode (media point-2-point).
3. Connect both CSimpleSIP clients on two different networks (3G and WiFi - it 
will work on the same LAN) and call each other. 

What is the expected output? 
The expect output should be that both CSimpleSIP clients wil share the same 
ZRTP SAS on their screen.

What do you see instead?
Unfoutunatly both clients just connect oer the TURN server (both ways audio - 
OK) and no ZRTP SAS indication.
The ZRTP SAS is displayed on both clients when they are connected on the same 
LAN network.

What version of the product are you using? 
r2225
On what device / operating system? 
SONY ST21a Android 4.04 + SONY MT27i Adroid 4.04

Please provide any additional information below.
In the SSimpleSip call log file I see that the ZRTP request fail (timeout)- 
maybe because one tries to connect the other with both clients private IP 
addresses (10.205.94.140:4010-->192.168.1.106:4017) instead of using the pbulic 
addresses (mapped to the TURN/ICE).

D/libpjsip(26459): 17:17:38.181   conference.c  ......Port 0 (OpenSL ES Audio) 
transmitting to port 2 (sip:1003@91.205.155.18:5090)
D/libpjsip(26459): 17:17:38.192   Master/sound !Underflow, buf_cnt=0, will 
generate 1 frame
D/libpjsip(26459): 17:17:38.196        icetp00 !..Check 0: [1] 
10.205.94.140:4010-->192.168.1.106:4017: state changed from Waiting to In 
Progress
I/libpjsip(26459): 17:17:38.222 zrtp_android.c !ZRTP failed: 3, subcode: 5

How to make ZRTP working between two clients which relay RTP media over TURN 
server?

Thanks for the support.

Regards
Assaf

Original issue reported on code.google.com by adah...@gmail.com on 28 May 2013 at 7:46

GoogleCodeExporter commented 9 years ago
More test results that pointing out the ZRTP fail reason (I assume so) due to 
the incorrect ICE addressing decesion. This caller's log shows that the ICE is 
checking the connection with the private addresses instead of the public 
addresses conection (after that the ICE keeps testing the other connections and 
seccessfuly open the public addresses conection for RTP audio - but with ZRTP 
which has failed earlier):

ICE incorrect connection => ZRTP fail:
D/libpjsip(11520): 10:40:23.921        icetp00 !......Processing SDP: support 
ICE=1, common comp_cnt=2, ice_mismatch=0, ice_restart=0, local_role=Controlling
D/libpjsip(11520): 10:40:23.925        icetp00  ......Check 3: [1] 
10.182.162.190:4008-->192.168.1.106:4009 pruned (duplicate found)
D/libpjsip(11520): 10:40:23.925        icetp00  ......Check 5: [2] 
10.182.162.190:4000-->192.168.1.106:4036 pruned (duplicate found)
D/libpjsip(11520): 10:40:23.925        icetp00  ......Check 3: [1] 
10.182.162.190:4008-->94.159.132.138:4009 pruned (duplicate found)
D/libpjsip(11520): 10:40:23.925        icetp00  ......Check 4: [2] 
10.182.162.190:4000-->94.159.132.138:4036 pruned (duplicate found)
D/libpjsip(11520): 10:40:23.925        icetp00  ......Check 6: [1] 
10.182.162.190:4008-->91.205.155.18:13737 pruned (duplicate found)
D/libpjsip(11520): 10:40:23.925        icetp00  ......Check 10: [2] 
10.182.162.190:4000-->91.205.155.18:12187 pruned (duplicate found)
D/libpjsip(11520): 10:40:23.925        icetp00  ......Checklist created:
D/libpjsip(11520): 10:40:23.925        icetp00  ...... 0: [1] 
10.182.162.190:4008-->192.168.1.106:4009 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 1: [2] 
10.182.162.190:4000-->192.168.1.106:4036 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 2: [1] 
10.182.162.190:4008-->94.159.132.138:4009 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 3: [2] 
10.182.162.190:4000-->94.159.132.138:4036 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 4: [1] 
10.182.162.190:4008-->91.205.155.18:13737 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 5: [1] 
91.205.155.18:19518-->192.168.1.106:4009 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 6: [1] 
91.205.155.18:19518-->94.159.132.138:4009 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 7: [1] 
91.205.155.18:19518-->91.205.155.18:13737 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 8: [2] 
10.182.162.190:4000-->91.205.155.18:12187 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 9: [2] 
91.205.155.18:15765-->192.168.1.106:4036 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 10: [2] 
91.205.155.18:15765-->94.159.132.138:4036 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.932        icetp00  ...... 11: [2] 
91.205.155.18:15765-->91.205.155.18:12187 (not nominated, state=Frozen)
D/libpjsip(11520): 10:40:23.939        icetp00  ......Starting ICE check..
D/libpjsip(11520): 10:40:23.939        icetp00  .......Check 0: [1] 
10.182.162.190:4008-->192.168.1.106:4009: state changed from Frozen to Waiting
D/libpjsip(11520): 10:40:23.939        icetp00  .......Check 5: [1] 
91.205.155.18:19518-->192.168.1.106:4009: state changed from Frozen to Waiting
D/libpjsip(11520): 10:40:23.944    pjsua_aud.c  ......Audio channel update..
D/libpjsip(11520): 10:40:23.944   strm0x596dac  .......Encoder stream started
D/libpjsip(11520): 10:40:23.944   strm0x596dac  .......Decoder stream started
D/libpjsip(11520): 10:40:23.945  pjsua_media.c  ......Audio updated, stream #0: 
SILK (sendrecv)
D/libpjsip(11520): 10:40:23.954    pjsua_aud.c  .....Conf disconnect: 1 -x- 0
D/libpjsip(11520): 10:40:23.954   conference.c  ......Port 1 (ringback) stop 
transmitting to port 0 (OpenSL ES Audio)
D/SIP UA Receiver(11520): < LOCK CPU
D/libpjsip(11520): 10:40:23.954        icetp00 !Checklist: state changed from 
Idle to Running
D/libpjsip(11520): 10:40:23.955        icetp00  Starting checklist periodic 
check
D/libpjsip(11520): 10:40:23.955        icetp00  .Sending connectivity check for 
check 0: [1] 10.182.162.190:4008-->192.168.1.106:4009
D/libpjsip(11520): 10:40:23.959        icetp00  ..Check 0: [1] 
10.182.162.190:4008-->192.168.1.106:4009: state changed from Waiting to In 
Progress
D/SIP UA Receiver(11520): Updating call infos from the stack
D/PjSipCalls(11520): Update call 0
I/libpjsip(11520): 10:40:23.965 zrtp_android.c !ZRTP failed: 3, subcode: 5
D/libpjsip(11520): 10:40:23.965   strm0x596dac  Error sending RTP: ICE check is 
in progress (PJNATH_EICEINPROGRESS)
D/libpjsip(11520): 10:40:23.966   strm0x596dac  Error sending RTP: ICE check is 
in progress (PJNATH_EICEINPROGRESS)

ICE correct connection => call connect with RTP audio:
D/libpjsip(11520): 10:40:25.471        icetp00  ..Check 10: [2] 
91.205.155.18:15765-->94.159.132.138:4036 (nominated): connectivity check 
SUCCESS
D/libpjsip(11520): 10:40:25.471        icetp00  ..Check 10: [2] 
91.205.155.18:15765-->94.159.132.138:4036: state changed from In Progress to 
Succeeded
D/libpjsip(11520): 10:40:25.471        icetp00  ..Check 10 is successful  and 
nominated
D/libpjsip(11520): 10:40:25.471        icetp00  ..Cancelling check 1: [2] 
10.182.162.190:4000-->192.168.1.106:4036 (In Progress)
D/libpjsip(11520): 10:40:25.490        icetp00  ..Check 1: [2] 
10.182.162.190:4000-->192.168.1.106:4036: state changed from In Progress to 
Failed
D/libpjsip(11520): 10:40:25.491        icetp00  ..Cancelling check 3: [2] 
10.182.162.190:4000-->94.159.132.138:4036 (In Progress)
D/libpjsip(11520): 10:40:25.497        icetp00  ..Check 3: [2] 
10.182.162.190:4000-->94.159.132.138:4036: state changed from In Progress to 
Failed
D/libpjsip(11520): 10:40:25.497        icetp00  ..Cancelling check 8: [2] 
10.182.162.190:4000-->91.205.155.18:12187 (In Progress)
D/libpjsip(11520): 10:40:25.501        icetp00  ..Check 8: [2] 
10.182.162.190:4000-->91.205.155.18:12187: state changed from In Progress to 
Failed
D/libpjsip(11520): 10:40:25.501        icetp00  ..Cancelling check 9: [2] 
91.205.155.18:15765-->192.168.1.106:4036 (In Progress)
D/libpjsip(11520): 10:40:25.507        icetp00  ..Check 9: [2] 
91.205.155.18:15765-->192.168.1.106:4036: state changed from In Progress to 
Failed
D/libpjsip(11520): 10:40:25.507        icetp00  ..Cancelling check 11: [2] 
91.205.155.18:15765-->91.205.155.18:12187 (In Progress)
D/libpjsip(11520): 10:40:25.512        icetp00  ..Check 11: [2] 
91.205.155.18:15765-->91.205.155.18:12187: state changed from In Progress to 
Failed
D/libpjsip(11520): 10:40:25.512        icetp00  ..ICE process complete, 
status=Success
D/libpjsip(11520): 10:40:25.512        icetp00  ..Valid list
D/libpjsip(11520): 10:40:25.513        icetp00  .. 0: [1] 
91.205.155.18:19518-->94.159.132.138:4009 (nominated, state=Succeeded)
D/libpjsip(11520): 10:40:25.513        icetp00  .. 1: [2] 
91.205.155.18:15765-->94.159.132.138:4036 (nominated, state=Succeeded)
D/libpjsip(11520): 10:40:25.534        icetp00 !ICE negotiation success after 
1s:611
D/libpjsip(11520): 10:40:25.537        icetp00   Comp 1: sending from relay 
candidate 91.205.155.18:19518 to srflx candidate 94.159.132.138:4009
D/libpjsip(11520): 10:40:25.541        icetp00   Comp 2: sending from relay 
candidate 91.205.155.18:15765 to srflx candidate 94.159.132.138:4036
D/libpjsip(11520): 10:40:25.560   pjsua_call.c  .Call 0 sending UPDATE for 
updating ICE transport address after ICE negotiation

Original comment by adah...@gmail.com on 29 May 2013 at 8:55

GoogleCodeExporter commented 9 years ago
Does your TURN server supports ZRTP?
As far as I understood ZRTP is designed to be sure there is nothing not trusted 
in the middle that could listen your conversation.
By definition a TURN server is something in the middle of the media path.
So if it drops or doesn't handle properly zrtp packets the entire ZRTP mecanism 
will not work and you'll see the call is not secured.

Original comment by r3gis...@gmail.com on 31 May 2013 at 10:19

GoogleCodeExporter commented 9 years ago

Original comment by r3gis...@gmail.com on 31 May 2013 at 10:19

GoogleCodeExporter commented 9 years ago
My TURN server (like any others) is totaly transparent - just relaying UDP 
traffic (RTP) from one port to another without any UDP/RTP modification. By 
definition it cannot support any app protocol beside STUN/TURN.

When both CSimpleSip clients are being relayed via the TURN server and the 
FS=proxy_media (transparent bridge) then the ZRTP is passing end2end between 
both clients (TwoFish SAS). that shows that the TURN is just relaying the ZRTP 
without breaking it. The FS is also just bridging the media without touching it 
at all.
With this case the CSimpleSip log shows that the ice checking proccess is being 
break by the FS which set ice-mismatch in the SDP (FS in proxy_media is 
managing back to back the SIP INVITE/OK). This ice-mismatch force the use of 
the FS SDP media ports as it set in the SDP c&m attributes. the successful ZRTP 
path is: client->TURN->FS->TURN->client.

When FS=bypass_media the ZRTP fails! it looks like the ice process cannot 
manage the right candidate path for the first ZRTP HELLO RTP package (it could 
probably work if the SDP c&m attributes of the TURN relay where used instead of 
the ice process). 

The ZRTP RFC spec says that the ZRTP HELLO/ACK handshack can success even when 
one client dial the other end directly with txt name@ipaddr without any SIP 
proxy in between so it is clearly not a MUST to have a ZRTP server in the ZRTP 
path. It just requires a clear RTP media channel to passthrugh it the ZRTP RTPs 
(and TURN can clearly do it). 
The server MUST support ZRTP when a client is being bridged with a none ZRTP 
endpoint (MiTM case) of when ZRTP Enrolment mode is required (Trusted MiTM).

I believe that the ZRTP is somehow not synchoronized with the ice checking 
proccess for the right validated media connection.

Hopefully someone could look into the code and resolve this issue.

Thanks

Assaf 

Original comment by adah...@gmail.com on 2 Jun 2013 at 2:22

GoogleCodeExporter commented 9 years ago
I experience the very same issue (logs are virtually the same, incl. "ZRTP 
failed: 3, subcode: 5"), but with *TURN turned off*. Having ICE enabled is 
sufficient to reproduce this, even with two phones on the same WLAN AP (private 
IPs mutually visible). 

ZRTP with bypass_media=true in FreeSwitch (i.e. media relayed via FreeSwitch 
without modification) works OK.

CSipSimple 1.01.00 r2272 (from the Play store)
FreeSWITCH Version 1.5.6b+git~20130902T175658Z~dc9815461f (git dc98154 
2013-09-02 17:56:58Z)

I can supply any logs or perform additional diagnostics.

Original comment by tomas.po...@gmail.com on 4 Sep 2013 at 9:04

GoogleCodeExporter commented 9 years ago
Latest nightly build introduces a partial fix for ICE + ZRTP support.

Original comment by r3gis...@gmail.com on 9 Sep 2013 at 11:18

GoogleCodeExporter commented 9 years ago
Quick update : actually we had a problem with latest nightly production (which 
prevented the fix to be taken into account). It should be resolved in next 
nightly build.

Original comment by r3gis...@gmail.com on 9 Sep 2013 at 6:12

GoogleCodeExporter commented 9 years ago
Régis Montoya <r3gis.3r@gmail.com> in csipsimple-users on Sep 9:

> I did some changes to improve the situation. The nightly build version should 
> have the problem almost fixed.
> The only remaining problem is that sometimes the activation of ZRTP after ICE 
> seems to introduce a deadlock somewhere (and in this case no audio and call 
> is blocked). It should not occur too frequently, but it's still necessary to 
fix 
> that and I'll try to spent more free time (when I'll have) on this problem.

Hello Régis, 

I tried the nightly build r2291, *ZRTP + ICE works* (relayed via TURN and also 
directly on the same WLAN AP), but I get those freezes quite frequently (once 
in 8 calls maybe; I need to kill CSipSimple via Task manager to revive).

Additionally, I spotted the following binary garbage (non-ASCII slash char in 
transport_zrtp.c line 927) in the log: 

  D/libpjsip( 1354): 13:03:44.086 transport_zrtp  zrtp :: has ice transport is yes... state is 4 ∕ 1

Thanks for the fix, I'd appreciate if you'd find time to catch the deadlock... 

Original comment by tomas.po...@gmail.com on 10 Sep 2013 at 11:51

GoogleCodeExporter commented 9 years ago
Found one more (more serious) problem with the patch. 

It manifests when calling an IVR directly on Freeswitch (i.e. not between 
phones). 
In this case audio works immediately after dialing, but *ICE fails* after 8s 
and thus *ZRTP negotiation is never performed*.

D/libpjsip( 5732): 14:38:24.607 transport_zrtp  zrtp :: has ice transport is 
yes... state is 4 / 1
D/libpjsip( 5732): 14:38:24.626        icetp00  ICE negotiation failed after 
8s:249: All ICE checklists failed (PJNATH_EICEFAILED)
D/libpjsip( 5732): 14:38:24.629 transport_zrtp !zrtp :: has ice transport is 
yes... state is 6 / 1

I think we should fire up ZRTP also when ICE fails, as the audio stream might 
work properly even without ICE (as in my case).

Full log here: http://pastebin.com/JVBstn2q

Original comment by tomas.po...@gmail.com on 12 Sep 2013 at 1:39

GoogleCodeExporter commented 9 years ago
Hi Tomas,

Thanks a lot for all this valuable feedback.

The second point (comment 9) is probably easy to fix. Thanks to the log you 
sent, we know exactly the status of ice neg and it should be easy to decide to 
start zrtp when state is 6 (I think I did only on 5 before).

About the deadlock, I think there is more time to spent on the topic. I already 
spent couple of hours without luck, but I'll try to find some free time to 
tackle this one.
I'll update this issue when something will be ready to test.

Original comment by r3gis...@gmail.com on 13 Sep 2013 at 1:05

GoogleCodeExporter commented 9 years ago
Issue 2193 has been merged into this issue.

Original comment by r3gis...@gmail.com on 13 Sep 2013 at 10:53

GoogleCodeExporter commented 9 years ago
I've just pushed in revision 2299 something that should fix the deadlock. 
It will be produced in tonight nightly build. Do not hesitate to give me 
feedback. I spent my afternoon to debug the problem and I'm not fully sure the 
fix is the good one. From latests tests I did it seems good but as it involves 
both ZRTP lib and pjsip lib and android timers it's hard to test all cases.

Original comment by r3gis...@gmail.com on 15 Sep 2013 at 4:34

GoogleCodeExporter commented 9 years ago
We tried rev 2300 today and the ZRTP problem on ICE fail (Comment #9) is fixed, 
ZRTP kicks in about 8 seconds into the call, there's basically no pause in 
audio :)

We have not performed enough tests to see whether the deadlock is gone, but 
we'll continue tomorrow.

Original comment by tomas.po...@gmail.com on 16 Sep 2013 at 9:10

GoogleCodeExporter commented 9 years ago
We have not observed the deadlock anymore, I suggest to close this issue, if we 
observe it again, we'll report.

Original comment by tomas.po...@gmail.com on 25 Sep 2013 at 7:37

GoogleCodeExporter commented 9 years ago
Great !
Thanks for the feedback

Original comment by r3gis...@gmail.com on 25 Sep 2013 at 7:44

GoogleCodeExporter commented 9 years ago
Issue 2193 has been merged into this issue.

Original comment by r3gis...@gmail.com on 18 Oct 2013 at 8:10