EricssonResearch / openwebrtc

A cross-platform WebRTC client framework based on GStreamer
http://www.openwebrtc.org
BSD 2-Clause "Simplified" License

TURN Allocation Mismatch errors #589

Open alessandrod opened 8 years ago

alessandrod commented 8 years ago

After a successful TURN allocation request, closing an openwebrtc client and reopening it shortly after (within minutes) often leads to TURN failing with Allocation Mismatch errors. I have reproduced this with two TURN servers, and it is reliably reproducible with coturn.

Something seems fishy with session lifetimes, the transaction IDs used by libnice, and allocation request retransmissions.

alessandrod commented 8 years ago

I've been looking at network dumps. This happens when libnice picks a HOST:PORT pair for a candidate which was already used for a previous TURN session that hasn't expired yet. This is especially likely to happen if someone uses owr_transport_agent_set_local_port_range with a small range.
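
To make the failure mode concrete, here is a minimal sketch of the per-allocation bookkeeping a TURN server does. It is not coturn's code: `turn_server_t`, `allocation_t` and `handle_allocate` are hypothetical names, the table is a fixed-size array, and only the client address and port of the 5-tuple are modelled (server address and transport are assumed fixed). The return value -1 for a full table is also just a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ALLOCS 16
#define STUN_ERROR_ALLOCATION_MISMATCH 437

/* Hypothetical model of one server-side allocation, keyed by the
 * client side of the 5-tuple. */
typedef struct {
    bool     in_use;
    uint32_t client_ip;
    uint16_t client_port;
    uint8_t  transaction_id[12]; /* id of the Allocate that created it */
    long     expires_at;         /* absolute time in seconds */
} allocation_t;

typedef struct {
    allocation_t slots[MAX_ALLOCS];
} turn_server_t;

/* Returns 0 on success (or retransmission of the original request),
 * 437 if the 5-tuple is held by a live allocation created with a
 * different transaction id, and -1 if the table is full. */
int handle_allocate(turn_server_t *srv, uint32_t ip, uint16_t port,
                    const uint8_t tid[12], long now, long lifetime)
{
    allocation_t *free_slot = NULL;

    assert(srv != NULL && lifetime > 0);

    for (size_t i = 0; i < MAX_ALLOCS; i++) {
        allocation_t *a = &srv->slots[i];

        /* Allocations that were never refreshed simply time out. */
        if (a->in_use && a->expires_at <= now)
            a->in_use = false;

        if (!a->in_use) {
            if (free_slot == NULL)
                free_slot = a;
            continue;
        }

        if (a->client_ip == ip && a->client_port == port) {
            if (memcmp(a->transaction_id, tid, 12) == 0)
                return 0; /* retransmission: same answer as before */
            return STUN_ERROR_ALLOCATION_MISMATCH;
        }
    }

    if (free_slot == NULL)
        return -1;

    free_slot->in_use = true;
    free_slot->client_ip = ip;
    free_slot->client_port = port;
    memcpy(free_slot->transaction_id, tid, 12);
    free_slot->expires_at = now + lifetime;
    return 0;
}
```

With a lifetime of 600 seconds (the RFC 5766 default), a second Allocate from the same HOST:PORT two minutes later with a fresh transaction id gets 437, while the same request from a different port, or after the old allocation has expired, succeeds. That matches the "reopen within minutes" pattern described above.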

alessandrod commented 8 years ago

The RFC says the client should do this:

  An allocation created in this manner
  will eventually timeout, since the client will not refresh it.
  Furthermore, if the client later retries with the same 5-tuple but
  different transaction id, it will receive a 437 (Allocation
  Mismatch), which will cause it to retry with a different 5-tuple.

So libnice should probably retry with a different HOST:PORT pair before giving up.

xelven commented 8 years ago

Pretty nice. I think there is something wrong with the port range handling in libnice. I set the port range in the SDK with owr_transport_agent_set_local_port_range(transport_agent, 40000, 65535); after allocating the transport_agent, but I still get odd data, similar to the problem I reported to libnice:

1. STUN discovery returns a weird value for the port (both public and internal).
I used libnice-0.1.13, built on my Mac, and modified sdp-example.c to add the port range:
"nice_agent_set_port_range(agent, stream_id, 1, 40000, 65535);"
but it always gets port 9. Here is the log:
Mac:examples allenchan$ sudo sdp-example 0 "stun.l.google.com" 19302
Generated SDP from agent :
m=text 52486 ICE/SDP
c=IN IP4 192.168.1.112
a=ice-ufrag:R1si
a=ice-pwd:NdKMhMMhoxV+pnAKM0e5IB
a=candidate:1 1 UDP 2013266431 192.168.1.112 63658 typ host
a=candidate:2 1 TCP 1019216383 192.168.1.112 9 typ host tcptype active
a=candidate:3 1 TCP 1015022079 192.168.1.112 52486 typ host tcptype passive
a=candidate:4 1 UDP 2013266431 fe80::7ec3:a1ff:feaa:503 63659 typ host
a=candidate:5 1 TCP 1019216383 fe80::7ec3:a1ff:feaa:503 9 typ host tcptype active
a=candidate:6 1 TCP 1015022079 fe80::7ec3:a1ff:feaa:503 52487 typ host tcptype passive

On an iOS device the port is 0 instead:
    sdp = "v=0
\no=- 1455000713520020200 1 IN IP4 127.0.0.1
\ns=-
\nt=0 0
\nm=audio 1 RTP/SAVPF 111 8 0
\nc=IN IP4 0.0.0.0
\na=rtcp-mux
\na=sendrecv
\na=rtpmap:111 OPUS/48000/2
\na=rtpmap:8 PCMA/8000
\na=rtpmap:0 PCMU/8000
\na=ice-ufrag:zWmW
\na=ice-pwd:sjkGcsCaD39FsDsuziLprr
\na=candidate:1 1 UDP 2013266431 192.168.31.149 51640 typ host
\na=candidate:2 1 TCP 1019216383 192.168.31.149 0 typ host tcptype active
\na=candidate:3 1 TCP 1015022079 192.168.31.149 50899 typ host tcptype passive
\na=candidate:4 1 UDP 1677722111 220.130.33.227 51640 typ srflx raddr 192.168.31.149 rport 51640
\na=candidate:5 1 TCP 847249919 220.130.33.227 0 typ srflx raddr 192.168.31.149 rport 0 tcptype active
\na=candidate:6 1 TCP 843055615 220.130.33.227 50899 typ srflx raddr 192.168.31.149 rport 50899 tcptype passive
\na=fingerprint:sha-256 75:B0:02:16:F3:42:18:B5:B8:F6:2C:5A:D9:6F:6A:AD:F6:09:A8:65:55:21:D4:B5:0A:24:67:4A:33:7B:C7:BB
\na=setup:actpass
\n";
    type = offer;

xelven commented 8 years ago

This is interesting. The session you are talking about is on the TURN server side, right? I found a problem where, when openwebrtc closes, nothing calls the libnice API nice_agent_remove_stream to clean up, so it keeps sending packets in the background once they have succeeded. https://github.com/EricssonResearch/openwebrtc/issues/579

But it seems that if I completely close the OWR process, nothing should be left running in the background, and the client won't keep any session. What I'm not sure about is this: when the OWR client is reopened and connects to the TURN server, does the server-side logic allocate a new session for it, or does it find the old one?

alessandrod commented 8 years ago

  So libnice should probably retry with a different HOST:PORT pair before giving up.

The following gist implements retrying as per RFC. With this patch I always get local and remote video in NativeDemo <=> Firefox. The patch is not 100% correct since it ignores the port range set with nice_agent_set_port_range.
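
For the remaining gap, one possible way to keep retries inside a configured range is sketched below. This is not what the gist does and not libnice API: `next_port_in_range` and `retry_order` are hypothetical helpers, and `min_port`/`max_port` stand for the values passed to nice_agent_set_port_range. The idea is to walk the range once, wrapping at the top, and never revisit the port that just returned 437.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Next port after `current` inside [min_port, max_port], wrapping to
 * min_port at the top. A `current` outside the range maps to min_port. */
uint16_t next_port_in_range(uint16_t current, uint16_t min_port,
                            uint16_t max_port)
{
    assert(min_port <= max_port);

    if (current < min_port || current >= max_port)
        return min_port;
    return (uint16_t)(current + 1);
}

/* Fills `out` with the ports to try after `failed_port` returned 437,
 * visiting each port in the range at most once and skipping
 * `failed_port` itself. Returns the number of ports written. */
size_t retry_order(uint16_t failed_port, uint16_t min_port,
                   uint16_t max_port, uint16_t *out, size_t out_len)
{
    size_t total;
    size_t limit;
    bool failed_in_range;
    uint16_t port = failed_port;

    assert(min_port <= max_port && out != NULL);

    total = (size_t)max_port - min_port + 1;
    failed_in_range = failed_port >= min_port && failed_port <= max_port;
    limit = failed_in_range ? total - 1 : total;
    if (limit > out_len)
        limit = out_len;

    for (size_t i = 0; i < limit; i++) {
        port = next_port_in_range(port, min_port, max_port);
        out[i] = port;
    }
    return limit;
}
```

For a range of 50000 to 50002 and a failure on 50001, this yields 50002 and then 50000. For a single-port range it yields nothing, which shows why a very small range leaves no room to recover from a 437 until the old allocation expires.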

https://gist.github.com/alessandrod/c520bbda3dfca6ed930c