Open GoogleCodeExporter opened 9 years ago
I should point out a couple other testing scenarios I tried prior to reporting:
1. I tested on Samsung Galaxy S (GT-I9000T) running Eclair 2.1 update 1, occurs
equally on WiFi and 3G
2. I tested on MyTouch 4G (HTC Glacier) running Froyo 2.2, occurs less
frequently on 3G and difficult to reproduce borken sockets on WiFi
Original comment by rich.ho...@gmail.com
on 4 Feb 2011 at 7:35
[deleted comment]
[deleted comment]
Using EITHER the precompiled build or a custom Doubango stack I found the
problem. It's Doubango's STUN implementation breaking the Android native
sockets intermittently.
Leave STUN off, sockets are stable...
Turn STUN on, and eventually you will get the trace below...
When will Doubango have a STUN implementation that has a NAT pinhole keepalive
& not break the Android native sockets like below? (let me know if I am asking
the wrong forum, be happy to help fix/test corrections...)
02-04 21:29:03.871: DEBUG/org.doubango.imsdroid.Services.Impl.SipService(1961):
Stack stoped
02-04 21:29:04.541: INFO/org.doubango.imsdroid.Services.Impl.SipService(1961):
realm='sip:share.aylus.com', impu='sip:#####@foo.bar', impi='#####@foo.bar'
02-04 21:29:04.541: INFO/org.doubango.imsdroid.Services.Impl.SipService(1961):
STUN=yes
02-04 21:29:04.541: INFO/org.doubango.imsdroid.Services.Impl.SipService(1961):
STUN2 - server=numb.viagenie.ca and port=3478
02-04 21:29:04.541: INFO/org.doubango.imsdroid.Services.Impl.SipService(1961):
pcscf-host='#.#.#.#', pcscf-port='5060', transport='UDP', ipversion='ipv4'
02-04 21:29:04.551:
DEBUG/org.doubango.imsdroid.Services.Impl.NetworkService(1961): 26.180.59.20
02-04 21:29:04.551: INFO/org.doubango.imsdroid.Services.Impl.SipService(1961):
Local IP='26.180.59.20'
02-04 21:29:04.551:
DEBUG/org.doubango.imsdroid.Services.Impl.NetworkService(1961): netType=0 and
netSubType=8
02-04 21:29:04.561: DEBUG/org.doubango.imsdroid.Services.Impl.SipService(1961):
Stack started
02-04 21:29:04.561: DEBUG/org.doubango.imsdroid.Services.Impl.SipService(1961):
OnDialogEvent (Dialog connecting)
02-04 21:29:04.561: INFO/org.doubango.imsdroid.Main(1961): onRegistrationEvent
02-04 21:29:04.561: INFO/org.doubango.imsdroid.Screens.ScreenHome(1961):
onRegistrationEvent
02-04 21:29:04.571:
INFO/org.doubango.imsdroid.Services.Impl.ServiceManager(1961):
onRegistrationEvent
02-04 21:29:04.571:
DEBUG/org.doubango.imsdroid.Services.Impl.ServiceManager(1961):
showRegistartionNotif
02-04 21:29:04.671: DEBUG/org.doubango.imsdroid.Services.Impl.SipService(1961):
OnDialogEvent ((un)REGISTER request successfully sent.)
02-04 21:29:04.741: DEBUG/skia(1961): ===== ashmem pin_region(43) returned -1,
treating as error 9
02-04 21:29:05.171: ERROR/tinyWRAP(1961): ***ERROR: function:
"tnet_get_ip_n_port()"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): file: "src/tnet_utils.c"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): line: "920"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): MSG: TNET_GET_SOCKADDR has failed
with status code: -1
02-04 21:29:05.171: ERROR/tinyWRAP(1961): ***ERROR: function:
"tnet_transport_sendto()"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): file: "src/tnet_transport_poll.c"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): line: "241"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): MSG: sendto have failed.
02-04 21:29:05.171: ERROR/tinyWRAP(1961): ***ERROR: function:
"tnet_transport_sendto()"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): file: "src/tnet_transport_poll.c"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): line: "241"
02-04 21:29:05.171: ERROR/tinyWRAP(1961): MSG: (SYSTEM)NETWORK ERROR ==>Bad
file number
Original comment by rich.ho...@gmail.com
on 5 Feb 2011 at 2:42
I found the problem in Doubango 'src/tnet_transport_poll.c, after line 671'.
The answer to your FIXME question is 'NO', if ioctl for FIONREAD returns 0 or
greater but the &len is set to '0' bytes, an Android native socket decriptor
has NOT been closed. It's still a viable socket descriptor and SHOULD NOT be
closed, even if it needed to be closed, Doubango should fire off an EVENT_ERROR
for the IMSDroid to decide what to do. EVENT_CLOSE confuses IMSDroid and keeps
trying to register, creating later socket errors on a dead socket. For an
interim fix since this error is slowing me down here is what I modified in the
Doubango tree for you to consider. My tweak sends the warning still, disables
the event_close and leaves the native Android socket open still. I will let you
know if I have anymore problems with this approach but it appears to be
working...
==== src/tnet_transport_poll.c, around line 671 =====
if(!len){
TSK_DEBUG_WARN("IOCTLT returned '0' bytes for fd=%d", active_socket->fd);
/*FIXME: On Android/MAC OS X this mean that the socket has been closed? */
//#if defined(ANDROID) || 1
// TSK_RUNNABLE_ENQUEUE(transport, event_error, transport->callback_data,
active_socket->fd);
// removeSocket(i, context);
//#else
recv(active_socket->fd, 0, 0, 0);
//#endif
continue;
Original comment by rich.ho...@gmail.com
on 5 Feb 2011 at 8:13
[deleted comment]
[deleted comment]
One other concern you may want to consider adding. Since Doubango is polling
and pausing sockets, a thread optimization in the polling section to sleep
150ms (or less) while the socket is paused, saves battery, CPU and uneeded
debug trace lines. I used to see "Socket is paused" for up to 50 cycles waiting
for STUN, now I see "Socket is paused, sleeping for 150ms" just once:
================
/* check whether the socket is paused or not */
if(active_socket->paused){
TSK_DEBUG_INFO("Socket is paused, sleeping for 150ms");
tsk_thread_sleep(150);
goto TNET_POLLIN_DONE;
}
Original comment by rich.ho...@gmail.com
on 6 Feb 2011 at 12:55
You have provided a great feedback but I think there are many confusions.
We are not polling and pausing the sockets as you may think. The framework uses
poll() [http://linux.about.com/library/cmd/blcmdl2_poll.htm] function which is
blocking and works like select(). If you prefer using select() instead of
poll() then, set "TNET_HAVE_POLL" macro value to zero. Please see:
http://code.google.com/p/doubango/source/browse/trunk/tinyNET/src/tnet_poll.c
"Pausing" a socket doesn't mean what you think. A socket is only paused when
STUN is enabled.
Explication:
a) When a RTP socket is created then, we add it to the map managed by the main
transport thread (T1) using poll() or select().
b) When STUN is enabled then, a new NAT context is created to retrieve the
public ip:port. This context will create a new thread (T2) to send/receive and
parse STUN (or TURN) messages.
c) As T1 and T2 use the same socket then, both will receive notifications when
there is data to read. As you cannot read the socket two times then, we mark T1
as paused. In your case this means that you have received 50 RTP/STUN packets
before STUN process complete.
It's really not a good idea to add the sleep(150).
Original comment by boss...@yahoo.fr
on 6 Feb 2011 at 3:51
Original issue reported on code.google.com by
rich.ho...@gmail.com
on 4 Feb 2011 at 7:27