OK, the problem is wider than first anticipated. Looking into this, it seems that
Red5 application and system deadlocks can happen in numerous ways given the
current architecture. If someone can confirm this, I'd be grateful.
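For example, suppose an application publishes a server method, say
ChatService.sendMessage, whose implementation makes a call on the connection
of another user. Should both users send each other messages at exactly the same
moment, you deadlock: the thread handling A's message holds A's connection lock
while it waits for B's, and the thread handling B's message holds B's lock while
it waits for A's.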
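The scenario above can be reproduced outside Red5 with a small sketch. It assumes, hypothetically, that each connection is guarded by a single `ReentrantLock` that is held while an incoming message is processed and that must also be taken to write to that connection. `Connection`, `DeadlockDemo` and the 200 ms timeout are illustrative names and values, not Red5 classes. Barriers force the "exactly the same moment" interleaving, and `tryLock` with a timeout stands in for the plain `lock()` that would block forever, so the demo reports the deadlock instead of hanging.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockDemo {
    // Hypothetical stand-in for a connection: one lock guards both
    // processing incoming data and writing outgoing data.
    static final class Connection {
        final ReentrantLock lock = new ReentrantLock();
    }

    /** Returns how many of the two handler threads could not obtain the peer's lock. */
    public static int run() throws InterruptedException {
        Connection a = new Connection();
        Connection b = new Connection();
        CyclicBarrier bothHoldOwnLock = new CyclicBarrier(2);
        CyclicBarrier bothTried = new CyclicBarrier(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread ta = new Thread(() -> sendMessage(a, b, bothHoldOwnLock, bothTried, blocked));
        Thread tb = new Thread(() -> sendMessage(b, a, bothHoldOwnLock, bothTried, blocked));
        ta.start();
        tb.start();
        ta.join();
        tb.join();
        return blocked.get();
    }

    // Simulates a sendMessage handler: while holding the sender's lock
    // (as a decoder would), it tries to write to the recipient, which needs
    // the recipient's lock.
    static void sendMessage(Connection from, Connection to,
                            CyclicBarrier bothHoldOwnLock, CyclicBarrier bothTried,
                            AtomicInteger blocked) {
        from.lock.lock();
        try {
            bothHoldOwnLock.await(); // both users "send at exactly the same moment"
            // A plain to.lock.lock() here would wait forever; tryLock lets the demo finish.
            if (to.lock.tryLock(200, TimeUnit.MILLISECONDS)) {
                to.lock.unlock();
            } else {
                blocked.incrementAndGet();
            }
            bothTried.await(); // keep holding our own lock until the peer has tried too
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            from.lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads that would deadlock: " + run());
    }
}
```

Running `main` prints `threads that would deadlock: 2`: each thread holds its own connection's lock and cannot obtain the other's, which is the cycle described above.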
In fact, it seems that any system where input from user A can drive output to
user B, and input from user B can drive output to user A, can deadlock!(?)
These are the calls in which Red5 acquires a lock on the connection object, and
which can therefore lead to deadlock:
RTMPMinaProtocolDecoder.decode
RTMPMinaProtocolEncoder.encode
RTMPMinaIoHandler.rawBufferReceived
RTMPConnection.open, RTMPConnection.close
BaseConnection.connect, BaseConnection.close
And to be clear, this is not a hypothetical problem. We now understand this to
be behind unresolved system issues plaguing us in production this year.
If the analysis is correct, the entire Red5 locking and concurrency model needs
to be urgently overhauled.
I believe this is the likely root cause of the majority of the mysterious
lockups and leaks that have been reported (leaks because Red5 often starts
leaking sockets at an incredible rate once connection threads deadlock with
each other).
This is an urgent threat facing all Red5 installations. It is a critical flaw
today, and fixing it will take Red5 forward a huge leap.
Original comment by dwilli...@system7.co.uk
on 15 Dec 2011 at 2:06
Original comment by mondain
on 15 Dec 2011 at 5:21
Original comment by mondain
on 15 Dec 2011 at 9:36
Probably worth mentioning: a way around this is to process RMI calls that might
cause output to another connected client asynchronously. That is, you receive
the parameter input and pass it to a runnable operation object, which is
submitted to an execution pool. In our project we have an asynchronous RMI
system that sits on top of standard Red5 for other reasons (for example, so call
results can be returned as they are calculated rather than in invocation order),
and for this reason we did not hit this deadlock bug very often. However, for
speed, certain calls did not run through the asynchronous system, in particular
instant chat messages as in the example above; we have now moved these to our
asynchronous mechanism. If your RMI calls can, through any call chain, cause
output to another connection, it is imperative that they are moved to such a
mechanism, or you risk deadlock.
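The workaround described above can be sketched as follows. This is a minimal illustration, not the poster's actual asynchronous RMI system and not a Red5 API: the `Connection` and `ChatService` classes, the per-connection lock, and the pool size of 4 are all assumptions. The handler still holds the sender's lock (standing in for the decoder lock), but instead of writing to the recipient inline it submits a `Runnable` to an `ExecutorService` and returns, so no thread ever holds one connection's lock while waiting for another's.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class AsyncChatDemo {
    // Hypothetical connection: its lock guards writes, like the encode/decode locks.
    static final class Connection {
        final ReentrantLock lock = new ReentrantLock();
        final List<String> outbox = new CopyOnWriteArrayList<>();

        void write(String msg) {
            lock.lock();
            try {
                outbox.add(msg);
            } finally {
                lock.unlock();
            }
        }
    }

    // Hypothetical service; the pool size of 4 is an arbitrary choice for the sketch.
    static final class ChatService {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        // Runs on the sender's thread while it holds the sender's lock.
        // The write to the recipient is handed to the pool instead of being
        // done inline, so this thread never waits on another connection's lock.
        void sendMessage(Connection from, Connection to, String text) {
            from.lock.lock();
            try {
                pool.execute(() -> to.write(text));
            } finally {
                from.lock.unlock();
            }
        }

        void shutdown() throws InterruptedException {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    /** Both users send at the same moment; returns the total number of messages delivered. */
    public static int run() throws InterruptedException {
        ChatService service = new ChatService();
        Connection a = new Connection();
        Connection b = new Connection();
        CountDownLatch start = new CountDownLatch(1);

        Thread ta = new Thread(() -> { awaitQuietly(start); service.sendMessage(a, b, "hi B"); });
        Thread tb = new Thread(() -> { awaitQuietly(start); service.sendMessage(b, a, "hi A"); });
        ta.start();
        tb.start();
        start.countDown(); // release both senders together
        ta.join();
        tb.join();
        service.shutdown(); // wait for the queued writes to finish
        return a.outbox.size() + b.outbox.size();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("messages delivered: " + run());
    }
}
```

Running `main` prints `messages delivered: 2`. The trade-off is that `sendMessage` returns before the message is delivered. With a multi-threaded pool, two messages to the same recipient may also be written out of order, so a real system would likely need something like a per-connection queue to preserve ordering.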
Original comment by dwilli...@system7.co.uk
on 18 Feb 2012 at 11:28
I have reworked the thread and MINA protocol handling in revision 4325. I
believe the issues noted here should now be resolved.
Original comment by mondain
on 26 Apr 2012 at 8:21
Thank you very much for working on this.
Original comment by m...@paradisesoftware.net
on 26 Apr 2012 at 10:13
Sounds good. Really looking forward to checking it out!
Original comment by dwilli...@system7.co.uk
on 27 Apr 2012 at 10:43
Original issue reported on code.google.com by
dwilli...@system7.co.uk
on 15 Dec 2011 at 12:40