ServiceResponseBuilder should be called asynchronously

GoogleCodeExporter commented 9 years ago

I am trying to call a service from within another service server. This leads to 
an IllegalStateException in netty (see below). I've attached a JUnit test that 
demonstrates the problem. Probably I am not using rosjava correctly, but I 
can't think of a valid way to call nested services. BTW, it would be very 
convenient to provide a blocking version of ServiceClient.call().

What steps will reproduce the problem?
1. run the attached JUnit test (testNestedServiceCall)

What is the expected output? What do you see instead?

30.01.2012 15:06:32 org.jboss.netty.channel.DefaultChannelPipeline
WARNUNG: An exception was thrown by a user handler while handling an exception 
event ([id: 0x0b035079, /127.0.0.1:56814 :> /127.0.0.1:42702] EXCEPTION: 
java.lang.IllegalStateException: await*() in I/O thread causes a dead lock or 
sudden performance drop. Use addListener() instead or call await*() from a 
different thread.)
org.ros.exception.RosRuntimeException: java.lang.IllegalStateException: 
await*() in I/O thread causes a dead lock or sudden performance drop. Use 
addListener() instead or call await*() from a different thread.
    at org.ros.internal.transport.ConnectionTrackingHandler.exceptionCaught(ConnectionTrackingHandler.java:73)
    at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
    at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:52)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalStateException: await*() in I/O thread causes a 
dead lock or sudden performance drop. Use addListener() instead or call 
await*() from a different thread.
    at org.jboss.netty.channel.DefaultChannelFuture.checkDeadLock(DefaultChannelFuture.java:296)
    at org.jboss.netty.channel.DefaultChannelFuture.awaitUninterruptibly(DefaultChannelFuture.java:208)
    at org.ros.internal.transport.tcp.TcpClientConnectionManager.connect(TcpClientConnectionManager.java:92)
    at org.ros.internal.node.service.DefaultServiceClient.connect(DefaultServiceClient.java:87)
    at org.ros.internal.node.service.ServiceFactory.newClient(ServiceFactory.java:131)
    at org.ros.internal.node.DefaultNode.newServiceClient(DefaultNode.java:279)
    at org.ros.internal.node.DefaultNode.newServiceClient(DefaultNode.java:285)
    at org.ros.node.service.ServiceIntegrationTest$2$1.build(ServiceIntegrationTest.java:130)
    at org.ros.node.service.ServiceIntegrationTest$2$1.build(ServiceIntegrationTest.java:1)
    at org.ros.internal.node.service.ServiceRequestHandler.handleRequest(ServiceRequestHandler.java:48)
    at org.ros.internal.node.service.ServiceRequestHandler.messageReceived(ServiceRequestHandler.java:58)
    ... 12 more

What version of the product are you using? On what operating system?

rosjava 902:e4b3af748de5, ROS Electric, Ubuntu 11.04 (natty)

Please provide any additional information below.

Original issue reported on code.google.com by martin.g...@gmail.com on 30 Jan 2012 at 2:27

Attachments:

ServiceServer.java

GoogleCodeExporter commented 9 years ago

Ooops, sorry, wrong attachment. This is the correct file.

Original comment by martin.g...@gmail.com on 30 Jan 2012 at 2:29

Attachments:

ServiceIntegrationTest.java

GoogleCodeExporter commented 9 years ago

You shouldn't block inside the callbacks. Instead, block in a separate thread. 
For example, you could supply the ResponseListener with a CountDownLatch and 
then call await() on the latch in your thread. The ResponseListener would call 
countDown() after the response has been received. At that point, your thread 
can continue. You could also use org.ros.Holder for this.

Original comment by damonkoh...@google.com on 31 Jan 2012 at 5:40

Changed state: WontFix

GoogleCodeExporter commented 9 years ago

Thanks for your quick reply! I've implemented your suggestion, but I believe it 
doesn't solve the underlying problem, just hides the Exception.

To demonstrate this, I've updated the JUnit tests (see attachment). Now, there 
are three implementations which do the same thing:

1. testNestedServiceCallUnthreaded() -- a simplified version of my original 
test. serviceClient.call() is called directly from the callback thread. Fails 
with the "dead lock" Exception shown above.

2. testNestedServiceCallWaitOnThread() -- implementation of your suggestion. 
serviceClient.call() is called from a separate thread, then we use a 
CountDownLatch to wait() for it to finish. Doesn't throw the exception.

3. testNestedServiceCallJoinThread() -- serviceClient.call() is called from a 
separate thread, then the thread is immediately joined. Doesn't throw the 
exception.

The point is that ALL THREE IMPLEMENTATIONS BLOCK THE CALLBACK THREAD. In all 
three, wait() is called at one point or another:

impl. 1: the callback thread directly enters wait() in 
DefaultChannelFuture.awaitUninterruptibly. This potential deadlock is detected.

impl. 2: the callback thread enters wait() in CountDownLatch. This potential 
deadlock is hidden from Netty.

impl. 3: we don't even need to wait() for very long, because we immediately 
join the worker thread. This potential deadlock is also hidden from Netty.

To sum up, whatever we do, the callback thread is blocked until the nested 
service call returns. Whether we directly wait in the callback thread, or spawn 
an intermediate thread and then wait on that one, shouldn't make a difference 
deadlock-wise. And there is no way around it, because we need the result as a 
return value.

In my opinion, there are two ways to go about this:

- Either the danger of deadlock is real, and we're using Netty incorrectly. In 
that case, none of the three implementations solves the problem, just hides it.

- Or it can be guaranteed that this potential deadlock cannot happen here. Only 
in that case the exception should be turned off using 
DefaultChannelFuture.setUseDeadLockChecker().

I can't say which of the two is the case, but could you please reopen this 
ticket?

Original comment by martin.g...@gmail.com on 2 Feb 2012 at 11:33

Attachments:

ServiceIntegrationTest.java

GoogleCodeExporter commented 9 years ago

I was actually going to reopen this anyway as I came to the same conclusion 
myself. The problem is that the ServiceResponseBuilder has to be asynchronous 
as well. Doing that will require the larger fix that ensures service calls are 
answered in the order they're received. Right now there's a race condition that 
could cause responses to be returned for the wrong request.

Original comment by damonkoh...@google.com on 2 Feb 2012 at 3:50

Changed title: ServiceResponseBuilder should be called asynchronously
Changed state: Accepted

GoogleCodeExporter commented 9 years ago

This issue was closed by revision 17d16f5b5b2e.

Original comment by damonkoh...@google.com on 5 Apr 2012 at 9:45

Changed state: Fixed

gok03 / rosjava

ServiceResponseBuilder should be called asynchronously #80