The CommunicationException is due to a timeout of the calling server, caused by
the receiving server taking too much time in processing the request.
You could try speeding up request processing by assigning more memory to the
servers and/or adding more servers for extra computational power; you could
also raise the timeout value at server startup, so that the calling server
waits longer.
This doesn't seem to be a server bug, so next time it would be better to
discuss it on the mailing list before opening an issue ;)
Thanks!
Original comment by sergio.b...@gmail.com
on 2 Aug 2011 at 3:56
Actually, there seems to be a bug in the timeout handling causing the NPE at
the bottom. It should only be a minor bug that doesn't affect request handling;
I'll fix that in a moment :)
Original comment by sergio.b...@gmail.com
on 2 Aug 2011 at 3:59
Can you please reopen it until you have fixed the bug?
Original comment by rohi...@gmail.com
on 2 Aug 2011 at 7:03
>>>You could try speeding up the request by assigning more memory to servers
and/or adding more servers to add computational power; also, you could raise
the timeout value at server startup, so that the calling server will wait for
more time.
Are you talking about the
nodeTimeout (1000) : Timeout in milliseconds for node-to-node communication?
Will increasing this value cause the client to wait for a longer time?
Original comment by rohi...@gmail.com
on 2 Aug 2011 at 10:48
The bug is already fixed on trunk :)
As for the timeout: yes, I'm referring to the node timeout startup
configuration. It doesn't control how long the *client* waits for requests,
though; it controls how long a *server* waits for another server's response.
That is, given a request R sent by your client C to server S1, if R should be
processed by server S2, S1 forwards the request to S2 and then returns the
response to C. The node timeout controls how many milliseconds S1 waits for
S2's response. If S1 times out, R is aborted, C gets a connection timeout and
S2 will eventually discard the response. Please note that S2 may well have
processed the request and just discarded the response, as can happen in any
distributed system.
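To make the S1/S2 behaviour above concrete, here is a minimal Java sketch. It is not Terrastore code: `NodeTimeoutSketch`, `forward` and the "TIMEOUT" marker are hypothetical names, and a plain `ExecutorService` stands in for the remote server S2. It only illustrates the general pattern of a caller waiting a bounded time for a response while the callee keeps working.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NodeTimeoutSketch {

    // Hypothetical stand-in for S1 forwarding request R to S2.
    // Returns S2's response, or "TIMEOUT" if S2 does not answer
    // within nodeTimeoutMillis (the role played by the node timeout).
    public static String forward(ExecutorService s2, Callable<String> request,
                                 long nodeTimeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<String> pending = s2.submit(request);
        try {
            return pending.get(nodeTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // S1 gives up and reports a failure to the client.
            // The task is not cancelled here, so S2 may still finish
            // the work; its late result is simply never read.
            return "TIMEOUT";
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService s2 = Executors.newSingleThreadExecutor();
        try {
            // A request that takes S2 about 300 ms to process.
            Callable<String> slow = () -> { Thread.sleep(300); return "done"; };
            System.out.println(forward(s2, slow, 50));   // prints "TIMEOUT"
            System.out.println(forward(s2, slow, 2000)); // prints "done"
        } finally {
            s2.shutdown();
        }
    }
}
```

In the first call the caller stops waiting after 50 ms even though the request keeps running on the other side, which is why a timed-out request may still have been processed.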
Hope that helps.
Original comment by sergio.b...@gmail.com
on 3 Aug 2011 at 9:06
I set nodeTimeout to 100000 on all the servers, but I am still seeing this:
terrastore.communication.CommunicationException: Communication timeout!
at terrastore.communication.remote.RemoteNode.send(RemoteNode.java:153) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at terrastore.service.impl.DefaultQueryService$7.map(DefaultQueryService.java:248) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at terrastore.service.impl.DefaultQueryService$7.map(DefaultQueryService.java:240) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at terrastore.util.collect.parallel.ParallelUtils$1.call(ParallelUtils.java:53) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) ~[na:1.6.0_26]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) ~[na:1.6.0_26]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_26]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_26]
at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_26]
16:01:41.896Server9740 - New I/O client worker #3-1 -
terrastore.communication.remote.RemoteNode - null
java.lang.NullPointerException: null
at terrastore.communication.remote.RemoteNode$ClientHandler.signalCommandResponse(RemoteNode.java:244) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at terrastore.communication.remote.RemoteNode$ClientHandler.messageReceived(RemoteNode.java:228) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [netty-3.2.3.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]
16:01:45.396Server9740 - New I/O client worker #2-1 -
terrastore.communication.remote.RemoteNode - null
java.lang.NullPointerException: null
at terrastore.communication.remote.RemoteNode$ClientHandler.signalCommandResponse(RemoteNode.java:244) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at terrastore.communication.remote.RemoteNode$ClientHandler.messageReceived(RemoteNode.java:228) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) ~[netty-3.2.3.Final.jar:na]
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [netty-3.2.3.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]
Also, the range query returns a '500':'Connection Timeout' after nearly 20
seconds. Shouldn't it wait up to 100 seconds before returning a 500?
Will Terrastore throw a different error in case of a communication failure?
Will setting --failoverRetries help?
Original comment by rohi...@gmail.com
on 3 Aug 2011 at 4:11
That's pretty odd. Do you mind sending me the exact startup line and the exact
response from the server?
Original comment by sergio.b...@gmail.com
on 3 Aug 2011 at 4:16
Original issue reported on code.google.com by
rohi...@gmail.com
on 2 Aug 2011 at 3:38