byzhang / terrastore

Automatically exported from code.google.com/p/terrastore

Connection Timeout Exception #180

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi

When I run a prefix-matching query, I get either a Java heap error or a 
connection timeout error.

15:31:52.975Server9540 - Thread-99 - 
terrastore.service.impl.DefaultQueryService - Communication timeout!
terrastore.communication.CommunicationException: Communication timeout!
        at terrastore.communication.remote.RemoteNode.send(RemoteNode.java:153) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.service.impl.DefaultQueryService$13.map(DefaultQueryService.java:388) [terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.service.impl.DefaultQueryService$13.map(DefaultQueryService.java:380) [terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.util.collect.parallel.ParallelUtils$1.call(ParallelUtils.java:53) [terrastore-0.8.2-SNAPSHOT.jar:na]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_26]
        at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]
15:32:03.064Server9540 - 28933892@qtp-16963619-68 - 
terrastore.server.impl.CoreServer - Communication timeout!
terrastore.communication.CommunicationException: Communication timeout!
        at terrastore.communication.remote.RemoteNode.send(RemoteNode.java:153) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.service.impl.DefaultQueryService$7.map(DefaultQueryService.java:248) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.service.impl.DefaultQueryService$7.map(DefaultQueryService.java:240) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.util.collect.parallel.ParallelUtils$1.call(ParallelUtils.java:53) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) ~[na:1.6.0_26]
        at java.util.concurrent.FutureTask.run(FutureTask.java:138) ~[na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_26]
15:32:16.442Server9540 - New I/O client worker #59-1 - 
terrastore.communication.remote.RemoteNode - null
java.lang.NullPointerException: null
        at terrastore.communication.remote.RemoteNode$ClientHandler.signalCommandResponse(RemoteNode.java:244) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.communication.remote.RemoteNode$ClientHandler.messageReceived(RemoteNode.java:228) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [netty-3.2.3.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]

Original issue reported on code.google.com by rohi...@gmail.com on 2 Aug 2011 at 3:38

GoogleCodeExporter commented 9 years ago
The CommunicationException is due to a timeout on the calling server, caused by 
the receiving server taking too long to process the request.
You could try speeding up the request by assigning more memory to the servers 
and/or adding more servers for extra computational power; you could also raise 
the timeout value at server startup, so that the calling server waits longer.

This doesn't seem to be a server bug, so next time it would be better to 
discuss on the mailing list prior to opening issue requests ;)

Thanks!

Original comment by sergio.b...@gmail.com on 2 Aug 2011 at 3:56

GoogleCodeExporter commented 9 years ago
Actually, there seems to be a bug in the timeout handling that causes the NPE 
at the bottom. It should only be a minor bug that doesn't affect request 
handling; I'll fix it in a moment :)
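
To illustrate how a timeout can lead to an NPE when a response finally arrives, here is a minimal sketch. It is not the Terrastore code: the class `LateResponseSketch`, the `pending` map and the method names are all hypothetical, and it only assumes that the sender forgets a request once it times out, so a late response finds nothing to signal.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the actual RemoteNode implementation.
class LateResponseSketch {

    // Requests that are still waiting for a response, keyed by request id.
    static final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // The sender registers a request before sending it to the other node.
    static CompletableFuture<String> register(String requestId) {
        CompletableFuture<String> waiter = new CompletableFuture<>();
        pending.put(requestId, waiter);
        return waiter;
    }

    // The sender stops waiting and forgets the request.
    static void timeOut(String requestId) {
        pending.remove(requestId);
    }

    // The I/O thread delivers a response. Returns false if it was discarded.
    static boolean onResponse(String requestId, String body) {
        CompletableFuture<String> waiter = pending.remove(requestId);
        if (waiter == null) {
            // The sender already timed out. Without this check,
            // waiter.complete(body) would throw a NullPointerException.
            return false;
        }
        waiter.complete(body);
        return true;
    }

    public static void main(String[] args) {
        register("r1");
        timeOut("r1");
        System.out.println("late response delivered: " + onResponse("r1", "late"));
    }
}
```

In this sketch the late response is simply dropped, which is harmless for the request itself because the caller has already received its timeout.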

Original comment by sergio.b...@gmail.com on 2 Aug 2011 at 3:59

GoogleCodeExporter commented 9 years ago
Can you please reopen it until you have fixed the bug?

Original comment by rohi...@gmail.com on 2 Aug 2011 at 7:03

GoogleCodeExporter commented 9 years ago
>>>You could try speeding up the request by assigning more memory to servers 
and/or adding more servers to add computational power; also, you could raise 
the timeout value at server startup, so that the calling server will wait for 
more time.

Are you talking about the 
nodeTimeout (1000) : Timeout in milliseconds for node-to-node communication?

Will increasing this value cause the client to wait for a longer time?

Original comment by rohi...@gmail.com on 2 Aug 2011 at 10:48

GoogleCodeExporter commented 9 years ago
Bug is already fixed on trunk :)

As for the timeout: yes, I'm referring to the node timeout startup 
configuration. It doesn't control how long the *client* waits for requests, 
though; it controls how long a *server* waits for another server's response. 
That is, given a request R sent by your client C to server S1, if R should be 
processed by server S2, S1 forwards the request to S2 and then returns the 
response to C. The node timeout controls how many milliseconds S1 waits for 
S2's response. If S1 times out, R is aborted, C gets a connection timeout, and 
S2 will eventually discard the response. Please note that S2 may still have 
processed the request and just discarded the response, as can happen in any 
distributed system.
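
The S1/S2 flow described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `NodeTimeoutSketch`, `forward` and `TIMEOUT` are hypothetical names, S2 is simulated by a local thread, and none of this is Terrastore's actual code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: S1 forwards a request to S2 and waits at most nodeTimeoutMillis.
class NodeTimeoutSketch {

    // Placeholder for the error S1 reports to the client on timeout.
    static final String TIMEOUT = "Communication timeout!";

    // s2Work simulates the processing done by S2 (assumption: a local thread stands in for S2).
    static String forward(Callable<String> s2Work, long nodeTimeoutMillis) throws Exception {
        ExecutorService s2 = Executors.newSingleThreadExecutor();
        try {
            Future<String> response = s2.submit(s2Work);
            // S1 blocks here for at most nodeTimeoutMillis.
            return response.get(nodeTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // S1 gives up and reports the timeout; S2 is not interrupted.
            return TIMEOUT;
        } finally {
            // Lets the submitted work finish, but accepts no new work.
            s2.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean processed = new AtomicBoolean(false);
        Callable<String> slowS2 = () -> {
            Thread.sleep(300);      // S2 needs 300 ms to process R
            processed.set(true);
            return "result";
        };
        System.out.println(forward(slowS2, 50));   // node timeout shorter than S2's work
        Thread.sleep(600);                         // give S2 time to finish anyway
        System.out.println("S2 processed the request: " + processed.get());
    }
}
```

The point of the second print is that the work on S2 can complete even though S1 has already reported a timeout to the client.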

Hope that helps.

Original comment by sergio.b...@gmail.com on 3 Aug 2011 at 9:06

GoogleCodeExporter commented 9 years ago
I set nodeTimeout to 100000 on all the servers, but I am still seeing this:

terrastore.communication.CommunicationException: Communication timeout!
        at terrastore.communication.remote.RemoteNode.send(RemoteNode.java:153) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.service.impl.DefaultQueryService$7.map(DefaultQueryService.java:248) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.service.impl.DefaultQueryService$7.map(DefaultQueryService.java:240) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.util.collect.parallel.ParallelUtils$1.call(ParallelUtils.java:53) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) ~[na:1.6.0_26]
        at java.util.concurrent.FutureTask.run(FutureTask.java:138) ~[na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_26]
16:01:41.896Server9740 - New I/O client worker #3-1 - 
terrastore.communication.remote.RemoteNode - null
java.lang.NullPointerException: null
        at terrastore.communication.remote.RemoteNode$ClientHandler.signalCommandResponse(RemoteNode.java:244) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.communication.remote.RemoteNode$ClientHandler.messageReceived(RemoteNode.java:228) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [netty-3.2.3.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]
16:01:45.396Server9740 - New I/O client worker #2-1 - 
terrastore.communication.remote.RemoteNode - null
java.lang.NullPointerException: null
        at terrastore.communication.remote.RemoteNode$ClientHandler.signalCommandResponse(RemoteNode.java:244) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at terrastore.communication.remote.RemoteNode$ClientHandler.messageReceived(RemoteNode.java:228) ~[terrastore-0.8.2-SNAPSHOT.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) ~[netty-3.2.3.Final.jar:na]
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [netty-3.2.3.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]

Also, the range query returns a '500':'Connection Timeout' after nearly 20 
seconds. Shouldn't it wait up to 100 seconds before returning a 500?

Will Terrastore throw a different error in case of communication failure? Will 
setting --failoverRetries help?

Original comment by rohi...@gmail.com on 3 Aug 2011 at 4:11

GoogleCodeExporter commented 9 years ago
That's pretty odd, do you mind sending me the exact startup line and the exact 
response from the server?

Original comment by sergio.b...@gmail.com on 3 Aug 2011 at 4:16