byzhang / terrastore

Automatically exported from code.google.com/p/terrastore

NullPointerException on Terrastore #171

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
I am currently hitting a NullPointerException whenever I attempt a predicate
search on one specific bucket (the issue does not occur on other buckets).
The top frames of the stack trace follow; I am attaching the full stack trace.

Terrastore Server 0.8.1 - 16:35:07.931 - null
java.lang.NullPointerException: null
at org.msgpack.Packer.packString(Packer.java:335) ~[msgpack-0.4.3-devel.jar:na]
at terrastore.util.io.MsgPackUtils.packString(MsgPackUtils.java:60) ~[terrastore-0.8.1.jar:na]
at terrastore.common.ErrorMessage.messagePack(ErrorMessage.java:67) ~[terrastore-0.8.1.jar:na]
at org.msgpack.Packer.pack(Packer.java:447) ~[msgpack-0.4.3-devel.jar:na]
at terrastore.util.io.MsgPackUtils.packErrorMessage(MsgPackUtils.java:81) ~[terrastore-0.8.1.jar:na]
at terrastore.communication.protocol.AbstractResponse.messagePack(AbstractResponse.java:62) ~[terrastore-0.8.1.jar:na]
at org.msgpack.template.ClassTemplate.pack(ClassTemplate.java:53) ~[msgpack-0.4.3-devel.jar:na]
at org.msgpack.template.AnyTemplate.pack(AnyTemplate.java:30) ~[msgpack-0.4.3-devel.jar:na]
at org.msgpack.Packer.pack(Packer.java:452) ~[msgpack-0.4.3-devel.jar:na]
at terrastore.util.io.MsgPackSerializer.doSerialize(MsgPackSerializer.java:79) [terrastore-0.8.1.jar:na]
at terrastore.util.io.MsgPackSerializer.serialize(MsgPackSerializer.java:52) [terrastore-0.8.1.jar:na]
at terrastore.communication.remote.SerializerEncoder.encode(SerializerEncoder.java:38) [terrastore-0.8.1.jar:na]
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:66) [netty-3.2.2.Final.jar:na]
at org.jboss.netty.channel.StaticChannelPipeline.sendDownstream(StaticChannelPipeline.java:385) [netty-3.2.2.Final.jar:na]
at org.jboss.netty.channel.StaticChannelPipeline.sendDownstream(StaticChannelPipeline.java:380) [netty-3.2.2.Final.jar:na]
at org.jboss.netty.channel.Channels.write(Channels.java:611) [netty-3.2.2.Final.jar:na]

Original issue reported on code.google.com by teonanac...@gmail.com on 19 Mar 2011 at 11:53

GoogleCodeExporter commented 9 years ago
Do you get the same exception with the Java Client or Curl too?
Which bucket and predicate are you using?

Original comment by sergio.b...@gmail.com on 20 Mar 2011 at 9:47

GoogleCodeExporter commented 9 years ago
Sorry, I've already reset the bucket to continue getting work done (the issue 
was preventing my application from working correctly).

I was using the js predicate but didn't try jxpath or any others.  I should 
point out that perhaps the issue wasn't so much the use of a predicate as an 
issue fetching a specific document.  This occurred during a programmatic 
bulk-load operation, where rapid calls to PUT and GET were running in sequence.

However, fetching all documents -was- working.

If it's useful, the following trace is from the second Terrastore node a few 
seconds later:

Terrastore Server 0.8.1 - 16:35:17.930 - Communication timeout!
terrastore.communication.CommunicationException: Communication timeout!
at terrastore.communication.remote.RemoteNode.send(RemoteNode.java:153) ~[terrastore-0.8.1.jar:na]
at terrastore.service.impl.DefaultQueryService$5.map(DefaultQueryService.java:204) ~[terrastore-0.8.1.jar:na]
at terrastore.service.impl.DefaultQueryService$5.map(DefaultQueryService.java:196) ~[terrastore-0.8.1.jar:na]
at terrastore.util.collect.parallel.ParallelUtils$1.call(ParallelUtils.java:53) ~[terrastore-0.8.1.jar:na]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) ~[na:1.6.0_23]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) ~[na:1.6.0_23]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ~[na:1.6.0_23]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ~[na:1.6.0_23]
at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_23]

Original comment by teonanac...@gmail.com on 20 Mar 2011 at 5:04

GoogleCodeExporter commented 9 years ago
I've investigated: to make a long story short, the exception you see there is
caused by a null error message, which is a minor problem. The real problem is
*what* caused the underlying error, but I don't have enough information there
to determine that.

So, if you can't provide a way to reproduce the problem, the best I can do is 
to improve the exception logging so that the next time we will see the original 
exception.
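
As a rough illustration of what logging the original exception could involve, the sketch below walks a throwable's cause chain and builds a description that is never null. The class and method names are hypothetical and are not taken from the Terrastore code base; only `Throwable.getCause()` and `Throwable.getMessage()` are standard JDK calls.

```java
// Hypothetical helper, not part of Terrastore: illustrates surfacing the
// innermost exception even when it carries no message.
final class ExceptionDescriptions {
    private ExceptionDescriptions() {}

    /** Follows getCause() links to the innermost throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Guard against a throwable that reports itself as its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Never returns null: falls back to the class name when there is no message. */
    static String describe(Throwable t) {
        Throwable root = rootCause(t);
        String message = root.getMessage();
        return message != null
                ? root.getClass().getName() + ": " + message
                : root.getClass().getName();
    }
}
```

An exception created with `new NullPointerException()` has a null message, which is consistent with the "- null" in the log line above; a fallback to the class name keeps the description usable in that case.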

Original comment by sergio.b...@gmail.com on 21 Mar 2011 at 10:11

GoogleCodeExporter commented 9 years ago
I haven't seen it come up again since.  I'll update to the new version soon
and aim to get you something more useful if/when it occurs next.

Original comment by teonanac...@gmail.com on 21 Mar 2011 at 6:10

GoogleCodeExporter commented 9 years ago
Fixed ErrorMessage to avoid failing with NPE on null message.
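
The change itself is not shown in this thread. Purely as an assumption about what such a guard might look like, the sketch below normalizes a null message to an empty string at construction time, so that later serialization code never receives null. `ErrorMessageSketch` is a hypothetical stand-in and not the real `terrastore.common.ErrorMessage`.

```java
// Hypothetical stand-in for an error message holder; NOT the actual
// Terrastore class. It only illustrates a null guard on the message field.
final class ErrorMessageSketch {
    private final int code;
    private final String message;

    ErrorMessageSketch(int code, String message) {
        this.code = code;
        // Normalize once so every later consumer sees a non-null string.
        this.message = message != null ? message : "";
    }

    int getCode() {
        return code;
    }

    String getMessage() {
        return message;
    }
}
```

Normalizing in the constructor keeps the guard in one place; checking for null at the point of packing would be an equally plausible alternative.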

Original comment by sergio.b...@gmail.com on 26 Mar 2011 at 3:23

GoogleCodeExporter commented 9 years ago
I think I have uncovered the root of the problem--or if not this problem 
specifically, then a related problem.  It appears that calls to 
bucket(name).clear() do not take effect immediately, and it's probably my fault 
for assuming they do.

Namely, if I call:

client.bucket("test").clear();
client.bucket("test").key("1").get(...);

I may in fact still retrieve the document stored under key "1".  Alternatively,
if I attempt to put "1" with an "if:absent" predicate, that predicate may fail
because key "1" may still exist immediately following a clear() call.

In my usage, the situation is more likely to occur if the server has been 
moderately busy with a rapid-fire sequence of puts and gets.  (In my scenario, 
it's a bulk-load script that more or less resets my data store to a default 
state for development and testing.)

For the time being, I've simply added some pauses in the script after each 
clear() method.
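
One alternative to fixed pauses is to poll until the bucket actually looks empty, with a timeout as a safety net. The sketch below is a generic helper under that assumption; `Await` and its parameters are hypothetical names, and the condition passed in would have to wrap whatever client call reports that the key is gone.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper: re-checks a condition until it holds or a
// timeout expires. It does not depend on the Terrastore client.
final class Await {
    private Await() {}

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * expires or the thread is interrupted while waiting.
     */
    static boolean until(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In the bulk-load script this would be called right after clear(), with a condition that returns true once a lookup of a known key no longer finds it. Since clear() propagates through the cluster over time, a check answered by one node may not reflect the others, so this narrows the window rather than guaranteeing a clean state.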

Original comment by teonanac...@gmail.com on 23 Apr 2011 at 12:15

GoogleCodeExporter commented 9 years ago
Yes, the "clear" operation isn't synchronous: it takes time to propagate
across the cluster, and it is a rather heavy operation.

Unfortunately, as of now I don't have a solution for your problem: can you 
tolerate it?

Thanks for the feedback!

Sergio B.

Original comment by sergio.b...@gmail.com on 26 Apr 2011 at 10:12