elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference #8595

Closed jlecour closed 9 years ago

jlecour commented 9 years ago

Hi,

I've recently upgraded to 1.4.0 and I'm seeing some transient errors, like this one:

ElasticsearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.ChannelBufferBytesReference@143d90f]

It's very strange since it happens on various indices. Some of them are very stable (a handful of updates each day), some of them are very volatile (thousands of concurrent reads/writes during the day, then left alone).

I can't reproduce those errors. I can't pinpoint any specific query. I've been logging a lot, and each time a query raises this error, I re-execute it and it runs fine.

I've not tried to downgrade to 1.3.x yet. Maybe I will.

I've been racking my brain for many days, with quite some frustration, hence this vague bug report.

If anyone has the beginning of an idea, I'd be happy to try and give as much information as I can.

jlecour commented 9 years ago

A couple of details to add :

I get those errors with

I don't get any error with

There are also a couple of things that have changed in my application. I'll try to rule those changes out.

s1monw commented 9 years ago

Can we get more of the stack trace for this error? Do you know when this happens?

jlecour commented 9 years ago

I've found this in my Elasticsearch log:

[2014-11-25 10:03:22,940][WARN ][http.netty               ] [Bis] Caught exception while handling client http traffic, closing connection [id: 0xf14bdac7, /127.0.0.1:54741 => /127.0.0.1:9200]
java.lang.IllegalArgumentException: empty text
        at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:97)
        at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62)
        at org.elasticsearch.common.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:75)
        at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:191)
        at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:102)
        at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
        at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

PS: my log level is DEBUG and there is nothing before/after this message.

clintongormley commented 9 years ago

@jlecour I can reproduce this error if I telnet to port 9200 and send some non-HTTP message, e.g. `FOO\n`

Do you have some process which is pinging your server to check that it is alive?
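
For reference, a minimal sketch of that reproduction (assuming nc is available and a node listening on localhost:9200); per the comment above, the node should log the same "empty text" warning and close the connection:

# send a bare non-HTTP line to the HTTP port
printf 'FOO\n' | nc localhost 9200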

jlecour commented 9 years ago

@clintongormley No, I don't have any such activity. Also, I see this error in my Ruby client's log, after _bulk requests. But as I've said, a request that triggers this error can be re-executed as-is and works perfectly.

I may have more information in a few hours.

epackorigan commented 9 years ago

I was seeing this earlier today with a slightly malformed JSON block (trying to set up some transient settings). In my case, I had extra spaces at the beginning, before the opening bracket.

This worked:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : false
    }
}'

This didn't work (note the extra space at the beginning of the line containing "transient"):

curl -XPUT localhost:9200/_cluster/settings -d '
 { "transient" : {
        "cluster.routing.allocation.disk.threshold_enabled" : false
    }
}'
clintongormley commented 9 years ago

@epackorigan Can you recreate this?

I tried setting up two nodes and running your failing request, and all worked as expected.

epackorigan commented 9 years ago

I will try to reproduce on Monday and reconfirm.

epackorigan commented 9 years ago

Not quite Monday, but here is what I have:

I was trying to migrate data off one node so I could service it (add RAM to it). I tried the following:

curl --silent -XPUT localhost:9200/_cluster/settings -s '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
  }
}'

Note: there are two spaces in front of "transient", and 4 in front of "cluster". I received the following error:

"error" : "ElasticsearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@1]",

More variations on the spacing or formatting of the submitted data pretty much always yielded that same error:

curl --silent -XPUT localhost:9200/_cluster/settings?pretty -s '{ "transient" : { "cluster.routing.allocation.exclude._ip" : "10.0.0.1" } }'
{
  "error" : "ElasticsearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@1]",
  "status" : 400
}

Until I had exactly 4/8 spaces:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
{"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"exclude":{"_ip":"10.0.0.1"}}}}}}

It could be that the parser there is very picky about the format of the JSON data (I'm not even sure whether that's valid JSON or not).

Note: I'm running on Debian Wheezy, with OpenJDK-1.7, ES 1.4.2 (all installed from packages)

clintongormley commented 9 years ago

@epackorigan The formatting of your JSON is just fine. It's your use of curl parameters that is flaky ;)

Your version which doesn't work:

curl -XPUT localhost:9200/_cluster/settings -s '{

Your version which does work:

curl -XPUT localhost:9200/_cluster/settings -d '{

With the first version, you're not passing a body at all (`-s` is curl's silent flag, not a data flag), which is why it fails with `Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@1`
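
For anyone landing here: the empty-body case is easy to reproduce on purpose (a quick sketch, assuming a local node on port 9200; the exact exception text may vary by version):

# no -d flag, so curl sends a PUT with an empty body and the parse error comes back
curl -XPUT 'localhost:9200/_cluster/settings?pretty'

With -d in place the request body actually reaches Elasticsearch, which is why the earlier 4/8-space version worked: the indentation was never the problem.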

epackorigan commented 9 years ago

Doh! Sorry for the noise!

mausch commented 9 years ago

I just got this error from copying and pasting a request from somewhere into Marvel/Sense, which left me with a space after the request path. I googled the error and this was the first result, which was a good thing because I could fix it right away, but perhaps a "less scary" error might help other users (I imagine this must be quite a common mistake).

bleskes commented 9 years ago

@mausch that's annoying. FYI, we have this noted for Sense and we'll fix it to trim the request properly (putting aside the question of a better error message).

bleskes commented 9 years ago

@mausch FYI - the trailing space issue is fixed in Marvel 1.3.1, released today.

clintongormley commented 9 years ago

No more info on this ticket. Closing. Feel free to reopen if you're still seeing this.

GamingCoder commented 9 years ago

I saw this error in the ES JavaScript client and documented the problem at https://github.com/elastic/elasticsearch-js/issues/245

bamarni commented 8 years ago

I'm getting the same error with version 1.7.4 and the official php client.

It is with the bulk API too; even though I get this error, it looks like everything worked normally, very similar to what @jlecour reported.

[EDIT]: actually it was due to an empty document in the batch, my bad.

ulkas commented 8 years ago

@bamarni Same for me, also an empty body in the batch JSON.

serg3ant commented 7 years ago

It seems to happen when the bulk request is empty (no actions/documents at all).
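
A sketch of that case from the command line (hedged; assuming a local node on port 9200 and that the client ends up sending a body containing only a newline when the batch is empty):

# a bulk body with no action or document lines, only a newline
curl -XPOST 'localhost:9200/_bulk' -d '
'

The client-side fix is simply to skip the _bulk call (or drop the empty document) when there is nothing to send.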

gdubicki commented 6 years ago

Can you please reopen and change the error message?

Passing empty data to the ES API seems to be a pretty common mistake...

(especially as curl makes it easy to make this mistake - btw: I am switching to https://github.com/jakubroztocil/httpie to avoid hitting such issues in the future)
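
For reference, an equivalent settings request with httpie (a sketch, assuming httpie is installed; the JSON body is piped on stdin and httpie sets a JSON Content-Type by default):

# same transient-settings update as above, body supplied on stdin
echo '{"transient": {"cluster.routing.allocation.exclude._ip": "10.0.0.1"}}' \
  | http PUT localhost:9200/_cluster/settings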

javanna commented 6 years ago

@gdubicki the error messages were improved for all the relevant APIs with #23497 (went out with 5.5).