crate / elasticsearch-inout-plugin

An Elasticsearch plugin which provides the ability to export data by query on the server side.
Apache License 2.0

IOException[java.io.IOException: error=12, Cannot allocate memory] with output_cmd #33

Open gmaurice opened 11 years ago

gmaurice commented 11 years ago

Hello, I made a successful first backup with this call:

curl http://localhost:9200/_export -d '{ "fields":["_id", "_source", "_version", "_index", "_type"],"output_cmd": ["/home/elasticsearch/inout-output-cmd.sh", "test", "${cluster}-${index}-${shard}"] }'

/home/elasticsearch/inout-output-cmd.sh

#!/bin/bash

cat - | ssh backup.srv "gzip -c | cat - > /data/elasticsearch-inout-dump/$1/$2.gz"
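
For reference, with the output_cmd array above the plugin should invoke the script once per shard, passing "test" as $1 and the expanded "${cluster}-${index}-${shard}" as $2, roughly like this (the cluster name is illustrative):

/home/elasticsearch/inout-output-cmd.sh test mycluster-ad974b0001-0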

The first one works fine; however, the second one throws the following memory exception just after launching it:

{"index":"ad974b0001","shard":0,"reason":"BroadcastShardOperationFailedException[[ad974b0001][0] ]; nested: QueryPhaseExecutionException[[ad974b0001][0]: query[ConstantScore(:)],from[-1],size[-1]: Query Failed [failed to execute export]]; nested: ExportException[[ad974b0001][0]: query[ConstantScore(:)],from[-1],size[-1]: Export Failed [Failed to open output: ]]; nested: IOException[Cannot run program \"/home/elasticsearch/inout-output-cmd.sh\": java.io.IOException: error=12, Cannot allocate memory]; nested: IOException[java.io.IOException: error=12, Cannot allocate memory]; "}

Even though I get this exception, the _dump command works fine.

The index named in the exception is very small: 12k docs and 5.6 MB on disk, according to _dump.

I hope you have an idea of how to fix this. Thanks.

mfussenegger commented 11 years ago

Hi,

I am a little lost here:

The first one works fine; however, the second one throws the following memory

Which second one? I only see one curl call, and from what I can tell it is the one that works.

gmaurice commented 11 years ago

The curl command is the same for the first and the second test. The second test throws the exception I reported.

mfussenegger commented 11 years ago

Ok, got it. I think I'll need some more information. Can you post your Java version and the output from top (sorted by memory) while Elasticsearch is running, both before executing the export and while/after executing it?

Also, can you try the export after you've set

echo 0 > /proc/sys/vm/overcommit_memory

Thanks for the feedback so far.

gmaurice commented 11 years ago

elasticsearch 0.90.2
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

The max heap size is -Xmx25g.

Here is top sorted by RES when I have the problem (it doesn't differ if I try an export):

top - 20:07:09 up 127 days,  6:56,  4 users,  load average: 0.12, 0.20, 0.18
Tasks: 145 total,   1 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.4%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32842468k total, 32640904k used,   201564k free,   234904k buffers
Swap:  8189944k total,    68844k used,  8121100k free,  9191476k cached

  PID USER      PR  NI  VIRT  *RES  SHR S %CPU %MEM    TIME+  CODE COMMAND
20093 elastics  20   0 30.4g  21g 5756 S    1 68.3 252:22.44   36 java
 3502 bind      20   0  673m 7328 1860 S    0  0.0   0:06.60  480 named
16687 syslog    20   0  248m 5296  784 S    0  0.0   6:14.95  344 rsyslogd

However, if I restart Elasticsearch, so that its memory footprint is low, it works.

I think you meant:

echo 1 > /proc/sys/vm/overcommit_memory

With overcommit enabled, the _export has succeeded in starting as I write these words.
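
As a quick sketch of the setting being discussed: vm.overcommit_memory takes 0 (heuristic overcommit, the default), 1 (always allow overcommit), or 2 (strict commit accounting), and can be checked and persisted like this:

# Check the current policy: 0 = heuristic, 1 = always allow, 2 = strict.
cat /proc/sys/vm/overcommit_memory
# Allow overcommit until the next reboot.
echo 1 > /proc/sys/vm/overcommit_memory
# Make the change persistent across reboots.
echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
sysctl -p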

mfussenegger commented 11 years ago

It seems that the ProcessBuilder we're using to execute the shell command relies on fork(), which tries to reserve the same amount of memory as the JVM currently has in use.

I'll discuss with my colleagues tomorrow what we can do about that and hopefully come up with a fix soon.

mfussenegger commented 11 years ago

Hi,

I've dug into the issue a bit, and it seems that the bug I had in mind is already fixed.

I've tested against

java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b09)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01, mixed mode)

and

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.11.90) (rhel-1.62.1.11.11.90.el6_4-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Can you take the program here https://gist.github.com/mfussenegger/6120737, run it on your machine, and post the result? I can imagine that Red Hat backported some fixes into their "1.6.0_24" which are not available in Ubuntu's "1.6.0_27".

The program is also available as jar from https://www.dropbox.com/s/t6ey4k6aj42s9no/javatest-1.0-SNAPSHOT-jar-with-dependencies.jar

Run it with

java -d64 -Xmx4g -Xms4g -jar javatest-1.0-SNAPSHOT-jar-with-dependencies.jar

Thanks.

gmaurice commented 11 years ago

Here is the result of the run.

Free Memory: 3904
Used Memory: 20
to allocate: 200
allocated memory
Free Memory: 3104
Used Memory: 820
started process
Free Memory: with command: 3084
Used Memory with command: 840
stopped process
Free Memory: post command: 3084
Used Memory post command: 840
started process
Free Memory: with command: 3063
Used Memory with command: 861
stopped process
Free Memory: post command: 3063
Used Memory post command: 861
started process
Free Memory: with command: 3043
Used Memory with command: 881
stopped process
Free Memory: post command: 3043
Used Memory post command: 881
started process
Free Memory: with command: 3022
Used Memory with command: 902
stopped process
Free Memory: post command: 3022
Used Memory post command: 902
started process
Free Memory: with command: 3002
Used Memory with command: 922
stopped process
Free Memory: post command: 3002
Used Memory post command: 922
Free Memory: 3148302960
wait
Done

mfussenegger commented 11 years ago

So you see the same behaviour as I did in my tests. I had hoped that it would be worse with your Java version and improved in more recent versions, but that doesn't seem to be the case.

Unfortunately, we don't currently have a good idea of how to work around the issue.

See http://bryanmarty.com/blog/2012/01/14/forking-jvm/ and http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run for descriptions of the problem.

The way the plugin works, one export process is started per shard. That means if you have a lot of shards per node and your JVM has already consumed a large amount of memory, each spawned process reserves a correspondingly large amount and they quickly consume the remaining memory.
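
A rough way to gauge how much headroom is left for those per-shard forks is to look at the kernel's commit accounting; this is a general Linux check, not something from the plugin itself:

# Committed_AS is the memory already promised to all processes;
# CommitLimit only applies when vm.overcommit_memory=2.
grep -E '^(MemFree|SwapFree|CommitLimit|Committed_AS)' /proc/meminfo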

I've tried calling System.gc() manually after the export is done so that the memory is freed sooner. While this helps, I don't think it's a good solution.

Resorting to a native access solution to utilize posix_spawn() would probably also minimize or solve the problem, but at the cost of making the plugin installation a whole lot more difficult. That's something we really want to avoid.

You could try to create named pipes (FIFOs) and write to them using output_file instead of output_cmd, then read from them with a bash script that is started separately. But I haven't tested that approach.
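
A minimal, untested sketch of that FIFO approach, assuming output_file expands the same ${index}/${shard} placeholders as output_cmd; a pipe would be needed for every shard the export touches, and only one is shown here (index name, shard number, and paths are illustrative):

# 1. Create a named pipe matching the file name the export will write to.
mkfifo /tmp/export-ad974b0001-0.fifo

# 2. Start a background reader that compresses and ships whatever arrives.
cat /tmp/export-ad974b0001-0.fifo | gzip -c | \
    ssh backup.srv "cat > /data/elasticsearch-inout-dump/test/ad974b0001-0.gz" &

# 3. Run the export with output_file pointing at the pipes, so no child
#    process has to be forked from the large Elasticsearch JVM.
curl http://localhost:9200/_export -d '{
  "fields": ["_id", "_source", "_version", "_index", "_type"],
  "output_file": "/tmp/export-${index}-${shard}.fifo"
}'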

Otherwise allowing overcommit is probably your best option.

gmaurice commented 11 years ago

Thanks for your answer.