gmaurice opened this issue 11 years ago
Hi,
I am a little lost here. I made a successful first backup with this call:

curl http://localhost:9200/_export -d '{ "fields":["_id", "_source", "_version", "_index", "_type"],"output_cmd": ["/home/elasticsearch/inout-output-cmd.sh", "test", "${cluster}-${index}-${shard}"] }'

where /home/elasticsearch/inout-output-cmd.sh is:

#!/bin/bash
cat - | ssh backup.srv "gzip -c | cat - > /data/elasticsearch-inout-dump/$1/$2.gz"

The first one works fine, however, the second one throws the following memory exception just after launching it:

{"index":"ad974b0001","shard":0,"reason":"BroadcastShardOperationFailedException[[ad974b0001][0] ]; nested: QueryPhaseExecutionException[[ad974b0001][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Query Failed [failed to execute export]]; nested: ExportException[[ad974b0001][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Export Failed [Failed to open output: ]]; nested: IOException[Cannot run program \"/home/elasticsearch/inout-output-cmd.sh\": java.io.IOException: error=12, Cannot allocate memory]; nested: IOException[java.io.IOException: error=12, Cannot allocate memory]; "}

Even with this exception, the _dump command works fine. The index named in the exception is very small: 12k docs and 5.6 MB on disk using _dump. I hope you have an idea of how to fix it, thanks.
Which second one? I only see one curl call, and from what I can tell that is the one that works.
The curl command is the same for the first and the second test. The second test throws the exception I reported.
Ok, got it. I think I'll need some more information. Can you post your java version and the output from top (sorted by memory) while elasticsearch is running, both before executing the export and while/after executing it?
Also, can you try the export after you've set:
echo 0 > /proc/sys/vm/overcommit_memory
Thanks for the feedback so far.
elasticsearch 0.90.2
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
The max heap size is -Xmx25g.
Here is top sorted by RES when I have the problem (it doesn't differ if I try an export):
top - 20:07:09 up 127 days, 6:56, 4 users, load average: 0.12, 0.20, 0.18
Tasks: 145 total, 1 running, 144 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.4%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32842468k total, 32640904k used, 201564k free, 234904k buffers
Swap: 8189944k total, 68844k used, 8121100k free, 9191476k cached
PID USER PR NI VIRT *RES SHR S %CPU %MEM TIME+ CODE COMMAND
20093 elastics 20 0 30.4g 21g 5756 S 1 68.3 252:22.44 36 java
3502 bind 20 0 673m 7328 1860 S 0 0.0 0:06.60 480 named
16687 syslog 20 0 248m 5296 784 S 0 0.0 6:14.95 344 rsyslogd
However, if I restart elasticsearch, so that the memory footprint is low, it works.
I think you meant:
echo 1 > /proc/sys/vm/overcommit_memory
With overcommit enabled, the _export has succeeded in starting as I write these words.
It seems that the ProcessBuilder we're using to execute the shell command relies on fork(), which tries to allocate the same amount of memory as the JVM currently has in use.
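For illustration, a contrived sketch like the one below (not the plugin's actual code) can run into the same error once the heap is mostly filled and the kernel cannot overcommit:

import java.io.IOException;

// Contrived sketch: fill a chunk of the heap, then spawn a trivial child
// process. ProcessBuilder.start() goes through fork(), so once the JVM is
// large and /proc/sys/vm/overcommit_memory forbids overcommitting, start()
// can fail with "error=12, Cannot allocate memory".
public class ForkDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Ballast to grow the heap; run with e.g. -Xmx4g -Xms4g and adjust sizes.
        byte[][] ballast = new byte[20][];
        for (int i = 0; i < ballast.length; i++) {
            ballast[i] = new byte[100 * 1024 * 1024]; // 20 x 100 MB
        }
        Process p = new ProcessBuilder("/bin/true").start(); // may throw error=12 here
        p.waitFor();
        System.out.println("fork succeeded despite " + ballast.length + " ballast blocks");
    }
}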
I'll discuss with my colleagues tomorrow what we can do about that and hopefully come up with a fix soon.
Hi,
I've dug a bit into the issue, and it seems that the bug I had in mind has been fixed.
I've tested against
java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b09)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01, mixed mode)
and
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.11.90) (rhel-1.62.1.11.11.90.el6_4-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Can you take the program at https://gist.github.com/mfussenegger/6120737, run it on your machine, and post the result? I can imagine that Red Hat backported some fixes into their "1.6.0_24" that are not available in Ubuntu's "1.6.0_27".
The program is also available as a jar from https://www.dropbox.com/s/t6ey4k6aj42s9no/javatest-1.0-SNAPSHOT-jar-with-dependencies.jar
Run it with:
java -d64 -Xmx4g -Xms4g -jar javatest-1.0-SNAPSHOT-jar-with-dependencies.jar
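In essence the program does something like this (a simplified sketch; see the gist above for the real code):

// Simplified sketch: print heap usage, allocate ~200 MB of ballast, then
// repeatedly spawn a child process and print the numbers again.
public class JavaTest {

    static void printMem(String label) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("Free Memory" + label + ": " + rt.freeMemory() / mb);
        System.out.println("Used Memory" + label + ": " + (rt.totalMemory() - rt.freeMemory()) / mb);
    }

    public static void main(String[] args) throws Exception {
        printMem("");
        byte[] ballast = new byte[200 * 1024 * 1024];
        System.out.println("allocated memory");
        printMem("");
        for (int i = 0; i < 5; i++) {
            Process p = new ProcessBuilder("cat", "/dev/null").start();
            System.out.println("started process");
            printMem(" with command");
            p.waitFor();
            System.out.println("stopped process");
            printMem(" post command");
        }
        // Keep the ballast alive so it isn't collected before the loop ends.
        System.out.println("ballast: " + ballast.length + " bytes");
        System.out.println("Done");
    }
}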
Thanks.
Here is the result of the run:
Free Memory: 3904
Used Memory: 20
to allocate: 200
allocated memory
Free Memory: 3104
Used Memory: 820
started process
Free Memory: with command: 3084
Used Memory with command: 840
stopped process
Free Memory: post command: 3084
Used Memory post command: 840
started process
Free Memory: with command: 3063
Used Memory with command: 861
stopped process
Free Memory: post command: 3063
Used Memory post command: 861
started process
Free Memory: with command: 3043
Used Memory with command: 881
stopped process
Free Memory: post command: 3043
Used Memory post command: 881
started process
Free Memory: with command: 3022
Used Memory with command: 902
stopped process
Free Memory: post command: 3022
Used Memory post command: 902
started process
Free Memory: with command: 3002
Used Memory with command: 922
stopped process
Free Memory: post command: 3002
Used Memory post command: 922
Free Memory: 3148302960
wait
Done
So you're seeing the same behaviour I saw in my tests. I had hoped it would be worse with your java version and improved in more recent versions, but that doesn't seem to be the case.
Unfortunately we currently don't really have an idea how to work around the issue.
See http://bryanmarty.com/blog/2012/01/14/forking-jvm/ and http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run for descriptions of the problem.
The way the plugin works, one export process is started per shard. That means that if you have a lot of shards per node and your JVM has already consumed a large amount of memory, the processes that will be spawned are rather large and therefore consume the remaining memory.
I've tried to call System.gc() manually after the export is done so that the memory is freed sooner. While this helps, I don't think it's a good solution.
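Roughly like this (a sketch, not the actual plugin code; runExport is a hypothetical stand-in for whatever starts the per-shard output command):

// Sketch of the stop-gap: hint a GC after each export so the heap shrinks
// again before the next child process has to be forked.
public class GcAfterExport {

    static void runExport(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).start();
        try {
            p.waitFor();
        } finally {
            System.gc(); // best-effort hint only; the JVM is free to ignore it
        }
    }

    public static void main(String[] args) throws Exception {
        runExport("/bin/true");
    }
}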
Resorting to Native Access (JNA) solutions to utilize posix_spawn() would probably also minimize or solve the problem, but at the cost of making the plugin installation a whole lot more difficult. That's something we really want to avoid.
You could try to create pipe files (FIFOs), write to them using output_file instead of output_cmd, and then read from them with a bash script that is started separately. But I haven't tested that approach.
Otherwise allowing overcommit is probably your best option.
Thanks for your answer.