elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

-XX:OnOutOfMemoryError is broken by Elasticsearch #18736

Closed samcday closed 8 years ago

samcday commented 8 years ago

Elasticsearch version: 2.3.3

JVM version: any

OS version: BSD / Linux

Description of the problem including expected versus actual behavior:

#13753 introduced seccomp stuff, which uses kernel voodoo that I don't really understand (like, at all) to drop permissions for the exec/fork/execve/etc syscalls. Unfortunately, the side effect of this is that -XX:OnOutOfMemoryError is now utterly broken.

Steps to reproduce:

  1. Install my production-ready, webscale OOM plugin
  2. Run JAVA_OPTS="-XX:OnOutOfMemoryError=pwd" bin/elasticsearch
  3. curl localhost:9200/_cat/oom

With the above steps, you'll see some output like this:

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="pwd"
#   Executing "pwd"...

But the command doesn't actually run. On Linux, that's the end of it. If you run it on OS X, you'll actually get a more insightful message immediately after the above output:

os::fork_and_exec failed: Resource temporarily unavailable (35)

If you then follow the same steps above, but instead run JAVA_OPTS=-Des.bootstrap.seccomp=false bin/elasticsearch, you'll see that the OOM handler works properly.
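To make the failure mode concrete, here is a minimal Java sketch of what "spawning a child process" looks like from inside the JVM. It is not code from Elasticsearch or from the plugin mentioned above; the class name `ForkProbe` and the method `canSpawn` are hypothetical. Run standalone it should be able to start `pwd`; the assumption is that inside a process where fork/exec has been blocked, the same call would end up in the `IOException` branch instead.

```java
import java.io.IOException;

public class ForkProbe {
    // Hypothetical helper: returns true only if the command could be started
    // and exited with status 0. Any failure to fork/exec surfaces in Java as
    // an IOException from ProcessBuilder.start().
    public static boolean canSpawn(String... command) {
        try {
            Process p = new ProcessBuilder(command).inheritIO().start();
            return p.waitFor() == 0;
        } catch (IOException e) {
            // Covers "command not found" as well as a denied fork/exec.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("can spawn pwd: " + canSpawn("pwd"));
    }
}
```

The JVM's own -XX:OnOutOfMemoryError handler goes through native code rather than `ProcessBuilder`, so this only illustrates the kind of operation that gets blocked, not the exact code path.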

s1monw commented 8 years ago

This works as designed. The only thing I can think of is that we should maybe add this to the documentation OR fail to start up if seccomp is enabled and -XX:OnOutOfMemoryError is set too.
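A rough sketch of what such a startup check could look like, purely as an illustration: it inspects the JVM's input arguments and refuses to continue if -XX:OnOutOfMemoryError is present while seccomp is enabled. The class and method names are hypothetical, and reading `es.bootstrap.seccomp` as a system property that defaults to true is an assumption based on the `-Des.bootstrap.seccomp=false` switch mentioned earlier in this thread, not a description of how Elasticsearch actually does it.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class OnOomFlagCheck {
    // Hypothetical helper: true if any JVM argument sets -XX:OnOutOfMemoryError.
    public static boolean usesOnOutOfMemoryError(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:OnOutOfMemoryError=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (excludes arguments to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Assumption: seccomp is on unless -Des.bootstrap.seccomp=false was passed.
        boolean seccompEnabled =
            Boolean.parseBoolean(System.getProperty("es.bootstrap.seccomp", "true"));
        if (seccompEnabled && usesOnOutOfMemoryError(jvmArgs)) {
            throw new IllegalStateException(
                "-XX:OnOutOfMemoryError needs to fork, which seccomp blocks; "
                    + "consider -XX:+ExitOnOutOfMemoryError instead");
        }
        System.out.println("no conflicting OOM handler flag found");
    }
}
```

Failing fast like this trades a silent non-running handler for an explicit error at boot, which is the behaviour the second option above describes.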

jasontedor commented 8 years ago

You should just upgrade to 8u92 and use the new flag -XX:+ExitOnOutOfMemoryError; it does not need to fork and is compatible with seccomp.

Applying this patch to Elasticsearch:

diff --git a/core/src/main/java/org/elasticsearch/node/Node.java b/core/src/main/java/org/elasticsearch/node/Node.java
index cf33770..ecd4c47 100644
--- a/core/src/main/java/org/elasticsearch/node/Node.java
+++ b/core/src/main/java/org/elasticsearch/node/Node.java
@@ -261,6 +261,12 @@ public class Node implements Closeable {
             }
         }

+        try {
+            int[] a = new int[16777216];
+        } catch (OutOfMemoryError e) {
+            // intentional so we do not otherwise die
+        }
+
         logger.info("initialized");
     }

so that we are attempting to allocate a 64 MB array after seccomp is installed:

09:53:58 ⏚ [jason:~/elasticsearch/elasticsearch-5.0.0-SNAPSHOT] $ ES_JAVA_OPTS="-Xms64m -Xmx64m -XX:+ExitOnOutOfMemoryError" ./bin/elasticsearch
[2016-06-04 09:54:08,358][INFO ][node                     ] [Cassandra Nova] version[5.0.0-SNAPSHOT], pid[89211], build[2d57bbd/2016-06-04T12:30:20.088Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_92/25.92-b14]
[2016-06-04 09:54:08,359][INFO ][node                     ] [Cassandra Nova] initializing ...
[2016-06-04 09:54:08,973][INFO ][plugins                  ] [Cassandra Nova] modules [percolator, lang-mustache, lang-painless, ingest-grok, reindex, aggs-matrix-stats, lang-expression, lang-groovy], plugins []
[2016-06-04 09:54:08,995][INFO ][env                      ] [Cassandra Nova] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [367.7gb], net total_space [464.7gb], spins? [unknown], types [hfs]
[2016-06-04 09:54:08,995][INFO ][env                      ] [Cassandra Nova] heap size [61.8mb], compressed ordinary object pointers [true]
[2016-06-04 09:54:10,355][INFO ][node                     ] [Cassandra Nova] initialized
[2016-06-04 09:54:10,355][INFO ][node                     ] [Cassandra Nova] starting ...
[2016-06-04 09:54:10,410][INFO ][transport                ] [Cassandra Nova] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2016-06-04 09:54:10,412][WARN ][bootstrap                ] [Cassandra Nova] please set [discovery.zen.minimum_master_nodes] to a majority of the number of master eligible nodes in your cluster
[2016-06-04 09:54:13,456][INFO ][cluster.service          ] [Cassandra Nova] new_master {Cassandra Nova}{d8Gh-txJRzK0EBO-lZuHqA}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-06-04 09:54:13,473][INFO ][http                     ] [Cassandra Nova] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2016-06-04 09:54:13,473][INFO ][node                     ] [Cassandra Nova] started
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid89211.hprof ...
Heap dump file created [25956023 bytes in 0.163 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
09:54:13 ⏚ [jason:~/elasticsearch/elasticsearch-5.0.0-SNAPSHOT] 3 $ 

and without the flag:

09:54:13 ⏚ [jason:~/elasticsearch/elasticsearch-5.0.0-SNAPSHOT] 3 $ ES_JAVA_OPTS="-Xms64m -Xmx64m" ./bin/elasticsearch
[2016-06-04 09:57:41,286][INFO ][node                     ] [Nathaniel Richards] version[5.0.0-SNAPSHOT], pid[89285], build[2d57bbd/2016-06-04T12:30:20.088Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_92/25.92-b14]
[2016-06-04 09:57:41,287][INFO ][node                     ] [Nathaniel Richards] initializing ...
[2016-06-04 09:57:41,792][INFO ][plugins                  ] [Nathaniel Richards] modules [percolator, lang-mustache, lang-painless, ingest-grok, reindex, aggs-matrix-stats, lang-expression, lang-groovy], plugins []
[2016-06-04 09:57:41,810][INFO ][env                      ] [Nathaniel Richards] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [367.7gb], net total_space [464.7gb], spins? [unknown], types [hfs]
[2016-06-04 09:57:41,810][INFO ][env                      ] [Nathaniel Richards] heap size [61.8mb], compressed ordinary object pointers [true]
[2016-06-04 09:57:43,251][INFO ][node                     ] [Nathaniel Richards] initialized
[2016-06-04 09:57:43,252][INFO ][node                     ] [Nathaniel Richards] starting ...
[2016-06-04 09:57:43,328][INFO ][transport                ] [Nathaniel Richards] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2016-06-04 09:57:43,331][WARN ][bootstrap                ] [Nathaniel Richards] please set [discovery.zen.minimum_master_nodes] to a majority of the number of master eligible nodes in your cluster
[2016-06-04 09:57:46,385][INFO ][cluster.service          ] [Nathaniel Richards] new_master {Nathaniel Richards}{lyd84Tf9R4GficBHI3-gWg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-06-04 09:57:46,402][INFO ][http                     ] [Nathaniel Richards] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2016-06-04 09:57:46,402][INFO ][node                     ] [Nathaniel Richards] started
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid89285.hprof ...
Heap dump file created [26087334 bytes in 0.147 secs]
[2016-06-04 09:57:46,665][INFO ][node                     ] [Nathaniel Richards] error
java.lang.OutOfMemoryError: Java heap space
        at org.elasticsearch.node.Node.start(Node.java:407)
        at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:197)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:256)
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:96)
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:91)
        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:91)
        at org.elasticsearch.cli.Command.main(Command.java:53)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:70)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:63)
[2016-06-04 09:57:46,667][INFO ][gateway                  ] [Nathaniel Richards] recovered [0] indices into cluster_state

If you need to do something other than exit on an out-of-memory error then, echoing @rmuir and @s1monw, sorry, we are not going to support that.
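For anyone who wants to try the flag without patching Elasticsearch, the allocation from the patch can be lifted into a standalone program. This is only a sketch: the class name `OomProbe` and its helpers are made up for illustration. It also spells out the arithmetic behind the 64 MB figure: 16777216 ints at 4 bytes each is 67108864 bytes of payload, which cannot fit in a 64 MB heap.

```java
public class OomProbe {
    // Payload size in bytes of an int[] of the given length
    // (4 bytes per int; the small array header is not counted).
    public static long intArrayBytes(int length) {
        return 4L * length;
    }

    // Attempts the allocation and reports whether it succeeded,
    // swallowing the OutOfMemoryError the same way the patch does.
    public static boolean tryAllocate(int length) {
        try {
            int[] a = new int[length];
            return a.length == length;
        } catch (OutOfMemoryError e) {
            // intentional so we do not otherwise die
            return false;
        }
    }

    public static void main(String[] args) {
        int length = 16777216; // same length as in the patch
        System.out.println("payload MiB: " + intArrayBytes(length) / (1024 * 1024));
        System.out.println("allocated: " + tryAllocate(length));
    }
}
```

Running it as `java -Xms64m -Xmx64m -XX:+ExitOnOutOfMemoryError OomProbe` on 8u92 or later should mirror the first log above, with the JVM terminating instead of reaching the catch block; dropping the flag should let the catch block run.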

samcday commented 8 years ago

Awesome, I did not know about -XX:+ExitOnOutOfMemoryError. We wanted a script to fire so that we could raise a Datadog event for the OOM (then we can easily set up alarms on this occurrence).

FWIW, I don't think it's reasonable to close this issue just yet. I wasted a solid 3 hours trying to figure out why my OOM handler wasn't working. There was zero documentation on this apparently known issue. Can we re-open this and close it once there's documentation indicating that -XX:OnOutOfMemoryError is intentionally and permanently broken, and -XX:+ExitOnOutOfMemoryError is the only option?

samcday commented 8 years ago

@s1monw thoughts?

jasontedor commented 8 years ago

I opened #18756.