cloudfoundry / jvmkill

Terminate the JVM when resources are exhausted
Apache License 2.0

On OOME: heap error, kill agent freezes #33

Open dmikusa opened 2 years ago

dmikusa commented 2 years ago

In some circumstances, if you fill up the heap and trigger an OOME: Heap error, you will see the memory histogram; however, the process won't exit. It hangs right after printing "Memory usage:".

The test I used here was https://github.com/dmikusa-pivotal/java-memory-waster with a 1G memory limit, set to create 200,000 pieces of garbage and retain references to them. This happens on 1.16.0 and 1.17.0 (probably older versions as well). It happened on both Java 8 and Java 11.

This is where we try to get some memory stats from the JVM:

        writeln_paced!(output, "\nMemory usage:");
        let get_memory_mxbean_method_id = jni_env.get_static_method_id(
            mf_class,
            "getMemoryMXBean",
            "()Ljava/lang/management/MemoryMXBean;",
        )?;
        let memory_mxbean =
            jni_env.call_static_object_method(mf_class, get_memory_mxbean_method_id)?;

It is possible that the JVM is not in a state where it can respond to that call. Perhaps the heap is too full for it to make any progress.

At any rate, we should probably put a timeout on any method calls made into the JVM after it has hit an OOME, so that if a call stalls we skip the metrics but still do everything we can to kill the process. The most important thing is that the process gets killed.
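One way the timeout idea could look is a watchdog thread: the diagnostics stay on the thread that received the OOME (a JNI environment is only valid on the thread it belongs to), while a second thread waits for a "done" signal and runs a fallback if the deadline passes. This is only a sketch using the Rust standard library. `with_watchdog`, `on_timeout`, and `work` are hypothetical names, not part of jvmkill, and how the agent would actually kill the process is an assumption left to the caller.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on the current thread while a watchdog
/// thread waits up to `timeout` for it to finish. If the deadline passes
/// first, the watchdog runs `on_timeout` (in an agent, this is where the
/// process would be killed, so it would normally never return).
fn with_watchdog<T, K, W>(timeout: Duration, on_timeout: K, work: W) -> T
where
    K: FnOnce() + Send + 'static,
    W: FnOnce() -> T,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let watchdog = thread::spawn(move || {
        // Only a real timeout triggers the fallback. A received message or a
        // dropped sender (for example, `work` panicked) means no stall.
        if let Err(RecvTimeoutError::Timeout) = done_rx.recv_timeout(timeout) {
            on_timeout();
        }
    });

    // Potentially blocking call, e.g. collecting memory stats over JNI.
    let result = work();

    // Tell the watchdog we finished, then wait for it to exit.
    let _ = done_tx.send(());
    let _ = watchdog.join();
    result
}
```

In the agent, `work` would wrap the memory-stats calls shown above and `on_timeout` would go straight to killing the process, skipping the remaining output. The timeout value would need tuning, since a JVM with a nearly full heap can be slow without being permanently stuck.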