eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

-XX:+ExitOnOutOfMemoryError not exiting #8552

Closed theresa-m closed 4 years ago

theresa-m commented 4 years ago

Java -version output

OpenJ9 release 0.18

openjdk version "11.0.6" 2020-01-14 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.6+10) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.18.0, JRE 11 Linux amd64-64-Bit Compressed References 20200116_433 (JIT enabled, AOT enabled) OpenJ9 - 6968c18d7 OMR - 7a1b0239a JCL - 104cab2452 based on jdk-11.0.6+10)

Edit: can be replicated on Java 8 as well

Summary of problem

Details from the OpenJ9 slack channel:

Patrick SooHoo 4 days ago Does 0.18 fully support -XX:+ExitOnOutOfMemoryError? We turned it on, and seeing some cases where the VM reports “JVMDUMP046I VM is shutting down. Reason: java/lang/OutOfMemoryError”, but the process doesn’t seem to exit.

theresa.mammarella 4 days ago Hi Patrick, yes it should be working in 0.18. Would you create an issue with more information about your JVM version and environment? I can take a look at what is going wrong.

theresa.mammarella 4 days ago https://github.com/eclipse/openj9/issues (edited)

Patrick SooHoo 4 days ago Thanks @theresa.mammarella, is there anything specific you want me to try to capture? From java -version : openjdk version "11.0.6" 2020-01-14 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.6+10) Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.18.0, JRE 11 Linux amd64-64-Bit Compressed References 20200116_433 (JIT enabled, AOT enabled) OpenJ9 - 6968c18d7 OMR - 7a1b0239a JCL - 104cab2452 based on jdk-11.0.6+10)

theresa.mammarella 4 days ago thanks I will try to reproduce it. are you also running in a docker image? or using any other options in combination with this that may affect the out of memory behavior? any other information about your specific failure that can help me reproduce it may be useful as well. (edited)

Patrick SooHoo 18 hours ago Hi @theresa.mammarella, yes, this is running in a docker image, inside a kubernetes pod. We do also have a java dump trigger on OOME, and disable HeapDumpOnOutOfMemoryError: -XX:+ExitOnOutOfMemoryError -Xdump:java:file=/STDOUT/ -XX:-HeapDumpOnOutOfMemoryError -Xdump:java:events=throw,filter=java/lang/OutOfMemoryError,file=/STDOUT/

theresa-m commented 4 years ago

So far I haven't been able to reproduce this issue.

Things I have tried:

Test program:

class Hello {
        public static void main(String[] args) throws Throwable {
                Thread t1 = new Thread(new Hello().new RunnableImpl());
                t1.start();

                /* program continues forever */
                while(true) {
                        Thread.sleep(2000);
                        System.out.println("continue program forever");
                }
        }

        private class RunnableImpl implements Runnable {
                public void run()
                {
                        System.out.println("run thread");
                        /* test 1: throw system oom */
                        //long[] mer = new long[Integer.MAX_VALUE];

                        /* test 2: throw java oom */
                        throw new OutOfMemoryError();
                }
        }
}
psoohoo-turbonomic commented 4 years ago

Hi @theresa-m, thanks so much for investigating this.

I was trying to reproduce the issue in one of our environments to try to get more information to add to this issue. I modified the application to trigger an OutOfMemoryError on demand, and noticed something unusual when I was using it:

JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2020/02/12 18:13:16 - please wait.
JVMDUMP046I VM is shutting down. Reason: java/lang/OutOfMemoryError
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2020/02/12 18:13:18 - please wait.
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2020/02/12 18:13:18 - please wait.

I'm also attaching an strace from the process -- I hope there is something useful in there. You'll see the "VM is shutting down" and "Processing dump event" messages coming from different threads.

strace.zip

Herr-Sepp commented 4 years ago

If I let the OOME happen "naturally" (by lowering the available heap to a value where it happens on it's own quickly), then the JVM process reports handling the event multiple times, reports that the VM is shutting down, but the process never exits.

I can confirm that behavior. My log output:

app-service_1   | JVMDUMP039I Processing dump event "systhrow", 
detail "java/lang/OutOfMemoryError" at 2020/02/12 20:46:54 - please wait.

app-service_1   | JVMDUMP039I Processing dump event "systhrow", 
detail "java/lang/OutOfMemoryError" at 2020/02/12 20:46:54 - please wait.

app-service_1   | JVMDUMP039I Processing dump event "systhrow", 
detail "java/lang/OutOfMemoryError" at 2020/02/12 20:46:54 - please wait. 

app-service_1   | JVMDUMP039I Processing dump event "systhrow", 
detail "java/lang/OutOfMemoryError" at 2020/02/12 20:46:54 - please wait.

app-service_1   | JVMDUMP046I VM is shutting down. Reason: java/lang/OutOfMemoryError

app-service_1   | JVMDUMP039I Processing dump event "systhrow", 
detail "java/lang/OutOfMemoryError" at 2020/02/12 20:46:54 - please wait.

The process does not exit and the container is running. i was using

-XX:+UseContainerSupport -Xmx$HEAPSIZE -XX:-HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError

theresa-m commented 4 years ago

Thanks both of you! I was finally able to reproduce what you are seeing by further limiting the memory and adding some additional threads into the mix. I'll continue working on a solution.

edit: steps I took to reproduce:

java -Xmx2m -Xgcpolicy:nogc -XX:+ExitOnOutOfMemoryError -XX:-HeapDumpOnOutOfMemoryError Hello
class Hello {
        public static void main(String[] args) throws Throwable {
                System.out.println("multithreaded test");

                /* program continues forever */
                while(true) {
                        Thread t1 = new Thread(new Hello().new RunnableImpl());
                        t1.start();
                        Thread.sleep(2000);
                        System.out.println("continue program forever");
                }
        }

        private class RunnableImpl implements Runnable {
                public void run()
                {
                        long tid = Thread.currentThread().getId();
                        System.out.println("run thread: " + tid);

                        while (true) {
                            long[] mer = new long[40000];
                            Thread t1 = new Thread(new Hello().new RunnableImpl());
                            t1.start();
                        }

                }
        }
}

output

multithreaded test
run thread: 16
run thread: 17
run thread: 18
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2020/03/12 15:45:42 - please wait.
JVMDUMP046I VM is shutting down. Reason: java/lang/OutOfMemoryError

VM continues to hang even after kill -3 or ctrl+c

pshipton commented 4 years ago

Are there any updates on this?

theresa-m commented 4 years ago

still investigating.

pshipton commented 4 years ago

I'll move it forward. If we find a low risk solution in time, we could consider fixing in the 0.20 release.

psoohoo-turbonomic commented 4 years ago

Do you guys have any suggested workarounds we could use in the meantime? This one is pretty important for our use case.

theresa-m commented 4 years ago

Thanks for your patience!

One option to work around could be -Xdump:tool:events=throw+systhrow,filter=java/lang/OutOfMemoryError,exec="kill -9 %pid" which basically just kills the process when an OutOfMemory error is thrown.

The limitation for this is it doesn't work in Docker when the JVM ends running as pid 1.

psoohoo-turbonomic commented 4 years ago

Thanks for the suggested workaround, @theresa-m! The throw/systhrow event handler seemed to do the job. Although "kill -9" didn't work, a regular SIGTERM/"kill %pid" seems to work in docker.

theresa-m commented 4 years ago

I've found two issues that are causing hangs:

The first issue is that some flags are missing from the exit trigger configuration. Pr here https://github.com/eclipse/openj9/pull/9012


The second issue looks JIT related where stopCompilationThreads from exitJavaVM is stuck waiting. This is what the dumping thread and JIT threads look like at the time:

(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7f26f3a52700 (LWP 26878) "java" 0x00007f26f340898d in pthread_join (threadid=139805252916992, thread_return=thread_return@entry=0x7ffc075004e8)
    at pthread_join.c:90
  2    Thread 0x7f26f2701700 (LWP 26879) "main" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  3    Thread 0x7f26f3a1f700 (LWP 26880) "Signal Reporter" 0x00007f26f340f827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0,
    futex_word=0x7f26f130c640 <wakeUpASyncReporter>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  4    Thread 0x7f26d2724700 (LWP 26881) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  5    Thread 0x7f26d0d93700 (LWP 26889) "JIT Sampler" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6    Thread 0x7f26d0d52700 (LWP 26890) "JIT IProfiler" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  7    Thread 0x7f26d00b2700 (LWP 26891) "Common-Cleaner" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  8    Thread 0x7f26d0071700 (LWP 26892) "Finalizer maste" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 9    Thread 0x7f26a7c55700 (LWP 26894) "Thread-2" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  10   Thread 0x7f26a7a10700 (LWP 26895) "Thread-3" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  11   Thread 0x7f26a79cf700 (LWP 26896) "Thread-4" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  12   Thread 0x7f26a798e700 (LWP 26897) "Thread-5" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  13   Thread 0x7f26a794d700 (LWP 26898) "Attach API wait" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f26a7c55700 (LWP 26894))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f26f151e7f7 in monitor_wait_original (interruptible=0, nanos=0, millis=0, monitor=0x7f26ec0776a8, self=0x7f26ec1c2738)
    at ../omr/thread/common/omrthread.c:4679
#2  monitor_wait (interruptible=0, nanos=0, millis=0, monitor=0x7f26ec0776a8) at ../omr/thread/common/omrthread.c:4524
#3  omrthread_monitor_wait (monitor=0x7f26ec0776a8) at ../omr/thread/common/omrthread.c:4394
#4  0x00007f26eaf4c734 in TR::CompilationInfo::waitOnCompMonitor(J9VMThread*) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#5  0x00007f26eaf540a7 in TR::CompilationInfo::stopCompilationThreads() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#6  0x00007f26f17db851 in runExitStages (vm=0x7f26ec00e310, vmThread=<optimized out>) at jvminit.c:5621
#7  0x00007f26f17dbce8 in exitJavaVM (vmThread=0x1eac000, rc=3) at jvminit.c:539
#8  0x00007f26f097c609 in doJavaVMExit (agent=<optimized out>, label=<optimized out>, context=0x7f26a7c52270) at dmpagent.c:686
#9  0x00007f26f097c645 in protectedDumpFunction (portLibrary=portLibrary@entry=0x7f26f1efd340 <j9portLibrary>, userData=userData@entry=0x7f26a7c51d10)
    at dmpagent.c:2911
#10 0x00007f26f10b6753 in omrsig_protect (portLibrary=0x7f26f1efd340 <j9portLibrary>, fn=0x7f26f097c630 <protectedDumpFunction>, fn_arg=0x7f26a7c51d10,
    handler=0x7f26f097c650 <signalHandler>, handler_arg=0x0, flags=505, result=0x7f26a7c51d08) at ../../omr/port/unix/omrsignal.c:425
#11 0x00007f26f097fd8b in runDumpFunction (agent=<optimized out>, label=0x7f26a7c51db0 "", context=<optimized out>) at dmpagent.c:2885
#12 0x00007f26f097ff1f in runDumpAgent (vm=vm@entry=0x7f26ec00e310, agent=agent@entry=0x7f26ec024de0, context=context@entry=0x7f26a7c52270,
    state=state@entry=0x7f26a7c52268, detail=detail@entry=0x7f26a7c522f0 "java/lang/OutOfMemoryError", timeNow=timeNow@entry=1585323832415) at dmpagent.c:2811
#13 0x00007f26f099733b in triggerDumpAgents (vm=0x7f26ec00e310, self=0x1eac000, eventFlags=262144, eventData=<optimized out>) at trigger.c:1033
#14 0x00007f26f0995a94 in rasDumpHookExceptionSysthrow (hookInterface=hookInterface@entry=0x7f26ec0108c8, eventNum=eventNum@entry=10,
    eventData=eventData@entry=0x7f26a7c52640, userData=userData@entry=0x0) at trigger.c:1537
#15 0x00007f26f131137e in J9HookDispatch (hookInterface=0x7f26ec0108c8, taggedEventNum=<optimized out>, eventData=0x7f26a7c52640)
    at ../omr/util/hookable/hookable.cpp:235
#16 0x00007f26f17c068d in internalSetCurrentExceptionWithCause (cause=<optimized out>, utfMessage=<optimized out>, detailMessage=<optimized out>,
    exceptionNumber=<optimized out>, currentThread=0x1eac000) at exceptionsupport.c:746
#17 setCurrentExceptionWithUtfCause (currentThread=0x1eac000, exceptionNumber=<optimized out>, detailMessage=<optimized out>, utfMessage=<optimized out>,
    cause=<optimized out>) at exceptionsupport.c:807
#18 0x00007f26f1745c71 in VM_BytecodeInterpreter::run (this=0x7f26a7c52a30, vmThread=0x80) at BytecodeInterpreter.hpp:9699

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f26d2724700 (LWP 26881))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f26f151e7f7 in monitor_wait_original (interruptible=0, nanos=0, millis=0, monitor=0x7f26ec078048, self=0x7f26ec0acf30)
    at ../omr/thread/common/omrthread.c:4679
#2  monitor_wait (interruptible=0, nanos=0, millis=0, monitor=0x7f26ec078048) at ../omr/thread/common/omrthread.c:4524
#3  omrthread_monitor_wait (monitor=0x7f26ec078048) at ../omr/thread/common/omrthread.c:4394
#4  0x00007f26f17a537c in internalAcquireVMAccessNoMutexWithMask (vmThread=vmThread@entry=0x1d56d00, haltMask=haltMask@entry=40961) at VMAccess.cpp:384
#5  0x00007f26f17a56b3 in internalAcquireVMAccessWithMask (vmThread=0x1d56d00, haltMask=40961) at VMAccess.cpp:413
#6  0x00007f26eafb2944 in acquireVMaccessIfNeeded(J9VMThread*, TR_YesNoMaybe) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#7  0x00007f26eafab034 in TR_J9VM::getSystemClassFromClassName(char const*, int, bool) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#8  0x00007f26eb2a92f7 in TR_StringPeepholes::classesRedefined() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#9  0x00007f26eb2a93f4 in TR_StringPeepholes::prePerformOnBlocks() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#10 0x00007f26eb4f8aee in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#11 0x00007f26eb4fa41b in OMR::Optimizer::optimize() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#12 0x00007f26eb2e753d in OMR::Compilation::compile() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#13 0x00007f26eaf5eb97 in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#14 0x00007f26eaf5fae1 in TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#15 0x00007f26f10b6753 in omrsig_protect (portLibrary=0x7f26f1efd340 <j9portLibrary>,
    fn=0x7f26eaf5f7f0 <TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*)>, fn_arg=0x7f26d27207e0,
    handler=0x7f26eaf4cfe0 <jitSignalHandler(J9PortLibrary*, unsigned int, void*, void*)>, handler_arg=0x1d56d00, flags=249, result=0x7f26d27207b8)
    at ../../omr/port/unix/omrsignal.c:425
#16 0x00007f26eaf617d5 in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR_MethodToBeCompiled*, J9::J9SegmentProvider&) ()
   from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#17 0x00007f26eaf61d88 in TR::CompilationInfoPerThread::processEntry(TR_MethodToBeCompiled&, J9::J9SegmentProvider&) ()
   from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#18 0x00007f26eaf5d76b in TR::CompilationInfoPerThread::processEntries() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#19 0x00007f26eaf5dc62 in TR::CompilationInfoPerThread::run() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#20 0x00007f26eaf5dd0a in protectedCompilationThreadProc(J9PortLibrary*, TR::CompilationInfoPerThread*) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#21 0x00007f26f10b6753 in omrsig_protect (portLibrary=0x7f26f1efd340 <j9portLibrary>,
    fn=0x7f26eaf5dc90 <protectedCompilationThreadProc(J9PortLibrary*, TR::CompilationInfoPerThread*)>, fn_arg=0x7f26f001e050,
    handler=0x7f26f17c3ee0 <structuredSignalHandler>, handler_arg=0x1d56d00, flags=506, result=0x7f26d2721e08) at ../../omr/port/unix/omrsignal.c:425
#22 0x00007f26eaf5e164 in compilationThreadProc(void*) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#23 0x00007f26f1524326 in thread_wrapper (arg=0x7f26ec0acf30) at ../omr/thread/common/omrthread.c:1705
#24 0x00007f26f34076ba in start_thread (arg=0x7f26d2724700) at pthread_create.c:333
#25 0x00007f26f2d2841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f26d0d93700 (LWP 26889))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f26f151abdf in monitor_enter_three_tier (self=0x7f26ec0ecb30, monitor=0x7f26ec005708, isAbortable=0) at ../omr/thread/common/omrthread.c:4051
#2  0x00007f26eaf6f437 in samplerThreadProc () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#3  0x00007f26f1524326 in thread_wrapper (arg=0x7f26ec0ecb30) at ../omr/thread/common/omrthread.c:1705
#4  0x00007f26f34076ba in start_thread (arg=0x7f26d0d93700) at pthread_create.c:333
#5  0x00007f26f2d2841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f26d0d52700 (LWP 26890))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f26f151e7f7 in monitor_wait_original (interruptible=0, nanos=0, millis=0, monitor=0x7f26ec0b3358, self=0x7f26ec0ed0a8)
    at ../omr/thread/common/omrthread.c:4679
#2  monitor_wait (interruptible=0, nanos=0, millis=0, monitor=0x7f26ec0b3358) at ../omr/thread/common/omrthread.c:4524
#3  omrthread_monitor_wait (monitor=0x7f26ec0b3358) at ../omr/thread/common/omrthread.c:4394
#4  0x00007f26eb068019 in TR_IProfiler::processWorkingQueue() () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#5  0x00007f26eb06811a in iprofilerThreadProc(void*) () from /root/exitoom/jdk/lib/compressedrefs/libj9jit29.so
#6  0x00007f26f1524326 in thread_wrapper (arg=0x7f26ec0ed0a8) at ../omr/thread/common/omrthread.c:1705
#7  0x00007f26f34076ba in start_thread (arg=0x7f26d0d52700) at pthread_create.c:333
#8  0x00007f26f2d2841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)
theresa-m commented 4 years ago

@andrewcraik can I have some help from your team to diagnose this?

theresa-m commented 4 years ago

@andrewcraik can I have some assistance from the JIT team to diagnose the stopCompilationThreads hang described?

andrewcraik commented 4 years ago

@mpirvu I missed this one originally - this seems like something you might be better to look at initially?

mpirvu commented 4 years ago

Thread 9 is the one that executes the shutdown sequence. It will signal all compilation threads to stop what they are doing and then it waits for them to terminate nicely. Thread 4 is a compilation thread that tries to acquire VM access because it wants to call some VM helper. However, it is stuck waiting for VM access. Some other thread is likely holding exclusive VM access and it could be thread 9, which creates a deadlock situation.

@theresa-m Is it possible to verify that indeed thread 9 is holding exclusive VM access?

theresa-m commented 4 years ago

Sorry for the delay. I just verified the exclusive access flags for this error.

It looks like its not the thread calling stopCompilationThreads thats blocking VM access, but another xdump thread thats waiting to run.

I think this issue is also along the lines of Keith's comment https://github.com/eclipse/openj9/pull/9012#discussion_r401132321

> !threads flags
Attached Threads List. For more options, run !threads help

    !j9vmthread 0x14200 publicFlags=81 privateFlags=1008 inNative=0 // main
    !j9vmthread 0x17e00 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-000
    !j9vmthread 0x1b900 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-001
    !j9vmthread 0x1f500 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-002
    !j9vmthread 0x23100 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-003
    !j9vmthread 0x26d00 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-004
    !j9vmthread 0x2a900 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-005
    !j9vmthread 0x2e500 publicFlags=81 privateFlags=101a inNative=0 // JIT Compilation Thread-006
    !j9vmthread 0x32100 publicFlags=81 privateFlags=101a inNative=0 // JIT Diagnostic Compilation Thread-007
    !j9vmthread 0x35d00 publicFlags=81 privateFlags=101a inNative=0 // JIT-SamplerThread
    !j9vmthread 0x39900 publicFlags=81 privateFlags=101a inNative=0 // IProfiler
    !j9vmthread 0x12ea00 publicFlags=80181 privateFlags=2 inNative=0 // Common-Cleaner
    !j9vmthread 0x177700 publicFlags=81 privateFlags=2400 inNative=0 // Thread-2
    !j9vmthread 0x17b300 publicFlags=81 privateFlags=2 inNative=0 // Attach API wait loop
    !j9vmthread 0x156900 publicFlags=281 privateFlags=0 inNative=0 // Thread-4
    !j9vmthread 0x19b100 publicFlags=21 privateFlags=2400 inNative=0 // Thread-5
    !j9vmthread 0x19ed00 publicFlags=1020 privateFlags=0 inNative=0 // Thread-6

(gdb) info threads
  Id   Target Id                                           Frame
* 1    Thread 0x7fbab4b52700 (LWP 22456) "java"            0x00007fbab450c98d in pthread_join (threadid=140439852177152, thread_return=0x7ffd73a8c908)
    at pthread_join.c:90
  2    Thread 0x7fbab3805700 (LWP 22457) "main"            pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  3    Thread 0x7fbab4b1f700 (LWP 22458) "Signal Reporter" 0x00007fbab4513827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0,
    futex_word=0x7fbab240f700 <wakeUpASyncReporter>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  4    Thread 0x7fba9376a700 (LWP 22459) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  5    Thread 0x7fba92669700 (LWP 22460) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  6    Thread 0x7fba92568700 (LWP 22461) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  7    Thread 0x7fba92467700 (LWP 22462) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  8    Thread 0x7fba92366700 (LWP 22463) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  9    Thread 0x7fba92265700 (LWP 22464) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  10   Thread 0x7fba92164700 (LWP 22465) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  11   Thread 0x7fba92063700 (LWP 22466) "JIT Compilation" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  12   Thread 0x7fbab0075700 (LWP 22467) "JIT Sampler"     pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  13   Thread 0x7fbab4986700 (LWP 22468) "JIT IProfiler"   pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  14   Thread 0x7fba908ea700 (LWP 22469) "Common-Cleaner"  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  15   Thread 0x7fba905a7700 (LWP 22470) "Finalizer maste" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  16   Thread 0x7fba902d0700 (LWP 22472) "Thread-2"        pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  17   Thread 0x7fba9028f700 (LWP 22473) "Attach API wait" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  18   Thread 0x7fba90566700 (LWP 22474) "Thread-4"        pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  19   Thread 0x7fba593ff700 (LWP 22475) "Thread-5"        pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  20   Thread 0x7fba593be700 (LWP 22476) "Thread-6"        pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
(gdb)

(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fbab261f926 in omrthread_sleep () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9thr29.so
#2  0x00007fbab1a9898d in prepareForDump () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#3  0x00007fbab1a81d99 in runDumpAgent () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#4  0x00007fbab1a993cb in triggerDumpAgents () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#5  0x00007fbab1a97b24 in rasDumpHookExceptionSysthrow () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#6  0x00007fbab241437e in J9HookDispatch () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9hookable29.so
#7  0x00007fbab28c41cd in setCurrentExceptionWithUtfCause () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#8  0x00007fbab2848a2f in VM_BytecodeInterpreterCompressed::run(J9VMThread*) () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#9  0x00007fbab2846b60 in bytecodeLoopCompressed () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#10 0x00007fbab2903ba2 in c_cInterpreter () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#11 0x00007fbab28b09ca in runJavaThread () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#12 0x00007fbab290352f in javaProtectedThreadProc () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#13 0x00007fbab21b87c3 in omrsig_protect () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9prt29.so
#14 0x00007fbab28ffaaa in javaThreadProc () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#15 0x00007fbab2627326 in thread_wrapper () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9thr29.so
#16 0x00007fbab450b6ba in start_thread (arg=0x7fba593ff700) at pthread_create.c:333
#17 0x00007fbab3e2c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) thread 20
[Switching to thread 20 (Thread 0x7fba593be700 (LWP 22476))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fbab26217f7 in omrthread_monitor_wait () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9thr29.so
#2  0x00007fbab28a91b4 in acquireExclusiveVMAccess () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#3  0x00007fbab0a486c9 in MM_EnvironmentBase::acquireExclusiveVMAccess() () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#4  0x00007fbab0a4884a in MM_EnvironmentBase::acquireExclusiveVMAccessForGC(MM_Collector*, bool, bool) ()
   from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#5  0x00007fbab0b1c4a0 in MM_MemorySubSpaceFlat::allocationRequestFailed(MM_EnvironmentBase*, MM_AllocateDescription*, MM_MemorySubSpace::AllocationType, MM_ObjectAllocationInterface*, MM_MemorySubSpace*, MM_MemorySubSpace*) () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#6  0x00007fbab0b245ac in MM_MemorySubSpaceGeneric::allocateObject(MM_EnvironmentBase*, MM_AllocateDescription*, MM_MemorySubSpace*, MM_MemorySubSpace*, bool) () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#7  0x00007fbab0a655ef in MM_TLHAllocationInterface::allocateObject(MM_EnvironmentBase*, MM_AllocateDescription*, MM_MemorySpace*, bool) ()
   from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#8  0x00007fbab0a8128d in OMR_GC_AllocateObject(OMR_VMThread*, MM_AllocateInitialization*) ()
   from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#9  0x00007fbab095512f in J9AllocateIndexableObject () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9gc29.so
#10 0x00007fbab2856acf in VM_BytecodeInterpreterCompressed::run(J9VMThread*) () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#11 0x00007fbab2846b60 in bytecodeLoopCompressed () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#12 0x00007fbab2903ba2 in c_cInterpreter () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#13 0x00007fbab28b09ca in runJavaThread () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#14 0x00007fbab290352f in javaProtectedThreadProc () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#15 0x00007fbab21b87c3 in omrsig_protect () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9prt29.so
#16 0x00007fbab28ffaaa in javaThreadProc () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#17 0x00007fbab2627326 in thread_wrapper () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9thr29.so
#18 0x00007fbab450b6ba in start_thread (arg=0x7fba593be700) at pthread_create.c:333
#19 0x00007fbab3e2c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fbab26217f7 in omrthread_monitor_wait () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9thr29.so
#2  0x00007fbaab346ac4 in TR::CompilationInfo::waitOnCompMonitor(J9VMThread*) () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9jit29.so
#3  0x00007fbaab34e8a7 in TR::CompilationInfo::stopCompilationThreads() () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9jit29.so
#4  0x00007fbab28e0841 in runExitStages () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#5  0x00007fbab28e0cd8 in exitJavaVM () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#6  0x00007fbab1a7e609 in doJavaVMExit () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#7  0x00007fbab1a7e645 in protectedDumpFunction () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#8  0x00007fbab21b87c3 in omrsig_protect () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9prt29.so
#9  0x00007fbab1a81c9b in runDumpFunction () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#10 0x00007fbab1a81e2f in runDumpAgent () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#11 0x00007fbab1a993cb in triggerDumpAgents () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#12 0x00007fbab1a97b24 in rasDumpHookExceptionSysthrow () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9dmp29.so
#13 0x00007fbab241437e in J9HookDispatch () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9hookable29.so
#14 0x00007fbab28c41cd in setCurrentExceptionWithUtfCause () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#15 0x00007fbab2848a2f in VM_BytecodeInterpreterCompressed::run(J9VMThread*) () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#16 0x00007fbab2846b60 in bytecodeLoopCompressed () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#17 0x00007fbab2903ba2 in c_cInterpreter () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#18 0x00007fbab28b09ca in runJavaThread () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#19 0x00007fbab290352f in javaProtectedThreadProc () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#20 0x00007fbab21b87c3 in omrsig_protect () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9prt29.so
#21 0x00007fbab28ffaaa in javaThreadProc () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9vm29.so
#22 0x00007fbab2627326 in thread_wrapper () from /root/exittest/jdk-11.0.8+5/lib/compressedrefs/libj9thr29.so
#23 0x00007fbab450b6ba in start_thread (arg=0x7fba902d0700) at pthread_create.c:333
#24 0x00007fbab3e2c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
mpirvu commented 4 years ago

This is a different run than the one in comment https://github.com/eclipse/openj9/issues/8552#issuecomment-605086584 because the stack traces are different. While I don't see the stacks for the compilation threads I am going to assume we are hitting a similar scenario where one compilation thread wants to acquire VM access, but is blocked waiting. As long as there is one thread holding exclusive VM access there is potential for a deadlock here.

I could replace the monitor wait with a timed_wait and proceed with shutdown even though some compilation thread is still waiting for VM access, but it's not nice.

theresa-m commented 4 years ago

@mpirvu I'm actually trying my runs and not finding similar scenarios to that stack trace in recent builds. It was from the end of March so perhaps it has been resolved elsewhere. Thanks for your input.

I have a new vm fix that seems to address all the issues: https://github.com/eclipse/openj9/pull/9840

theresa-m commented 4 years ago

Thanks @psoohoo-turbonomic and @Herr-Sepp for bringing this up. The fix should show up in the nightly AdoptOpenJDK builds and OpenJ9 0.21 release. Please let us know if you come across other issues with this command line option.

psoohoo-turbonomic commented 4 years ago

Thanks @psoohoo-turbonomic and @Herr-Sepp for bringing this up. The fix should show up in the nightly AdoptOpenJDK builds and OpenJ9 0.21 release. Please let us know if you come across other issues with this command line option.

Thanks for the good news, @theresa-m! Looking forward to giving it a try!

pshipton commented 4 years ago

@psoohoo-turbonomic pls try the 0.21 release M2 build https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/tag/jdk-11.0.8+8_openj9-0.21.0-m2

psoohoo-turbonomic commented 4 years ago

@psoohoo-turbonomic pls try the 0.21 release M2 build https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/tag/jdk-11.0.8+8_openj9-0.21.0-m2

Hi @pshipton , sorry I didn't try out M2 when it was out. But I just tested on the GA version and it looks good after several attempts to reproduce the problem!

Thanks for the fix, guys!!