eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

deadlock in JVM after malloc assert #17208

Open brihh opened 1 year ago

brihh commented 1 year ago

Java -version output

openjdk version "11.0.15" 2022-04-19
IBM Semeru Runtime Open Edition 11.0.15.0 (build 11.0.15+10)
Eclipse OpenJ9 VM 11.0.15.0 (build openj9-0.32.0, JRE 11 Linux amd64-64-Bit Compressed References 20220422_425 (JIT enabled, AOT enabled)
OpenJ9 - 9a84ec34e
OMR - ab24b6666
JCL - b7b5b42ea6 based on jdk-11.0.15+10)

Summary of problem

A malloc assert (which we are debugging separately) leads to a JVM deadlock, because the abort signal handler subsequently calls malloc again. The assert message:

1680447322.989463 java: arena.c:962: __malloc_arena_thread_freeres: Assertion `a->attached_threads > 0' failed.

The thread that hit the assert:

Thread 465 (Thread 0x7fadc6247700 (LWP 3195)):

#0 0x00007fae8511896c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fae85119c87 in get_free_list () from /lib64/libc.so.6
#2 0x00007fae8511dd05 in malloc () from /lib64/libc.so.6
#3 0x00007fae7f553975 in omrthread_allocate_memory () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9thr29.so
#4 0x00007fae7f553362 in omrthread_attr_init () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9thr29.so
#5 0x00007fae7fa6c8c2 in attachThreadWithCategory () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9vm29.so
#6 0x00007fae7fa3845c in AttachCurrentThreadAsDaemon () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9vm29.so
#7 0x00007fae71c91b3d in abortHandler () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9dmp29.so
#8 <signal handler called>
#9 0x00007fae850cf37f in raise () from /lib64/libc.so.6
#10 0x00007fae850b9db5 in abort () from /lib64/libc.so.6
#11 0x00007fae851195ca in __malloc_assert () from /lib64/libc.so.6
#12 0x00007fae8511e33e in __malloc_arena_thread_freeres () from /lib64/libc.so.6
#13 0x00007fae8587a170 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fae85194dc3 in clone () from /lib64/libc.so.6

Many other threads are then waiting on that lock as well, e.g.:

Thread 4113 (Thread 0x7fad4f820700 (LWP 10603)):

#0 0x00007fae8511896c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fae85119c87 in get_free_list () from /lib64/libc.so.6
#2 0x00007fae8511ce45 in tcache_init.part () from /lib64/libc.so.6
#3 0x00007fae8511db86 in malloc () from /lib64/libc.so.6
#4 0x00007fae8587be65 in pthread_getattr_np () from /lib64/libpthread.so.0
#5 0x00007fae7f5537e7 in omrthread_get_stack_range () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9thr29.so
#6 0x00007fae7fb371f3 in initializeCurrentOSStackFree () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9vm29.so
#7 0x00007fae7fa7d483 in javaProtectedThreadProc () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9vm29.so
#8 0x00007fae7f788e43 in omrsig_protect () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9prt29.so
#9 0x00007fae7fa796ea in javaThreadProc () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9vm29.so
#10 0x00007fae7f5514f6 in thread_wrapper () from /usr/lib/jvm/ibm-semeru-open-11-jdk/lib/default/libj9thr29.so
#11 0x00007fae8587a14a in start_thread () from /lib64/libpthread.so.0
#12 0x00007fae85194dc3 in clone () from /lib64/libc.so.6

I don't think that malloc() should be called from the signal handler...

Diagnostic files

pshipton commented 1 year ago

@tajila @babsingh

babsingh commented 1 year ago

@brihh This is a known issue. A simple workaround is to disable the associated JVM signal handler. In this case, the abort handler can be disabled via -XX:-HandleSIGABRT (see https://www.eclipse.org/openj9/docs/xxhandlesigabrt/). In the long run, we will try to remove the usage of async signal unsafe functions, such as malloc, from the JVM signal handlers, if feasible.
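For illustration only, here is a minimal sketch in C of what an abort handler limited to async-signal-safe calls could look like. It is not OpenJ9 code: the names format_decimal, safe_abort_handler and install_safe_abort_handler are hypothetical, and the sketch assumes a POSIX system. The handler avoids malloc and stdio entirely, formats into a stack buffer, and uses only write() and _exit(), both of which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `value` in decimal into `buf` (capacity `cap`,
 * including the terminating NUL). Uses no malloc and no stdio, so it can be
 * called from a signal handler. Returns the number of digits written. */
size_t format_decimal(unsigned long value, char *buf, size_t cap)
{
    char tmp[24];   /* enough for a 64-bit value (20 digits) */
    size_t n = 0;
    size_t len = 0;

    /* Collect digits in reverse order. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0 && n < sizeof tmp);

    /* Copy them back in the right order, leaving room for the NUL. */
    while (n > 0 && len + 1 < cap) {
        buf[len++] = tmp[--n];
    }
    if (cap > 0) {
        buf[len] = '\0';
    }
    return len;
}

/* Hypothetical handler: reports the signal number and exits, using only
 * async-signal-safe calls (write, _exit) and stack or static storage. */
static void safe_abort_handler(int signo)
{
    static const char prefix[] = "fatal signal ";
    char num[24];
    size_t len = format_decimal((unsigned long)signo, num, sizeof num);
    ssize_t rc;

    rc = write(STDERR_FILENO, prefix, sizeof prefix - 1);
    rc = write(STDERR_FILENO, num, len);
    rc = write(STDERR_FILENO, "\n", 1);
    (void)rc;   /* nothing useful can be done if write fails here */

    _exit(128 + signo);
}

/* Hypothetical installer: registers the handler for SIGABRT.
 * Returns 0 on success, -1 on failure (as sigaction does). */
int install_safe_abort_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = safe_abort_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```

A handler of this shape cannot block on the malloc lock seen in the traces above, because it never enters the allocator. The trade-off is that it can only do very limited reporting; anything that needs to attach a thread or allocate memory, as the abortHandler frames above do, would have to be prepared before the signal arrives or be dropped.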

brihh commented 5 months ago

Any update on whether this

"In the long run, we will try to remove the usage of async signal unsafe functions, such as malloc, from the JVM signal handlers, if feasible."

has been done at any Java level? Thanks.

babsingh commented 5 months ago

has been done at any java level?

No. There is no completion time associated with this issue. It is targeted as low priority because a workaround is known.