Open yanglong1010 opened 3 months ago
Thank you for the detailed report. If your analysis is correct, this is certainly a JVM bug. We'll try to create a reproducer and if successful, submit a fix in OpenJDK.
Here are some of the hs_err files: OpenJDK 11 hs_err_pid31632.log OpenJDK 8u412 hs_err_pid12810.log OpenJDK8 slowdebug hs_err_pid15470.log
I can provide a reproducer to save your time. Wait for me for a moment.
I have successfully reproduced it on both OpenJDK 8 and 11 today (see the files below, they are also included in the attachment).
On different machines, you may need to adjust the profiling duration (the only parameter of reproduce-11.sh and reproduce-8.sh) so that the dump occurs after the Class is unloaded and before the next Safepoint.
You need to try more times because there are many variables involved, which is essentially a race condition.
If you see the "Done" and no crash occurred, then need to retry.
If a crash occurs, there will be clear logs on the screen, refer to the examples below.
-rw-r--r-- 1 root root 161181 Aug 22 11:16 hs_err_pid9934.log
-rw-r--r-- 1 root root 159660 Aug 22 11:42 hs_err_pid18519.log
-rw-r--r-- 1 root root 154176 Aug 22 11:51 hs_err_pid21948.log
-rw-r--r-- 1 root root 157954 Aug 22 11:59 hs_err_pid24809.log
-rw-r--r-- 1 root root 72309 Aug 22 16:48 hs_err_pid31831.log
-rw-r--r-- 1 root root 152441 Aug 22 16:56 hs_err_pid3089.log
OpenJDK 11
bash reproduce-11.sh 20
[2024-08-22T16:48:01.219+0800] GC(2) Concurrent Reset
[2024-08-22T16:48:01.242+0800] GC(2) Concurrent Reset 23.495ms
[2024-08-22T16:48:03.466+0800] GC(5) Concurrent Reset
[2024-08-22T16:48:03.494+0800] GC(5) Concurrent Reset 27.993ms
[2024-08-22T16:48:10.579+0800] unloading class com.example.demo.TempClass 0x00000008403e2040
[2024-08-22T16:48:10.592+0800] GC(15) Concurrent Reset
[2024-08-22T16:48:10.595+0800] GC(15) Concurrent Reset 2.295ms
[2024-08-22T16:48:17.621+0800] unloading class com.example.demo.TempClass 0x0000000840424840
[2024-08-22T16:48:17.621+0800] unloading class com.example.demo.TempClass 0x0000000840425040
[2024-08-22T16:48:17.621+0800] unloading class com.example.demo.TempClass 0x0000000840425840
[2024-08-22T16:48:17.621+0800] unloading class com.example.demo.TempClass 0x0000000840426040
[2024-08-22T16:48:17.621+0800] unloading class com.example.demo.TempClass 0x0000000840426840
[2024-08-22T16:48:17.621+0800] unloading class com.example.demo.TempClass 0x0000000840427040
[2024-08-22T16:48:17.634+0800] GC(16) Concurrent Reset
[2024-08-22T16:48:17.637+0800] GC(16) Concurrent Reset 2.193ms
Done
# SIGSEGV (0xb) at pc=0x00007f55e89c0658, pid=31831, tid=31924
# V [libjvm.so+0x9be658] oopDesc::is_a(Klass*) const+0x48
OpenJDK 8
bash reproduce-8.sh 25
1.107: [CMS-concurrent-reset: 0.034/0.034 secs] [Times: user=0.11 sys=0.04, real=0.03 secs]
8.557: [CMS-concurrent-reset: 0.003/0.003 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
13.759: [GC (CMS Final Remark) [YG occupancy: 106902 K (111808 K)]13.759: [Rescan (parallel) , 0.0008202 secs]13.760: [weak refs processing, 0.0000369 secs]13.760: [class unloading[Unloading class com.example.demo.TempClass 0x0000000100466028]
13.777: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
20.901: [GC (CMS Final Remark) [YG occupancy: 108890 K (111808 K)]20.901: [Rescan (parallel) , 0.0008640 secs]20.902: [weak refs processing, 0.0000437 secs]20.902: [class unloading[Unloading class com.example.demo.TempClass 0x0000000100485028]
[Unloading class com.example.demo.TempClass 0x0000000100484828]
20.917: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
27.989: [GC (CMS Final Remark) [YG occupancy: 108890 K (111808 K)]27.989: [Rescan (parallel) , 0.0008756 secs]27.990: [weak refs processing, 0.0000449 secs]27.990: [class unloading[Unloading class com.example.demo.TempClass 0x0000000100484028]
[Unloading class com.example.demo.TempClass 0x0000000100483828]
[Unloading class com.example.demo.TempClass 0x0000000100483028]
[Unloading class com.example.demo.TempClass 0x0000000100482828]
28.005: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Done
# SIGSEGV (0xb) at pc=0x00007f6b96faa108, pid=24809, tid=0x00007f6b32cc5700
# V [libjvm.so+0x766108] oopDesc::is_a(Klass*) const+0x48
is this would happen with --cstack vm?
is this would happen with --cstack vm?
Yes. --cstack vm
and AsyncGetCallTrace all get jmethodIds for the methods on the stack. Later on dump time, the jmethodIds are translated to Class and Method names, the crash occurs at this phase.
is this would happen with --cstack vm?
Yes.
--cstack vm
and AsyncGetCallTrace all get jmethodIds for the methods on the stack. Later on dump time, the jmethodIds are translated to Class and Method names, the crash occurs at this phase.
java have so many problem of profiling that I don't dare to open it in prod.
@krk managed to create a test that reliably reproduces the issue in <1 sec. The test only uses pure JVM TI, no AsyncGetCallTrace
or other profiling code. The bad news is that the test crashes all JVM versions, including those having the patch for JDK-8268364. However, it still makes sense to backport the patch, since it helps to avoid crashes with a modified test that calls JNI DeleteLocalRef
as soon as jclass
is no longer needed.
So, a backport of JDK-8268364 together with DeleteLocalRef
change in async-profiler will likely prevent crashes.
In the meantime, I'm investigating other possible workarounds. Forcing a safepoint does not solve the issue, but VMStructs approach looks promising.
Is my analysis correct ? Why forcing a safepoint does not solve the issue ?
Why forcing a safepoint does not solve the issue ?
What prevents from class unloading after a safepoint?
You are right, other class unloading can happen after the manually triggered safepoint, and we can not prevent this.
The test only uses pure JVM TI, no AsyncGetCallTrace or other profiling code.
does the reproducer created by @krk follow my reproduction steps, or only used contrived jclass (with invalid pointer inside) to call jvmti GetClassSignature
?
If used my steps, I don't know why this issue happens on JDK 18 and above.
Describe the bug
Hi, Andrei
We recently encountered a crash. After analysis, I think this problem may occur on OpenJDK 8, 11, and 17. It will not occur on OpenJDK 18 and above because JDK-8268364 avoids this problem at the JDK level.
V [libjvm.so+0x766108] oopDesc::is_a(Klass) const+0x48 V [libjvm.so+0x7635ec] jvmti_GetClassSignature+0x15c C [libasyncProfiler.so+0x35389] Recording::finishChunk()+0x20c9 C [libasyncProfiler.so+0x3599c] FlightRecorder::stop()+0x9c C [libasyncProfiler.so+0x39559] Profiler::stop(bool)+0x159 C [libasyncProfiler.so+0x3b4af] Profiler::runInternal(Arguments&, Writer&) [clone .constprop.772]+0x32f C [libasyncProfiler.so+0x3b6d1] Profiler::run(Arguments&)+0x181 C [libasyncProfiler.so+0x3d019] Agent_OnAttach+0x209 V [libjvm.so+0x7c12d9] JvmtiExport::load_agent_library(AttachOperation, outputStream)+0x329 V [libjvm.so+0x3328df] attach_listener_thread_entry(JavaThread, Thread)+0x15f V [libjvm.so+0xb11d32] JavaThread::thread_main_inner()+0x212 V [libjvm.so+0x942a62] java_start(Thread)+0xf2 C [libpthread.so.0+0x7ea5] start_thread+0xc5
#
A fatal error has been detected by the Java Runtime Environment:
#
SIGSEGV (0xb) at pc=0x00007f73c96aa658, pid=31632, tid=31752
#
JRE version: OpenJDK Runtime Environment (Red_Hat-11.0.23.0.9-2.el7_9) (11.0.23+9) (build 11.0.23+9-LTS)
Java VM: OpenJDK 64-Bit Server VM (Red_Hat-11.0.23.0.9-2.el7_9) (11.0.23+9-LTS, mixed mode, sharing, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
Problematic frame:
V [libjvm.so+0x9be658] oopDesc::is_a(Klass) const+0x48 V [libjvm.so+0xa04153] jvmti_GetClassSignature+0x173 C [libasyncProfiler2540885687583512417.so+0x72b6] _jvmtiEnv::GetClassSignature(_jclass, char, char)+0x38 C [libasyncProfiler2540885687583512417.so+0x131f3] Lookup::fillJavaMethodInfo(MethodInfo, _jmethodID, bool, int)+0x7b C [libasyncProfiler2540885687583512417.so+0x135cb] Lookup::resolveMethod(ASGCT_CallFrame&, int)+0x11b C [libasyncProfiler2540885687583512417.so+0x16afb] Recording::writeStackTraces(Buffer, Lookup)+0x147 C [libasyncProfiler2540885687583512417.so+0x16381] Recording::writeCpool(Buffer*)+0x19d C [libasyncProfiler2540885687583512417.so+0x141d3] Recording::finishChunk(bool)+0xdf
8 0x00007f654c046ee3 in Klass::is_subtype_of (this=0x6f56df778, k=0x100002000) at /root/openjdk/jdk8u-dev-fork/hotspot/src/share/vm/oops/klass.hpp:438
9 0x00007f654c04849d in oopDesc::is_a (this=0xaad81430, k=0x100002000) at /root/openjdk/jdk8u-dev-fork/hotspot/src/share/vm/oops/oop.inline.hpp:178
10 0x00007f654c4d4627 in jvmti_GetClassSignature (env=0x7f6484706758, klass=0x7f64545fb910, signature_ptr=0x7f64bc402e10, generic_ptr=0x0) at ../generated/jvmtifiles/jvmtiEnter.cpp:3242
11 0x00007f64bd6f2336 in _jvmtiEnv::GetClassSignature (this=0x7f6484706758, klass=0x7f64545fb910, signature_ptr=0x7f64bc402e10, generic_ptr=0x0) at /usr/lib/jvm/java-11-openjdk-11.0.23.0.9-2.el7_9.x86_64//include/jvmti.h:2242
12 0x00007f64bd6fe29c in Lookup::fillJavaMethodInfo (this=0x7f64bc402fa0, mi=0x7f64545fb648, method=0x7f646c003120, first_time=true, i=1) at src/flightRecorder.cpp:198
(gdb) x/xg 0x7f646c003120 [fillJavaMethodInfo(....method=0x7f646c003120)] 0x7f646c003120: 0x00007f64bdb54858 [*(Method**)jmethodId is not NULL]
(gdb) x/xg 0x7f64545fb910 (The klass returned from GetMethodDeclaringClass) 0x7f64545fb910: 0x00000000aad81430 (java_mirror oop)
(gdb) x/32xw 0xaad81430 (the memory of java_mirror oop is freed) 0xaad81430: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xaad81440: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xaad81450: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xaad81460: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xaad81470: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xaad81480: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xaad81490: 0xdeadbeef 0xdeadbeef 0xdeadbeef 0xdeadbeef
(gdb) p/x 0xdeadbeefl << 3 (from invalid Klass word, get invalid Klass*) $2 = 0x6f56df778
(gdb) x/xg 0x6f56df778 (Klass* is not accessible) 0x6f56df778: Cannot access memory at address 0x6f56df778
0 Klass::is_subtype_of (this=0x100002000, k=0x100002000) at /root/openjdk/jdk8u-dev-fork/hotspot/src/share/vm/oops/klass.hpp:437
1 0x00007ffff617049d in oopDesc::is_a (this=0xaae013e0, k=0x100002000) at /root/openjdk/jdk8u-dev-fork/hotspot/src/share/vm/oops/oop.inline.hpp:178
2 0x00007ffff65fc627 in jvmti_GetClassSignature (env=0x7fff382095c8, klass=0x7ffefc001100, signature_ptr=0x7fff1259de10, generic_ptr=0x0) at ../generated/jvmtifiles/jvmtiEnter.cpp:3242
3 0x00007fff1388d2b6 in _jvmtiEnv::GetClassSignature (this=0x7fff382095c8, klass=0x7ffefc001100, signature_ptr=0x7fff1259de10, generic_ptr=0x0) at /usr/lib/jvm/java-11-openjdk-11.0.23.0.9-2.el7_9.x86_64//include/jvmti.h:2242
4 0x00007fff138991f3 in Lookup::fillJavaMethodInfo (this=0x7fff1259dfa0, mi=0x7ffefc5bef28, method=0x7fff39378e58, first_time=true, i=0) at src/flightRecorder.cpp:198
(gdb) x/xg 0x7fff39378e58 [fillJavaMethodInfo(....method=0x7fff39378e58)] 0x7fff39378e58: 0x00007fffa5ec9670 [*(Method**)jmethodId is not NULL]
(gdb) x/xg 0x7ffefc001100 (The klass returned from GetMethodDeclaringClass) 0x7ffefc001100: 0x00000000aae013e0 (java_mirror oop)
(gdb) x/32xw 0xaae013e0 (valid java_mirror oop) 0xaae013e0: 0xceb1a511 0x0000006a 0x20000400 0x00000000 0xaae013f0: 0x00000000 0xab575050 0x00000000 0xab074f10 0xaae01400: 0xaabae230 0x00000000 0x00000000 0x00000000 0xaae01410: 0x00000000 0x00000000 0x00000000 0x00000000 0xaae01420: 0x00000000 0x00000000 0x0013dcb8 0x00000001 0xaae01430: 0x00000000 0x00000000 0x00000000 0x00000016 0xaae01440: 0x00000006 0x00000000 0xaaba5b18 0xaaad5508 0xaae01450: 0xaab56ca8 0xab075868 0xab04e728 0xaaad0ab8 (gdb) p/x 0x20000400l << 3 (valid Klass*,accessible) $1 = 0x100002000