eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 722 forks source link

java/lang/invoke/MethodHandleNatives.resolve - segmation fault #14191

Open ninja- opened 2 years ago

ninja- commented 2 years ago

After upgrade from AdoptOpenJDK OpenJ9 16 to latest Semeru build, my app is crashing a few minutes after start with this crash:
If there is some procedure to easily get debug symbols for Semeru please let me know, I remember we were discussing a few months ago here on GitHub to include them by default but looks like nothing changed on that....

2021/12/20 18:06:22 Unhandled exception
2021/12/20 18:06:22 Type=Segmentation error vmState=0x00000000
2021/12/20 18:06:22 J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
2021/12/20 18:06:22 Handler1=00007F0A6D067390 Handler2=00007F0A6CDC5B60 InaccessibleAddress=0000000000000008
2021/12/20 18:06:22 RDI=0000000000A5FE00 RSI=00007F0A68135C68 RAX=000000000184E300 RBX=0000000000A5FE00
2021/12/20 18:06:22 RCX=00007F0A6AB3AB73 RDX=00007F0A68D2A848 R8=00007F0A6AB3AB73 R9=0000000000000022
2021/12/20 18:06:22 R10=0000000000000010 R11=0000000000000000 R12=0000000000A66760 R13=00007F0A6AB3AB70
2021/12/20 18:06:22 R14=00007F0A6A809C30 R15=00007F0A6DBA4380
2021/12/20 18:06:22 RIP=00007F0A6634298A GS=0000 FS=0000 RSP=00007F0A38E3AD20
2021/12/20 18:06:22 EFlags=0000000000010206 CS=0033 RBP=00007F0A68040990 ERR=0000000000000004
2021/12/20 18:06:22 TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
2021/12/20 18:06:22 xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
2021/12/20 18:06:22 xmm1 617263656e696d2f (f: 1852402944.000000, d: 2.585226e+161)
2021/12/20 18:06:22 xmm2 ffffffffffffffff (f: 4294967296.000000, d: -nan)
2021/12/20 18:06:22 xmm3 726f466164626d61 (f: 1684172160.000000, d: 1.668333e+243)
2021/12/20 18:06:22 xmm4 3fcbc2860d224770 (f: 220350320.000000, d: 2.168739e-01)
2021/12/20 18:06:22 xmm5 0000001a00000059 (f: 89.000000, d: 5.517189e-313)
2021/12/20 18:06:22 xmm6 0000001a00000059 (f: 89.000000, d: 5.517189e-313)
2021/12/20 18:06:22 xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
2021/12/20 18:06:22 xmm8 c89dae366d4e8730 (f: 1833862912.000000, d: -6.463844e+41)
2021/12/20 18:06:22 xmm9 0000000000000004 (f: 4.000000, d: 1.976263e-323)
2021/12/20 18:06:22 xmm10 42b46ebcc114b737 (f: 3239360256.000000, d: 2.246585e+13)
2021/12/20 18:06:22 xmm11 082f5ac1a441c362 (f: 2755773184.000000, d: 2.967522e-269)
2021/12/20 18:06:22 xmm12 00000000421b666f (f: 1109091968.000000, d: 5.479642e-315)
2021/12/20 18:06:22 xmm13 c007fffffffffffe (f: 4294967296.000000, d: -3.000000e+00)
2021/12/20 18:06:22 xmm14 3fd3333340000000 (f: 1073741824.000000, d: 3.000000e-01)
2021/12/20 18:06:22 xmm15 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
2021/12/20 18:06:22 Module=/usr/lib/jvm/openjdk/lib/default/libjclse29.so
2021/12/20 18:06:22 Module_base_address=00007F0A662F0000 Symbol=Java_java_lang_invoke_MethodHandleNatives_resolve
2021/12/20 18:06:22 Symbol_address=00007F0A663419A0
2021/12/20 18:06:22 Target=2_90_20211207_75 (Linux 5.4.0-86-generic)
2021/12/20 18:06:22 CPU=amd64 (12 logical CPUs) (0xfb4f37000 RAM)
2021/12/20 18:06:22 ----------- Stack Backtrace -----------
2021/12/20 18:06:22 Java_java_lang_invoke_MethodHandleNatives_resolve+0xfea (0x00007F0A6634298A [libjclse29.so+0x5298a])
2021/12/20 18:06:22 (0x00007F0A6114EF39 [<unknown>+0x0])

0SECTION       ENVINFO subcomponent dump routine
NULL           =================================
1CIJAVAVERSION JRE 17 Linux amd64-64 (build 17.0.1+12)
1CIVMVERSION   20211207_75
1CIJ9VMTAG     openj9-0.29.1
1CIJ9VMVERSION 7d055dfcb
1CIJITVERSION  j9jit_20211207_1909_
1CIOMRVERSION  e30892e2b_CMPRSS
1CIJCLVERSION  fc67fbe50a0 based on jdk-17.0.1+12
1CIVENDOR      IBM Corporation
1CIPRODUCT     IBM Semeru Runtime
1CIEXTVERSION  17.0.1.0
1CIJITMODES    JIT enabled, AOT enabled, FSD disabled, HCR enabled
1CIRUNNINGAS   Running as an embedded JVM
1CIVMIDLESTATE VM Idle State: ACTIVE
1CICONTINFO    Running in container : TRUE
1CICGRPINFO    JVM support for cgroups enabled : TRUE
1CISTARTTIME   JVM start time: 2021/12/20 at 18:04:39:420
1CISTARTNANO   JVM start nanotime: 7504983538282944
1CIPROCESSID   Process ID: 85 (0x55)
1CICMDLINE     [not available]
1CIJAVAHOMEDIR Java Home Dir:   /usr/lib/jvm/openjdk
1CIJAVADLLDIR  Java DLL Dir:    /usr/lib/jvm/openjdk/bin

1XMCURTHDINFO  Current thread
3XMTHREADINFO      "Server thread" J9VMThread:0x0000000000A5FE00, omrthread_t:0x00007F0A68BC51D0, java/lang/Thread:0x00000000F0A60BD0, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x2B, isDaemon:false)
3XMTHREADINFO1            (native thread ID:0x7E, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000020)
3XMTHREADINFO2            (native stack address range from:0x00007F0A38DFF000, to:0x00007F0A38E3F000, size:0x40000)
3XMCPUTIME               CPU usage total: 11.211991560 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=15698048 (0xEF8880)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/invoke/MethodHandleNatives.resolve(Native Method)
4XESTACKTRACE                at java/lang/invoke/MemberName$Factory.resolve(MemberName.java:1085(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MemberName$Factory.resolveOrFail(MemberName.java:1114(Compiled Code))
4XESTACKTRACE                at java/lang/invoke/MethodHandles$Lookup.resolveOrFail(MethodHandles.java:3647)
4XESTACKTRACE                at java/lang/invoke/MethodHandles$Lookup.findGetter(MethodHandles.java:3025)
4XESTACKTRACE                at app/a1.<clinit>(a1.java)
4XESTACKTRACE                at app/i.a(i.java)
4XESTACKTRACE                at app/av.a(av.java)
4XESTACKTRACE                at jdk/internal/reflect/NativeMethodAccessorImpl.invoke0(Native Method)
4XESTACKTRACE                at jdk/internal/reflect/NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77(Compiled Code))
4XESTACKTRACE                at jdk/internal/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43(Compiled Code))
4XESTACKTRACE                at java/lang/reflect/Method.invoke(Method.java:568(Compiled Code))

3XMTHREADINFO3           Native callstack:
4XENATIVESTACK               (0x00007F0A6CDC2572 [libj9prt29.so+0x26572])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6CDC25EE [libj9prt29.so+0x265ee])
4XENATIVESTACK               (0x00007F0A6CDC26E4 [libj9prt29.so+0x266e4])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6CDC244B [libj9prt29.so+0x2644b])
4XENATIVESTACK               (0x00007F0A6CDC3442 [libj9prt29.so+0x27442])
4XENATIVESTACK               (0x00007F0A6CDC41E4 [libj9prt29.so+0x281e4])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6C3DC7C6 [libj9dmp29.so+0x1b7c6])
4XENATIVESTACK               (0x00007F0A6C3DC95D [libj9dmp29.so+0x1b95d])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6C3D8EDD [libj9dmp29.so+0x17edd])
4XENATIVESTACK               (0x00007F0A6C3D3E2D [libj9dmp29.so+0x12e2d])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6C3D51D0 [libj9dmp29.so+0x141d0])
4XENATIVESTACK               (0x00007F0A6C3DF1EC [libj9dmp29.so+0x1e1ec])
4XENATIVESTACK               (0x00007F0A6C3C5D9D [libj9dmp29.so+0x4d9d])
4XENATIVESTACK               (0x00007F0A6C3C5475 [libj9dmp29.so+0x4475])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6C3C8AEB [libj9dmp29.so+0x7aeb])
4XENATIVESTACK               (0x00007F0A6C3C8C7F [libj9dmp29.so+0x7c7f])
4XENATIVESTACK               (0x00007F0A6C3E168E [libj9dmp29.so+0x2068e])
4XENATIVESTACK               (0x00007F0A6D066F32 [libj9vm29.so+0x3ff32])
4XENATIVESTACK               (0x00007F0A6CDC68C3 [libj9prt29.so+0x2a8c3])
4XENATIVESTACK               (0x00007F0A6D067145 [libj9vm29.so+0x40145])
4XENATIVESTACK               (0x00007F0A6CDC5D8A [libj9prt29.so+0x29d8a])
4XENATIVESTACK               (0x00007F0A6DFEF3D0 [libpthread.so.0+0x143d0])
4XENATIVESTACK               Java_java_lang_invoke_MethodHandleNatives_resolve+0xfea (0x00007F0A6634298A [libjclse29.so+0x5298a])
4XENATIVESTACK               (0x00007F0A6114EF39 [<unknown>+0x0])
pshipton commented 2 years ago

@tajila

pshipton commented 2 years ago

To get full debug symbols, there is a debug image that can be overlayed. The bin and lib directories in the debug image need to be recursively (including sub-directories) copied to the bin and lib directories of the JDK.

https://github.com/ibmruntimes/semeru17-binaries/releases/download/jdk-17.0.1%2B12_openj9-0.29.1/ibm-semeru-open-debugimage_x64_linux_17.0.1_12_openj9-0.29.1.tar.gz

gdb is going to do a better job of printing a stack trace, you can overlay the debug image, open the core in gdb and use where to get the stack trace.

Can the core file or test case to recreate the problem be shared?

tajila commented 2 years ago

@ninja- do you have a sample application that we can try to reproduce the problem?

ninja- commented 2 years ago

Hi

Sorry the classpath is pretty complex and the main jar that triggers the bug is 3rd party obfuscated component - but without any natives inside to my knowledge.

Coredump was not captured sadly because the app is running in docker container so it's gone after crash. I plan to update the tooling to capture coredumps and continue investigating this next week.

In the meantime, maybe it's worth double checking everything in this func around JAVA_SPEC_VERSION >= 16? Could this commit on master fixed this issue silently? https://github.com/eclipse-openj9/openj9/commit/8635b0422fc8513dcf2e51390027cd8a4ee00cd5

ninja- commented 2 years ago

14125

DanHeidinga commented 2 years ago

@ninja- are you able to provide a core file? Unless we have more info, we won't be able to address this for the next release

pshipton commented 2 years ago

We can't diagnose this problem with the available information, moving it forward.

headius commented 2 weeks ago

I appear to have a new instance of this in JRuby's OpenJ9 CI run here: https://github.com/jruby/jruby/actions/runs/11525329378/job/32088572723?pr=8386

This just started crashing with a segmentation error in the past few days. I'm not sure what changed in our environment (the PR was just adding non-OpenJ9 JDK 23 jobs to our build matrix).

headius commented 2 weeks ago

I have captured the dump files from that crash and uploaded them here:

https://drive.google.com/file/d/1OrsBTG5w6HRsQmPi-6xmVRYIusz186ty/view?usp=sharing

Please let me know when you have downloaded this file so I can delete it.

I will attempt to switch to a different build (using the IBM Semeru dist) for now.

headius commented 2 weeks ago

Switching to the "semeru" distribution from setup-java does not appear to have helped, so I'll be removing OpenJ9 from our CI for now.

tajila commented 1 week ago

@headius Thanks for the diagnostics. If possible could you please reproduce it and run jpackcore, this will give us more info to address to issue.

In the meantime we will take a look at what you've sent us.

tajila commented 1 week ago

@babsingh Please take a look. You can use nativedecoder to find the line numbers in the stack strace.

hzongaro commented 1 week ago

Please let me know when you have downloaded this file so I can delete it.

@headius, I have also uploaded your zip file here:

https://ibm-my.sharepoint.com/:u:/p/zongaro_ca/EQ8r46fu6JpGhpva_WytoNIB89-10tfcDkwKqjV6Nf2aUw?e=NUq30C

If anyone else needs access to it, but cannot open that link, please let me know.

headius commented 1 week ago

@tajila I will give it a shot.

headius commented 1 week ago

Based on this output from the crash, it would seem it is not generating the right core file, am I right?

If so, how do I make it successfully dump core?

If not, which if these files should I run jpackcore on?

JVMDUMP039I Processing dump event "gpf", detail "" at 2024/10/26 13:50:53 - please wait.
JVMDUMP032I JVM requested System dump using '/home/runner/work/jruby/jruby/core.20241026.135053.2237.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.2264.  Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it.

JVMPORT0[49](https://github.com/jruby/jruby/actions/runs/11532327632/job/32104018591#step:5:50)I The core file created by child process with pid = 2264 was not found. Review the documentation for the /proc/sys/kernel/core_pattern program "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" to find where the core file is written and ensure that program does not truncate it.

JVMDUMP012E Error in System dump: /home/runner/work/jruby/jruby/core.20241026.13[50](https://github.com/jruby/jruby/actions/runs/11532327632/job/32104018591#step:5:51)53.2237.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/runner/work/jruby/jruby/javacore.20241026.135053.2237.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/runner/work/jruby/jruby/javacore.20241026.135053.2237.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/runner/work/jruby/jruby/Snap.20241026.135053.2237.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/runner/work/jruby/jruby/Snap.20241026.135053.2237.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/runner/work/jruby/jruby/jitdump.20241026.135053.2237.0004.dmp' in response to an event
JVMDUMP0[51](https://github.com/jruby/jruby/actions/runs/11532327632/job/32104018591#step:5:52)I JIT dump occurred in 'main' thread 0x0000000000017000
JVMDUMP053I JIT dump is recompiling java/lang/invoke/MemberName$Factory.resolve(BLjava/lang/invoke/MemberName;Ljava/lang/Class;IZ)Ljava/lang/invoke/MemberName;
JVMDUMP053I JIT dump is recompiling java/lang/invoke/MemberName$Factory.resolveOrFail(BLjava/lang/invoke/MemberName;Ljava/lang/Class;ILjava/lang/Class;)Ljava/lang/invoke/MemberName;
JVMDUMP010I JIT dump written to /home/runner/work/jruby/jruby/jitdump.20241026.1350[53](https://github.com/jruby/jruby/actions/runs/11532327632/job/32104018591#step:5:54).2237.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
headius commented 1 week ago

@hzongaro Thank you! I will delete my copy of those files and let y'all know if I can upload a packed core thingy.

babsingh commented 1 week ago

re https://github.com/eclipse-openj9/openj9/issues/14191#issuecomment-2442054525:

JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" specifies that the core dump is to be piped to an external program. Attempting to rename either core or core.2264. Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it. JVMPORT049I The core file created by child process with pid = 2264 was not found. Review the documentation for the /proc/sys/kernel/core_pattern program "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" to find where the core file is written and ensure that program does not truncate it.

The below instructions will allow the machine to correctly produce core files.

ii. Prevent Apport from being upgraded (which re-enables it)

    # apt-mark hold apport
    # apt-mark showhold # confirm it's marked
    apport
headius commented 1 week ago

Even with sudo it seems that the core_pattern file can't be written to on this system (standard GitHub Actions runner on Ubuntu:

/home/runner/work/_temp/477d41c2-bd5b-45a8-ab3b-b29ab49cac66.sh: line 1: /proc/sys/kernel/core_pattern: Permission denied
headius commented 1 week ago

FWIW it should be possible for you to take a branch from this JRuby commit and reproduce the failure based on how it runs in CI. I do not have a local reproduction, so my only way to reproduce is on GHA:

https://github.com/jruby/jruby/pull/8386/commits/d64634e9d5734f811fb2ae6d4a42f1f5c72b1676

babsingh commented 1 week ago

Here is a javacore, which has line numbers for the thread's native stack: nstack.20241028083039.javacore.20241026.134413.2239.0002.txt

The crash happens here:

4XENATIVESTACK               Java_java_lang_invoke_MethodHandleNatives_resolve+0x8eb (0x00007FAE71B5D9FB [libjclse29.so+0x539fb]) : 0x539fb <Java_java_lang_invoke_MethodHandleNatives_resolve(JNIEnv*, jclass, jobject, jclass, jint, jboolean)+2283> [/home/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-x64-openj9/workspace/build/src/openj9/runtime/jcl/common/java_lang_invoke_MethodHandleNatives.cpp:1113]

The associated code: https://github.com/eclipse-openj9/openj9/blob/4760d5d3202149479ba7beca189fcbdbe2e0e79b/runtime/jcl/common/java_lang_invoke_MethodHandleNatives.cpp#L1113

verifyData (J9JavaVM->bytecodeVerificationData) is NULL. We will need to add a null check to fix this.

In the Github CI job, I also see the following message: JVMJ9VM193W Since Java 13 -Xverify:none and -noverify were deprecated for removal and may not be accepted options in the future. @headius Can we run the job without -Xverify:none? -Xverify:none is probably setting bytecodeVerificationData to NULL.

headius commented 1 week ago

@babsingh Can do!

headius commented 1 week ago

Perhaps unsurprisingly, it no longer crashes with -Xverify:none removed from command line flags.

headius commented 1 week ago

https://github.com/jruby/jruby/actions/runs/11560119480/job/32176387644?pr=8386

babsingh commented 1 week ago

@headius Thanks for confirming. https://github.com/eclipse-openj9/openj9/pull/20422 should fix the segfault that occurs with -Xverify:none. Will get this PR reviewed and merged.

babsingh commented 1 week ago

To verify if the issue is resolved, I was trying to run the CI cmds locally with JDK17 w/ the fix. My local environment is not correctly setup for the CI cmds; so, I am encountering unrelated errors.

@headius Can you try JDK17 w/ the fix on your end to see if the issue is resolved?

babsingh commented 1 day ago

@headius Were you able to try the above JDK?

headius commented 22 hours ago

Oops, sorry about that, I forgot about this one. I'll verify over the weekend and let you know!