eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

crash vmState=0x0005ffff jitMethodTranslated #20257

Closed pshipton closed 8 hours ago

pshipton commented 1 week ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.system_aarch64_linux_Release_testList_1/110 PatModImg_PlatMod_0 vmState [0x5ffff]: {J9VMSTATE_JIT} {Illegal optimization number}

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_sanity.system_aarch64_linux_Release_testList_1/110/system_test_output.tar.gz

01:56:47  CMJ stderr Unhandled exception
01:56:47  CMJ stderr Type=Segmentation error vmState=0x0005ffff
01:56:47  CMJ stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
01:56:47  CMJ stderr Handler1=0000FFFF9EC332F4 Handler2=0000FFFF9EF99BA0 InaccessibleAddress=0000000000000018
01:56:47  CMJ stderr R0=0000000000000000 R1=0000000000000000 R2=00000000000CE750 R3=0000000000084468
01:56:47  CMJ stderr R4=00000000000CE428 R5=0000000000000000 R6=0000FFFF981742C0 R7=0000FFFF7A800000
01:56:47  CMJ stderr R8=0000FFFF9DD410D0 R9=0018CBC64B64F75E R10=00FFFFFFFFFFFFFF R11=000015689AE64E8E
01:56:47  CMJ stderr R12=000000007FFFFFFF R13=000000007FFFFFFF R14=0000000000000008 R15=0000000000001448
01:56:47  CMJ stderr R16=0000000000000001 R17=0000FFFF9F4CAE10 R18=0000000000001320 R19=0000000000031368
01:56:47  CMJ stderr R20=0000FFFF7B8077FC R21=000000000001B700 R22=0000FFFF98028A40 R23=0000FFFF980E1AB0
01:56:47  CMJ stderr R24=0000000000000000 R25=0000FFFF981742C0 R26=000000000001B700 R27=0000000000000000
01:56:47  CMJ stderr R28=0000000000000000 R29=0000FFFF9C4EED30 R30=0000FFFF9E4E0A40 R31=0000FFFF9C4EED30
01:56:47  CMJ stderr PC=0000FFFF9E4E0A8C SP=0000FFFF9C4EED30 PSTATE=0000000060001000
01:56:47  CMJ stderr V0=3f50624dd2f1a9fc (f: 3539053056.000000, d: 1.000000e-03)
01:56:47  CMJ stderr V1=408f400000000000 (f: 0.000000, d: 1.000000e+03)
01:56:47  CMJ stderr V2=41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
01:56:47  CMJ stderr V3=8b090294d2800309 (f: 3531604736.000000, d: -1.665669e-255)
01:56:47  CMJ stderr V4=000000000000ffff (f: 65535.000000, d: 3.237859e-319)
01:56:47  CMJ stderr V5=0000ffff412590b8 (f: 1092980864.000000, d: 1.390655e-309)
01:56:47  CMJ stderr V6=00000000000000f0 (f: 240.000000, d: 1.185758e-321)
01:56:47  CMJ stderr V7=0000004000000040 (f: 64.000000, d: 1.358077e-312)
01:56:47  CMJ stderr V8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V10=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V16=0000000100000001 (f: 1.000000, d: 2.121996e-314)
01:56:47  CMJ stderr V17=0000000000000800 (f: 2048.000000, d: 1.011846e-320)
01:56:47  CMJ stderr V18=4000000000000000 (f: 0.000000, d: 2.000000e+00)
01:56:47  CMJ stderr V19=3f9eb851eb851eb8 (f: 3951369984.000000, d: 3.000000e-02)
01:56:47  CMJ stderr V20=3fb1eb851eb851ec (f: 515396064.000000, d: 7.000000e-02)
01:56:47  CMJ stderr V21=0000000000000002 (f: 2.000000, d: 9.881313e-324)
01:56:47  CMJ stderr V22=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V23=0000000000000008 (f: 8.000000, d: 3.952525e-323)
01:56:47  CMJ stderr V24=3fd6666666666666 (f: 1717986944.000000, d: 3.500000e-01)
01:56:47  CMJ stderr V25=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V26=0000000000000001 (f: 1.000000, d: 4.940656e-324)
01:56:47  CMJ stderr V27=000000000000000a (f: 10.000000, d: 4.940656e-323)
01:56:47  CMJ stderr V28=0000000000000001 (f: 1.000000, d: 4.940656e-324)
01:56:47  CMJ stderr V29=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V30=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr V31=0000000000000000 (f: 0.000000, d: 0.000000e+00)
01:56:47  CMJ stderr Module=/home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_Release_testList_1/jdkbinary/j2sdk-image/lib/default/libj9jit29.so
01:56:47  CMJ stderr Module_base_address=0000FFFF9DC00000
01:56:47  CMJ stderr 
01:56:47  CMJ stderr Method_being_compiled=java/lang/Object.hashCode()I
01:56:47  CMJ stderr Target=2_90_20240928_136 (Linux 5.14.0-508.el9.aarch64)
01:56:47  CMJ stderr CPU=aarch64 (8 logical CPUs) (0x3d1456000 RAM)
01:56:47  CMJ stderr ----------- Stack Backtrace -----------
01:56:47  CMJ stderr jitMethodTranslated+0x8c (0x0000FFFF9E4E0A8C [libj9jit29.so+0x8e0a8c])
01:56:47  CMJ stderr _ZN2TR15CompilationInfo14compilationEndEP10J9VMThreadRNS_24IlGeneratorMethodDetailsEP11J9JITConfigPvS7_P11TR_FrontEndP21TR_MethodToBeCompiledPNS_11CompilationE+0x2ac (0x0000FFFF9DD2CA4C [libj9jit29.so+0x12ca4c])
01:56:47  CMJ stderr _ZN2TR28CompilationInfoPerThreadBase20postCompilationTasksEP10J9VMThreadP21TR_MethodToBeCompiledP8J9MethodPKvP19J9JITExceptionTablebbP20TR_RelocationRuntime+0x290 (0x0000FFFF9DD3B330 [libj9jit29.so+0x13b330])
01:56:47  CMJ stderr _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x304 (0x0000FFFF9DD3CE88 [libj9jit29.so+0x13ce88])
01:56:47  CMJ stderr _ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x118 (0x0000FFFF9DD3D308 [libj9jit29.so+0x13d308])
01:56:47  CMJ stderr _ZN2TR24CompilationInfoPerThread14processEntriesEv+0x2f8 (0x0000FFFF9DD3C098 [libj9jit29.so+0x13c098])
01:56:47  CMJ stderr _ZN2TR24CompilationInfoPerThread3runEv+0x50 (0x0000FFFF9DD3C544 [libj9jit29.so+0x13c544])
01:56:47  CMJ stderr _Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x7c (0x0000FFFF9DD3C5EC [libj9jit29.so+0x13c5ec])
01:56:47  CMJ stderr omrsig_protect+0x21c (0x0000FFFF9EF9A82C [libj9prt29.so+0x2882c])
01:56:47  CMJ stderr _Z21compilationThreadProcPv+0x148 (0x0000FFFF9DD3C9B8 [libj9jit29.so+0x13c9b8])
01:56:47  CMJ stderr thread_wrapper+0xcc (0x0000FFFF9F2613BC [libj9thr29.so+0x73bc])
01:56:47  CMJ stderr start_thread+0x2d8 (0x0000FFFF9F4C96B8 [libc.so.6+0x806b8])
01:56:47  CMJ stderr thread_start+0xc (0x0000FFFF9F533BDC [libc.so.6+0xeabdc])
01:56:47  CMJ stderr ---------------------------------------
pshipton commented 1 week ago

@hzongaro pls take a look.

github-actions[bot] commented 1 week ago

Issue Number: 20257 Status: Open Recommended Components: comp:vm, comp:gc, comp:test Recommended Assignees: pshipton, fengxue-is, gacholio

hzongaro commented 1 week ago

@a7ehuo, may I ask you to take a look at this? I wasn't able to build KCA, and I'm having difficulty doing much with the core file that was produced, so I'm not sure whether this is happening on the AOT load of Object.hashCode or a compilation of it that was started after the AOT load:

05:56:46.425107386  0x1B700 j9jit.2              Event        (cold) Compiling java/lang/Object.hashCode()I
05:56:46.425237837  0x1B700 j9jit.45             Event       + (AOT load) java/lang/Object.hashCode()I @ 0xffff7b8077f8-0xffff7b807904/0x0-0xffff7b807904 time=191us 0x31368 0xffff797bf288 Q_SZ=4 bcsz=5 mem=0KB
05:56:46.501843072  0x1B700 omrport.256          Exception  * omrfile_open returns failure, filename = /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_Release_testList_1/jdkbinary/j2sdk-image/lib/java_en_US.properties, os errno = 2, portable errno = -108
05:56:46.501850134  0x1B700 omrport.256          Exception  * omrfile_open returns failure, filename = /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_Release_testList_1/jdkbinary/j2sdk-image/lib/java_en.properties, os errno = 2, portable errno = -108
05:56:46.504066334  0x1B700 j9dmp.9              Event       Preparing for dump, filename=/home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_Release_testList_1/aqa-tests/TKG/output_17275821255677/PatModImg_PlatMod_0/20240929-055635-PatModImgTest/results/core.20240929.055646.3880611.0001.dmp

One thing I can see is that it looks like the crash is happening on an inlined call to getClassDepth from jitMethodTranslated. It looks like it might be trying to get a subclass from java/lang/Object.

(gdb) info line *0x0000FFFF9E4E0A8C
Line 329 of "/home/jenkins/workspace/Build_JDK11_aarch64_linux_Release/openj9/runtime/oti/VMHelpers.hpp"
   starts at address 0xffff9e4e0a8c <jitMethodTranslated(J9VMThread*, J9Method*, void*)+140>
   and ends at 0xffff9e4e0a90 <jitMethodTranslated(J9VMThread*, J9Method*, void*)+144>.
(gdb) x/8i 0x0000FFFF9E4E0A8C
   0xffff9e4e0a8c:      ldrh    w0, [x1, #24]     <<< Crash happens here with x1=0
   0xffff9e4e0a90:      cmp     x5, x0
   0xffff9e4e0a94:      b.cs    0xffff9e4e0a24  // b.hs, b.nlast
a7ehuo commented 1 week ago

I also cannot get the backtrace from the core. So far I got more or less what Henry has found.

(gdb) bt 
#0  0x0000ffff9f4cb308 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

I believe it crashes in the second call to VM_VMHelpers::getClassDepth in jitMethodTranslated during the do-while loop condition check because it tries to dereference classDepthAndFlags from a NULL currentClass.

https://github.com/eclipse-openj9/openj9/blob/b279a6d49ff88c01eaf98aa317adab11c93bd3d5/runtime/codert_vm/CodertVMHelpers.cpp#L230

   0x0000ffff9e4e0a70 <+112>:   tbz w0, #9, 0xffff9e4e0a78 <jitMethodTranslated(J9VMThread*, J9Method*, void*)+120> // CodertVMHelpers.cpp:205
   0x0000ffff9e4e0a74 <+116>:   ldr x1, [x22, #944] // CodertVMHelpers.cpp:206
   0x0000ffff9e4e0a78 <+120>:   ldurh   w2, [x20, #-2]
   0x0000ffff9e4e0a7c <+124>:   ldrh    w5, [x1, #24] // VM_VMHelpers::getClassDepth
   0x0000ffff9e4e0a80 <+128>:   add x20, x20, x2 // CodertVMHelpers.cpp:209: jitToJitStartAddress
   0x0000ffff9e4e0a84 <+132>:   b   0xffff9e4e0a98 <jitMethodTranslated(J9VMThread*, J9Method*, void*)+152> // CodertVMHelpers.cpp:209: jitToJitStartAddress 
   0x0000ffff9e4e0a88 <+136>:   ldr x1, [x1, #176] // CodertVMHelpers.cpp:229: currentClass = currentClass->subclassTraversalLink; 
   0x0000ffff9e4e0a8c <+140>:   ldrh    w0, [x1, #24] //<--- crash VM_VMHelpers::getClassDepth
   0x0000ffff9e4e0a90 <+144>:   cmp x5, x0 // CodertVMHelpers.cpp:230: while (VM_VMHelpers::getClassDepth(currentClass) > initialClassDepth) 

x1 is currentClass: R1=0000000000000000 x19 is method: R19=0000000000031368. method is java/lang/Object.hashCode()I. The currentClass starts as 0x31100 (J9_CLASS_FROM_METHOD(method)). It then iterates through currentClass = currentClass->subclassTraversalLink. I calculated currentClass for the first few iterations and subclassTraversalLink is not NULL so far.

> !j9method 0x31368
J9Method at 0x31368 {
  Fields for J9Method:
    0x0: U8* bytecodes = !j9x 0x0000FFFF412592AC // "*�"
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000000031490 (flags = 0x4)
    0x10: void* methodRunAddress = !j9x 0x0000000000000018
    0x18: volatile void* extra = !j9x 0x0000FFFF7B8077F8
}
Signature: java/lang/Object.hashCode()I !bytecodes 0x0000000000031368
ROM Method: !j9rommethod 0x0000FFFF41259298
Next Method: !j9method 0x0000000000031388

> !j9constantpool 0x0000000000031490 
J9ConstantPool at 0x31490 {
  Fields for J9ConstantPool:
    0x0: struct J9Class* ramClass = !j9class 0x0000000000031100 // java/lang/Object
    0x8: struct J9ROMConstantPoolItem* romConstantPool = !j9romconstantpoolitem 0x0000FFFF412590B8
}
> !j9class 0x0000000000031100
J9Class at 0x31100 {
  Fields for J9Class:
    0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
    0x8: struct J9ROMClass* romClass = !j9romclass 0x0000FFFF41259000
    0x10: void** superclasses = !j9x 0x0000000000031608
    0x18: UDATA classDepthAndFlags = 0x00000000001E0000 (1966080)
   ...
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x00000000000CF900 // jdk/internal/loader/URLClassPath
   ...
}
Class name: java/lang/Object
To view static fields, use !j9statics 0x0000000000031100
To view instance shape, use !j9classshape 0x0000000000031100
> !j9class 0x00000000000CF900 
J9Class at 0xcf900 {
  Fields for J9Class:
    0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
    0x8: struct J9ROMClass* romClass = !j9romclass 0x0000FFFF41347A10
    0x10: void** superclasses = !j9x 0x00000000000CF818
    0x18: UDATA classDepthAndFlags = 0x00000000020E0001 (34471937)
   ...
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x00000000000CCF00 // java/lang/ClassLoader$LazyInitLock
   ...
> !j9class 0x00000000000CCF00
J9Class at 0xccf00 {
  Fields for J9Class:
    0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
    0x8: struct J9ROMClass* romClass = !j9romclass 0x0000FFFF41346AD8
    0x10: void** superclasses = !j9x 0x00000000000CCC88
    0x18: UDATA classDepthAndFlags = 0x00000000000E0001 (917505)
...
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x00000000000CCD00 // [Ljava/util/Hashtable$Entry;
...
a7ehuo commented 1 week ago

Grinder (ID 43741) run 150x PatModImg_PlatMod_0/PatModImg_PlatMod_1: All passed

pshipton commented 1 week ago

FYI the original failure was found on the 0.48 build.

a7ehuo commented 1 week ago

0.48 build Grinder (ID 43777) run 150x PatModImg_PlatMod_0/PatModImg_PlatMod_1: All passed

Eclipse OpenJ9 VM (build v0.48.0-release-6b84dcc9064, JRE 11 Linux aarch64-64-Bit Compressed References 20241002_1120 (JIT enabled, AOT enabled)
OpenJ9   - 6b84dcc9064
OMR      - cfe6789bfa8
JCL      - 9d684ea755d based on jdk-11.0.25+7)
pshipton commented 1 week ago

I'll remove the milestone.

vij-singh commented 1 week ago

Should we assign to 0.49 until we are sure it's no longer an issue?

pshipton commented 1 week ago

If it's seen enough I will add it back to a milestone. Until it's seen again it doesn't seem there is anything to do.

pshipton commented 6 days ago

Another one, different platform. I'll put it back in 0.48 in case we can progress it from this.

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64le_linux_Nightly_testList_0/216/ - cent9-ppc64le-6 jdk_lang_0 java/lang/invoke/MethodHandleProxies/MethodHandlesProxiesTest.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_ppc64le_linux_Nightly_testList_0/216/openjdk_test_output.tar.gz

18:20:36  Unhandled exception
18:20:36  Type=Segmentation error vmState=0x0005ffff
18:20:36  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
18:20:36  Handler1=00007FFF89242630 Handler2=00007FFF89568820
18:20:36  R0=00007FFF8315F68C R1=00007FFF8810A540 R2=00007FFF83FAEA00 R3=0000000000000000
18:20:36  R4=00000000000ADD70 R5=00007FFF6906268C R6=0000000000000000 R7=00000000000ADAC0
18:20:36  R8=00000000000ADCB8 R9=0000000000000000 R10=0000000000000001 R11=00007FFF83FD6790
18:20:36  R12=00007FFF832AC880 R13=00007FFF881168E0 R14=00007FFF89BFD268 R15=0000000000110000
18:20:36  R16=00007FFF8810F160 R17=00007FFF83FD6158 R18=0000000000000001 R19=00007FFF83FD7810
18:20:36  R20=00007FFF8810A9C0 R21=0000000000000000 R22=00007FFF840F5710 R23=0000000000000000
18:20:36  R24=00007FFF83FD6158 R25=0000000000097D00 R26=00007FFF840EF380 R27=00007FFF8429C4F0
18:20:36  R28=00007FFF84035430 R29=0000000000097D00 R30=00007FFF69062688 R31=00000000000CA020
18:20:36  NIP=00007FFF83C702B4 MSR=800000000280D033 ORIG_GPR3=00007FFF83C702C8 CTR=0000000000000000
18:20:36  LINK=00007FFF83C70254 XER=0000000020040000 CCR=0000000026888402 SOFTE=0000000000000001
18:20:36  TRAP=0000000000000300 DAR=0000000000000018 dsisr=0000000040000000 RESULT=0000000000000000
18:20:36  FPR0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR1=3fe8000000000000 (f: 0.000000, d: 7.500000e-01)
18:20:36  FPR2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR3=0000002e00000021 (f: 33.000000, d: 9.761181e-313)
18:20:36  FPR4=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR7=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR9=0000000500000004 (f: 4.000000, d: 1.060998e-313)
18:20:36  FPR10=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR11=41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
18:20:36  FPR12=3f50624dd2f1a9fc (f: 3539053056.000000, d: 1.000000e-03)
18:20:36  FPR13=3980002839ce0028 (f: 969801792.000000, d: 9.861140e-32)
18:20:36  FPR14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR16=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR17=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR18=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR19=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR20=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR21=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR22=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR23=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR24=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR25=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR26=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR27=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR28=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR29=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR30=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  FPR31=0000000000000000 (f: 0.000000, d: 0.000000e+00)
18:20:36  Module=/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64le_linux_Nightly_testList_0/jdkbinary/j2sdk-image/lib/default/libj9jit29.so
18:20:36  Module_base_address=00007FFF83000000
18:20:36  
18:20:36  Method_being_compiled=java/util/Collection.stream()Ljava/util/stream/Stream;
18:20:36  Target=2_90_20241003_237 (Linux 5.14.0-503.el9.ppc64le)
18:20:36  CPU=ppc64le (4 logical CPUs) (0x1dade0000 RAM)
18:20:36  ----------- Stack Backtrace -----------
18:20:36  jitMethodTranslated+0xb4 (0x00007FFF83C702B4 [libj9jit29.so+0xc702b4])
18:20:36  _ZN2TR15CompilationInfo14compilationEndEP10J9VMThreadRNS_24IlGeneratorMethodDetailsEP11J9JITConfigPvS7_P11TR_FrontEndP21TR_MethodToBeCompiledPNS_11CompilationE+0x38c (0x00007FFF8315F68C [libj9jit29.so+0x15f68c])
18:20:36  _ZN2TR28CompilationInfoPerThreadBase20postCompilationTasksEP10J9VMThreadP21TR_MethodToBeCompiledP8J9MethodPKvP19J9JITExceptionTablebbP20TR_RelocationRuntime+0x360 (0x00007FFF83171FB0 [libj9jit29.so+0x171fb0])
18:20:36  _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x3b0 (0x00007FFF83174430 [libj9jit29.so+0x174430])
18:20:36  _ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x170 (0x00007FFF831749F0 [libj9jit29.so+0x1749f0])
18:20:36  _ZN2TR24CompilationInfoPerThread14processEntriesEv+0x410 (0x00007FFF831731E0 [libj9jit29.so+0x1731e0])
18:20:36  _ZN2TR24CompilationInfoPerThread3runEv+0xa8 (0x00007FFF831737F8 [libj9jit29.so+0x1737f8])
18:20:36  _Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0xa0 (0x00007FFF831738B0 [libj9jit29.so+0x1738b0])
18:20:36  omrsig_protect+0x3e4 (0x00007FFF89569CD4 [libj9prt29.so+0x39cd4])
18:20:36  _Z21compilationThreadProcPv+0x1a8 (0x00007FFF83173E18 [libj9jit29.so+0x173e18])
18:20:36  thread_wrapper+0x190 (0x00007FFF8999CBC0 [libj9thr29.so+0xcbc0])
18:20:36  start_thread+0x170 (0x00007FFF89CA2C30 [libc.so.6+0xa2c30])
18:20:36  clone+0xa0 (0x00007FFF89D4C6C8 [libc.so.6+0x14c6c8])
18:20:36  ---------------------------------------
a7ehuo commented 3 days ago

I looked at the core from ppc64le. There is no backtrace [1].

This crash looks same as the previous one. It crashes because it dereferences a NULL currentClass (R9=0000000000000000) that is retrieved from subclassTraversalLink [2]. https://github.com/eclipse-openj9/openj9/blob/1153568b78f6fa2ebbb7c929259172c0813c5bba/runtime/codert_vm/CodertVMHelpers.cpp#L229-L230

Same as previous crash, jitMethodTranslated is called at the end of an AOT load method. However this time, it is a different method java/util/Collection.stream:

+ (AOT load) java/util/Collection.stream()Ljava/util/stream/Stream; @ 0x7fff69062688-0x7fff6906284c/0x0-0x7fff6906284c time=21us 0xca020 0x7fff62913e78 Q_SZ=91 bcsz=11 mem=0KB
(gdb) list *0x00007FFF83C702B4
0x7fff83c702b4 is in jitMethodTranslated(J9VMThread*, J9Method*, void*) (/home/jenkins/workspace/Build_JDK21_ppc64le_linux_Nightly/openj9/runtime/oti/VMHelpers.hpp:329).
324 in /home/jenkins/workspace/Build_JDK21_ppc64le_linux_Nightly/openj9/runtime/oti/VMHelpers.hpp

(gdb) list *0x00007FFF8315F68C
0x7fff8315f68c is in TR::CompilationInfo::compilationEnd(J9VMThread*, TR::IlGeneratorMethodDetails&, J9JITConfig*, void*, void*, TR_FrontEnd*, TR_MethodToBeCompiled*, TR::Compilation*) (/home/jenkins/workspace/Build_JDK21_ppc64le_linux_Nightly/openj9/runtime/compiler/control/CompilationThread.cpp:10586).
warning: Source file is more recent than executable.
10581                     entry->_compInfoPT->setMetadata(NULL);
10582                  startPC = oldStartPC;
10583                  }
10584               else
10585                  {
10586                  jitMethodTranslated(vmThread, method, startPC);
10587   #if defined(J9VM_OPT_CRIU_SUPPORT)
10588                  if (jitConfig->javaVM->internalVMFunctions->isCheckpointAllowed(jitConfig->javaVM)
...

Based on the logic inside jitMethodTranslated, I calculated the first six iterations of the loop. subclassTraversalLink is all good so far. I suspect either classDepthAndFlags or subclassTraversalLink or both are not set up correctly somewhere along the class chain and it continues to iterates through the subclasses when it is not supposed to.

With my limited knowledge, I don't think the JIT modifies subclassTraversalLink or classDepthAndFlags of a J9Class. @TobiAjila I'm wondering if the VM team could help take a look at it further. Or if there is anything I could check to further the investigation. It'd be much appreciated. Thank you!

iteration 0 [3]

currentClass = 0x000adb00 // java/lang/Object
initialClassDepth = 0

currentClass initially is 0x00000000000CA100 (java/util/Collection). Since java/util/Collection is an interface, currentClass is reassigned as 0x000adb00 (java/lang/Object)

iteration 1 [4] currentClass = 0x0000000000243F00 // sun/nio/fs/AbstractBasicFileAttributeView

iteration 2 [5] currentClass = 0x0000000000244600 // sun/nio/fs/UnixFileAttribjitStartAddressuteViews$Basic

iteration 3 [6] currentClass = 0x0000000000244600 // sun/nio/fs/UnixFileAttributeViews$Basic

iteration 4 [7] currentClass = 0x0000000000243600 // sun/nio/fs/UnixFileAttributeViews

iteration 5 [8] currentClass = 0x0000000000243200 // java/nio/file/attribute/UserDefinedFileAttributeView

[1]

(gdb) bt
#0  0x00007fff89ca5668 in ?? ()
#1  0x0000000000000000 in ?? ()

[2]

   0x00007fff83c70278 <+120>:   beq     0x7fff83c70350 <jitMethodTranslated(J9VMThread*, J9Method*, void*)+336>
   0x00007fff83c7027c <+124>:   ld      r9,8(r31)   // 204: currentClass = J9_CLASS_FROM_METHOD(method);
   0x00007fff83c70280 <+128>:   rldicr  r9,r9,0,59  // 204: currentClass = J9_CLASS_FROM_METHOD(method)
   0x00007fff83c70284 <+132>:   ld      r9,0(r9)    // 204: currentClass = J9_CLASS_FROM_METHOD(method)
   0x00007fff83c70288 <+136>:   ld      r10,8(r9)   // 205: if (J9ROMCLASS_IS_INTERFACE(currentClass->romClass))
   0x00007fff83c7028c <+140>:   lwz     r10,16(r10) // 205: if (J9ROMCLASS_IS_INTERFACE(currentClass->romClass))
   0x00007fff83c70290 <+144>:   andi.   r10,r10,512 // 205: if (J9ROMCLASS_IS_INTERFACE(currentClass->romClass))
   0x00007fff83c70294 <+148>:   beq     0x7fff83c7029c <jitMethodTranslated(J9VMThread*, J9Method*, void*)+156> // 205: if (J9ROMCLASS_IS_INTERFACE(currentClass->romClass))
   0x00007fff83c70298 <+152>:   ld      r9,944(r28) // 206: currentClass = J9VMJAVALANGOBJECT_OR_NULL(vm);
   0x00007fff83c7029c <+156>:   lhz     r10,-2(r30) // getClassDepth
   0x00007fff83c702a0 <+160>:   lhz     r3,24(r9)   // getClassDepth
   0x00007fff83c702a4 <+164>:   add     r5,r30,r10  // jitToJitStartAddress
   0x00007fff83c702a8 <+168>:   b       0x7fff83c702c0 <jitMethodTranslated(J9VMThread*, J9Method*, void*)+192> // jitToJitStartAddress
   0x00007fff83c702ac <+172>:   ori     r2,r2,0     // jitToJitStartAddress
   0x00007fff83c702b0 <+176>:   ld      r9,176(r9)  // 229: currentClass = currentClass->subclassTraversalLink;
   0x00007fff83c702b4 <+180>:   lhz     r10,24(r9)  //<-- crash
   0x00007fff83c702b8 <+184>:   cmpld   r3,r10

[3]

> !j9method 0x00000000000CA020 |grep constantPool
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x00000000000CA2A0 (flags = 0x4)
> !j9constantpool 0x00000000000CA2A0 | grep ramClass
    0x0: struct J9Class* ramClass = !j9class 0x00000000000CA100 // java/util/Collection

> !j9class 0x00000000000CA100 | grep romClass
    0x8: struct J9ROMClass* romClass = !j9romclass 0x00007FFF312948F8

> !j9romclass 0x00007FFF312948F8 | grep modifiers
    0x10: U32 modifiers = 0x00000601 (1537)

currentClass = J9VMJAVALANGOBJECT_OR_NULL(vm);

!j9javavm 0x00007FFF84035430

0x00007FFF84035430 + 944 

(gdb) x 0x7FFF840357E0
0x7fff840357e0: 0x000adb00

currentClass = 0x000adb00

> !j9class 0x000adb00 | grep classDepthAndFlags
    0x18: UDATA classDepthAndFlags = 0x00000000001E0000 (1966080)

[4]

> !j9class 0x000adb00 | grep subclassTraversalLink
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000000243F00 // sun/nio/fs/AbstractBasicFileAttributeView

> !j9class 0x0000000000243F00 | grep classDepthAndFlags
    0x18: UDATA classDepthAndFlags = 0x00000000001E0001 (1966081)

[5]

> !j9class 0x0000000000243F00 | grep subclassTraversalLink
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000000244600 // sun/nio/fs/UnixFileAttributeViews$Basic

> !j9class 0x0000000000244600 | grep classDepthAndFlags
    0x18: UDATA classDepthAndFlags = 0x00000000020E0002 (34471938)

[6]

> !j9class 0x0000000000244600  | grep subclassTraversalLink
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000000243900 // sun/nio/fs/DynamicFileAttributeView

> !j9class 0x0000000000243900 | grep classDepthAndFlags
    0x18: UDATA classDepthAndFlags = 0x00000000001E0001 (1966081)

[7]

> !j9class 0x0000000000243900 | grep subclassTraversalLink
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000000243600 // sun/nio/fs/UnixFileAttributeViews

> !j9class 0x0000000000243600 | grep classDepthAndFlags
    0x18: UDATA classDepthAndFlags = 0x00000000000E0001 (917505)

[8]

> !j9class 0x0000000000243600  | grep subclassTraversalLink
    0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000000243200 // java/nio/file/attribute/UserDefinedFileAttributeView

>  !j9class 0x0000000000243200 | grep classDepthAndFlags
    0x18: UDATA classDepthAndFlags = 0x00000000000E0001 (917505)
tajila commented 3 days ago

The only thing that comes to mind is there could be an issue if a class was being loaded while that method is called.

FYI @gacholio

gacholio commented 2 days ago

If the thread does not have exclusive, there certainly could be new classes loaded during the loop.

a7ehuo commented 2 days ago

If the thread does not have exclusive ...

Just trying to understand, do you mean something like acquireVMAccessNoSuspend before calling jitMethodTranslated?

inline void acquireVMAccessNoSuspend(J9VMThread *vmThread)
   {
   vmThread->javaVM->internalVMFunctions->internalAcquireVMAccessWithMask(vmThread, J9_PUBLIC_FLAGS_HALT_THREAD_ANY_NO_JAVA_SUSPEND);
   }
gacholio commented 2 days ago

We may need to add a mutex for the traversal link uses. Exclusive and VM access act like a mutex, but it would be pretty heavy-handed to use that in this case.

gacholio commented 2 days ago

Looks like updates are protected with classTableMutex so the readers should use that as well.

a7ehuo commented 2 days ago

jitMethodTranslated is called in two places in TR::CompilationInfo::compilationEnd. Its implementation and where they are called have not been changed for a few years.

@mpirvu I'm wondering if you see any concerns of calling classTableMutex inside jitMethodTranslated at the end of compilation?

(1) https://github.com/eclipse-openj9/openj9/blob/8357d17869fedf6b29018582bd88fe7b0c63ad31/runtime/compiler/control/CompilationThread.cpp#L10468 (2) https://github.com/eclipse-openj9/openj9/blob/8357d17869fedf6b29018582bd88fe7b0c63ad31/runtime/compiler/control/CompilationThread.cpp#L10586

a7ehuo commented 2 days ago

Marius is on vacation. Please ignore my last comment.

@gacholio I have a change that wraps the loop inside jitMethodTranslated with classTableMutex. https://github.com/eclipse-openj9/openj9/compare/master...a7ehuo:openj9:acquire-classTableMutex Does it look correct? I don't think the code before the loop needs acquiring the mutex.

I looked at how other parts of VM traverse subclass link such as in verifyJ9ClassSubclassHierarchy or initializeClassLinks. It doesn't look obvious to me that they acquire classTableMutex. Another question is why it would be required in jitMethodTranslated?

gacholio commented 1 day ago

It should probably be acquired in all of the reader cases you've found.

a7ehuo commented 1 day ago

After taking a further look at verifyJ9ClassSubclassHierarchy and initializeClassLinks. I think these two should be fine that they don't need to acquire classTableMutex. The change in jitMethodTranslated is all needed for now and I'll run some regression test on the change.

(1) verifyJ9ClassSubclassHierarchy

verifyJ9ClassSubclassHierarchy is triggered through runAllVMChecks which states "Caller must either have exclusive VM access or must be executing on behalf of a thread that does"

https://github.com/eclipse-openj9/openj9/blob/70f99fd17d4b349f0bca400c1ec934e40302c144/runtime/vmchk/vmcheck.c#L188-L200

(2) initializeClassLinks

initializeClassLinks is called inside internalCreateRAMClassDone which already acquires classTableMutex. And there is another version of no mutex internalCreateRAMClassDoneNoMutex.

https://github.com/eclipse-openj9/openj9/blob/70f99fd17d4b349f0bca400c1ec934e40302c144/runtime/vm/createramclass.cpp#L2582-L2583

gacholio commented 1 day ago

Your change looks correct to me.

github-actions[bot] commented 8 hours ago

Issue Number: 20257 Status: Closed Actual Components: comp:jit, test failure, segfault Actual Assignees: No one :( PR Assignees: No one :(

github-actions[bot] commented 8 hours ago

Issue Number: 20257 Status: Closed Actual Components: comp:jit, test failure, segfault Actual Assignees: No one :( PR Assignees: No one :(

pshipton commented 5 hours ago

Re the failure https://github.com/eclipse-openj9/openj9/issues/20257#issuecomment-2393867132 100x grinder across 5 machines passed.