eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

aarch64 mac segfault in j9gc_createJavaLangString #15522

Open pshipton opened 2 years ago

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/96
SharedClasses.SCM23.MultiCL_0
-Xjit -Xgcpolicy:gencon -Xnocompressedrefs

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/96/system_test_output.tar.gz

MCL5 10:15:25 >> Loaded 19000 classes...
STF 10:15:27.320 - Found dump at: /Users/jenkins/workspace/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16575830392423/SharedClasses.SCM23.MultiCL_0/20220712-100821-SharedClasses/results/core.19700101.000000.45317.0001.dmp
MCL1 10:15:27 >> Loaded 20000 classes...
MCL1 10:15:27 >> Total classes loaded = 20001
MCL1 stderr Unhandled exception
MCL1 stderr Type=Segmentation error vmState=0x00000000
MCL1 stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
MCL1 stderr Handler1=0000000104EF22BC Handler2=0000000105222E14 InaccessibleAddress=0000000000006348
MCL1 stderr x0=000000012F809300 x1=0000000109F37A41 x2=0000000000000006 x3=0000000000000000
MCL1 stderr x4=000000016B6CBE90 x5=000000016B6CBED8 x6=000000016B6CBED0 x7=000000016B6CBEC8
MCL1 stderr x8=0000000000000000 x9=0000000104E9EDE8 x10=000000012F818B88 x11=000000012F818BB2
MCL1 stderr x12=000000016B6CBEE8 x13=000000016B6CBEC0 x14=000000037F7D6E4C x15=000000016B6CC2A0
MCL1 stderr x16=0000000188D6BF60 x17=00000001F77A6F28 x18=000000037F7D6DC0 x19=000000012F809300
MCL1 stderr x20=0000000000000000 x21=0000000109F37A41 x22=0000000109F37A41 x23=0000000000000006
MCL1 stderr x24=000000010501630C x25=000000010A5BA2A8 x26=000000016B6CC210 x27=000000012F813E20
MCL1 stderr x28=000000012F824A68 x29(FP)=000000016B6CBE80 x30(LR)=0000000104EE9680 x31(SP)=000000016B6CBDD0
MCL1 stderr PC=000000010A5BA2E8 SP=000000016B6CBDD0
MCL1 stderr v0 07ffffffffffffff (f: 4294967296.000000, d: 3.785767e-270)
MCL1 stderr v1 0000000000000007 (f: 7.000000, d: 3.458460e-323)
MCL1 stderr v2 0706050403020100 (f: 50462976.000000, d: 7.949929e-275)
MCL1 stderr v3 3fd686c85e9b14cf (f: 1587221760.000000, d: 3.519765e-01)
MCL1 stderr v4 0000000100000000 (f: 0.000000, d: 2.121996e-314)
MCL1 stderr v5 0000000100000000 (f: 0.000000, d: 2.121996e-314)
MCL1 stderr v6 0000000000000001 (f: 1.000000, d: 4.940656e-324)
MCL1 stderr v7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v16 bfd0000000000000 (f: 0.000000, d: -2.500000e-01)
MCL1 stderr v17 3fd56906a0555555 (f: 2689946880.000000, d: 3.345353e-01)
MCL1 stderr v18 bf73bd735a15802f (f: 1511358464.000000, d: -4.819346e-03)
MCL1 stderr v19 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
MCL1 stderr v20 00000000ffffffff (f: 4294967296.000000, d: 2.121996e-314)
MCL1 stderr v21 00000000ffffffff (f: 4294967296.000000, d: 2.121996e-314)
MCL1 stderr v22 00000000ffffffff (f: 4294967296.000000, d: 2.121996e-314)
MCL1 stderr v23 ffffffffffffffff (f: 4294967296.000000, d: nan)
MCL1 stderr v24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr v31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
MCL1 stderr Module=/Users/jenkins/workspace/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/lib/default/libj9gc29.dylib
MCL1 stderr Module_base_address=000000010A4F8000 Symbol=j9gc_createJavaLangString
MCL1 stderr Symbol_address=000000010A5BA2A8
MCL1 stderr Target=2_90_20220712_99 (Mac OS X 11.5.2)
MCL1 stderr CPU=aarch64 (8 logical CPUs) (0x400000000 RAM)
MCL1 stderr ----------- Stack Backtrace -----------
MCL1 stderr ---------------------------------------
...
More segfaults follow
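
For reading the report above, a minimal sketch of the address arithmetic, using only values copied from the first crash (PC, Symbol_address, Module_base_address, InaccessibleAddress). The helper name `locate_pc` is hypothetical and is not part of any OpenJ9 tool.

```python
# Values copied from the first crash report above (MCL1 stderr).
PC = 0x000000010A5BA2E8
SYMBOL_ADDRESS = 0x000000010A5BA2A8        # j9gc_createJavaLangString
MODULE_BASE = 0x000000010A4F8000           # libj9gc29.dylib
INACCESSIBLE_ADDRESS = 0x0000000000006348


def locate_pc(pc, symbol_address, module_base):
    """Return (offset into symbol, offset into module) for a faulting PC.

    Hypothetical helper for illustration only.
    """
    if pc < symbol_address or symbol_address < module_base:
        raise ValueError("PC, symbol and module base are not nested as expected")
    return pc - symbol_address, pc - module_base


symbol_offset, module_offset = locate_pc(PC, SYMBOL_ADDRESS, MODULE_BASE)
print(f"j9gc_createJavaLangString+{symbol_offset:#x}")  # prints j9gc_createJavaLangString+0x40
print(f"libj9gc29.dylib+{module_offset:#x}")            # prints libj9gc29.dylib+0xc22e8
print(f"faulting address: {INACCESSIBLE_ADDRESS:#x}")
```

The faulting address is small (0x6348), which would be consistent with a field access through a null or near-null base pointer, although the report alone does not prove that.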

@knn-k @dmitripivkine fyi

dmitripivkine commented 2 years ago

GC Check reports thousands of "class pointer not in a class segment" errors. Many J9 classes look correct (and are not unloaded) but are not returned by !allclasses. Examples of such classes:

java/lang/invoke/DirectMethodHandle
jdk/internal/misc/InnocuousThread
java/lang/Thread
java/lang/Class
jdk/internal/loader/ClassLoaders$PlatformClassLoader
...

I am not sure whether this is a tool reporting issue or a real problem. I am still looking into it.

j9gc_createJavaLangString is really more VM code than GC code, so I am adding the comp:vm label as well. @tajila FYI

dmitripivkine commented 2 years ago

Please note that the core in the artifacts was generated for the following crash:

> !gpinfo
Failing Thread: !j9vmthread 0x10f809d00
Failing Thread ID: 0xa019a34 (167877172)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=0000000104EF22BC Handler2=0000000105222E14 InaccessibleAddress=0000000000006798
x0=000000010F809D00 x1=000000010F809D00 x2=000000037FD75A90 x3=0000000000000001
x4=000000010A5A4204 x5=0000000000000000 x6=0000000000000009 x7=00000000000002E0
x8=0000000000000020 x9=0000000000000000 x10=0000000110FD3900 x11=00000000021E0001
x12=00000000021E0002 x13=0000000110F64000 x14=0000000110FD3710 x15=0000000000000009
x16=0000000188D1E2A0 x17=0000000000000023 x18=0000000000000000 x19=0000000000000006
x20=00000001382061A0 x21=0000000118000000 x22=000000037FD75A90 x23=0000000000000018
x24=0000000000000001 x25=000000012F813E20 x26=0000000000000020 x27=0000000109E7E364
x28=000000016B2485D8 x29(FP)=000000016B24D8E0 x30(LR)=000000010989BB98 x31(SP)=000000016B247770
PC=000000010A5A421C SP=000000016B247770
v0 00000000000000ff (f: 255.000000, d: 1.259867e-321)
v1 ffffffffffffffff (f: 4294967296.000000, d: nan)
v2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v3 0706050403020100 (f: 50462976.000000, d: 7.949929e-275)
v4 00000000ffffffff (f: 4294967296.000000, d: 2.121996e-314)
v5 0000000000000002 (f: 2.000000, d: 9.881313e-324)
v6 0000080000000800 (f: 2048.000000, d: 4.345847e-311)
v7 000000000000000d (f: 13.000000, d: 6.422853e-323)
v8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v16 0000000000000001 (f: 1.000000, d: 4.940656e-324)
v17 0000000000000001 (f: 1.000000, d: 4.940656e-324)
v18 0000000000000001 (f: 1.000000, d: 4.940656e-324)
v19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v21 ffffffffffffffff (f: 4294967296.000000, d: nan)
v22 ffffffffffffffff (f: 4294967296.000000, d: nan)
v23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
v31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/Users/jenkins/workspace/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/lib/default/libj9gc29.dylib
Module_base_address=000000010A4F8000 Symbol=j9gc_objaccess_mixedObjectReadObject <-------
Symbol_address=000000010A5A4204

Method_being_compiled=net/openj9/sc/classes/TestClass_9668.stringOperations(III)V <-------

dmitripivkine commented 2 years ago

The object at 0x37FD75A90 that appears in the registers is a java/lang/invoke/DirectMethodHandle:

x2=000000037FD75A90

> !j9object 0x37FD75A90
!J9Object 0x000000037FD75A90 {
    struct J9Class* clazz = !j9class 0x110FD3900 // java/lang/invoke/DirectMethodHandle
    Object flags = 0x00000020;
    Ljava/lang/invoke/MethodType; type = !fj9object 0x280308578 (offset = 0) (java/lang/invoke/MethodHandle)
    Ljava/lang/invoke/LambdaForm; form = !fj9object 0x280305cc0 (offset = 8) (java/lang/invoke/MethodHandle)
    Ljava/lang/invoke/MethodHandle; asTypeCache = !fj9object 0x0 (offset = 16) (java/lang/invoke/MethodHandle)
    B customizationCount = 0x00000000 (offset = 32) (java/lang/invoke/MethodHandle)
    Z updateInProgress = 0x00000000 (offset = 36) (java/lang/invoke/MethodHandle)
    Ljava/lang/invoke/MemberName; jitVMEntryKeepAlive = !fj9object 0x0 (offset = 24) (java/lang/invoke/MethodHandle) <hidden>
    Ljava/lang/invoke/MemberName; member = !fj9object 0x2803085b0 (offset = 48) (java/lang/invoke/DirectMethodHandle)
    Z crackable = 0x00000000 (offset = 56) (java/lang/invoke/DirectMethodHandle)
    J lockword = 0x0000000000000000 (offset = 40) (java/lang/invoke/DirectMethodHandle) <hidden>
}
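
A small sketch of how the register dump can be cross-checked against addresses from the debugger output. The register lines are copied from the !gpinfo output in this issue; the function names `parse_registers` and `registers_holding` are hypothetical helpers and are not debugger commands.

```python
import re

# General-purpose register lines copied from the !gpinfo output in this issue.
GPINFO_REGISTERS = """\
x0=000000010F809D00 x1=000000010F809D00 x2=000000037FD75A90 x3=0000000000000001
x4=000000010A5A4204 x5=0000000000000000 x6=0000000000000009 x7=00000000000002E0
x8=0000000000000020 x9=0000000000000000 x10=0000000110FD3900 x11=00000000021E0001
x12=00000000021E0002 x13=0000000110F64000 x14=0000000110FD3710 x15=0000000000000009
x16=0000000188D1E2A0 x17=0000000000000023 x18=0000000000000000 x19=0000000000000006
x20=00000001382061A0 x21=0000000118000000 x22=000000037FD75A90 x23=0000000000000018
x24=0000000000000001 x25=000000012F813E20 x26=0000000000000020 x27=0000000109E7E364
x28=000000016B2485D8 x29(FP)=000000016B24D8E0 x30(LR)=000000010989BB98 x31(SP)=000000016B247770
"""

# Matches "x2=..." as well as annotated names such as "x29(FP)=...".
REGISTER_PATTERN = re.compile(r"(x\d+)(?:\(\w+\))?=([0-9A-Fa-f]+)")


def parse_registers(text):
    """Map register name (e.g. 'x2') to its integer value."""
    return {name: int(value, 16) for name, value in REGISTER_PATTERN.findall(text)}


def registers_holding(registers, address):
    """Return the register names whose value equals address, in numeric order."""
    names = [name for name, value in registers.items() if value == address]
    return sorted(names, key=lambda name: int(name[1:]))


registers = parse_registers(GPINFO_REGISTERS)
print(registers_holding(registers, 0x37FD75A90))  # the DirectMethodHandle object
print(registers_holding(registers, 0x110FD3900))  # its J9Class pointer
print(registers_holding(registers, 0x10F809D00))  # the failing j9vmthread
```

With the dump above, the object address is found in x2 and x22, the class pointer in x10, and the failing thread pointer in x0 and x1.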

dmitripivkine commented 2 years ago

Looking at the Snap traces: it looks like the JVM was already deep in shutdown when the recorded crash occurred. I am not sure whether the JVM shutdown was triggered by another crash that occurred earlier, or whether the partial shutdown is itself the reason for the crashes. For example, the Garbage Collector has its structures partially torn down.

knn-k commented 2 years ago

This is a failure with SharedClasses.SCM23.MultiCL_0. Let's see if it happens or not after PR #15907 is applied.