eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 722 forks source link

GPF with Semeru Runtime Open Edition 21.0.2.0-m2 in Jit while compiling sun/security/util/DisabledAlgorithmConstraints$Constraints.permits #18748

Closed avermeer closed 5 months ago

avermeer commented 10 months ago

Java -version output

openjdk version "21.0.1" 2023-10-17 LTS IBM Semeru Runtime Open Edition 21.0.2.0-m2 (build 21.0.1+12-LTS) Eclipse OpenJ9 VM 21.0.2.0-m2 (build v0.43.0-release-54becab46, JRE 21 Linux amd64-64-Bit Compressed References 20231017_57 (JIT enabled, AOT enabled) OpenJ9 - 54becab46 OMR - d810fcb98 JCL - 10a4ed68a77 based on jdk-21.0.1+12)

Summary of problem

We have a GPF crash in our preproduction web application (using on top of TomEE 9.1.2 on Redhat Linux 8.7 on Linux x64) with Semeru Runtime Open Edition 21.0.2.0-m2. While we don't know the exact scenario to reproduce, the GPF crash took place on the two different VMs running our web application last night, and the night before, with same type of dumps.

The issue seems to take place when JIT compiler is trying compile sun/security/util/DisabledAlgorithmConstraints$Constraints.permits, given this output which we found in catalina.out:

#0: /usr/local/jdk/lib/default/libj9jit29.so(+0x918b05) [0x7fd27dfe8b05]
#1: /usr/local/jdk/lib/default/libj9jit29.so(+0x924ec0) [0x7fd27dff4ec0]
#2: /usr/local/jdk/lib/default/libj9jit29.so(+0x1606c9) [0x7fd27d8306c9]
#3: /usr/local/jdk/lib/default/libj9prt29.so(+0x29948) [0x7fd290f87948]
#4: /lib64/libpthread.so.0(+0x12cf0) [0x7fd290a9fcf0]
#5: /usr/local/jdk/lib/default/libj9jit29.so(+0x5d23b0) [0x7fd27dca23b0]
#6: /usr/local/jdk/lib/default/libj9jit29.so(+0x5d3a83) [0x7fd27dca3a83]
#7: /usr/local/jdk/lib/default/libj9jit29.so(+0x770bd6) [0x7fd27de40bd6]
#8: /usr/local/jdk/lib/default/libj9jit29.so(+0x770f4a) [0x7fd27de40f4a]
#9: /usr/local/jdk/lib/default/libj9jit29.so(+0x77118a) [0x7fd27de4118a]
#10: /usr/local/jdk/lib/default/libj9jit29.so(+0x799ba5) [0x7fd27de69ba5]
#11: /usr/local/jdk/lib/default/libj9jit29.so(+0x79b893) [0x7fd27de6b893]
#12: /usr/local/jdk/lib/default/libj9jit29.so(+0x5a64b3) [0x7fd27dc764b3]
#13: /usr/local/jdk/lib/default/libj9jit29.so(+0x58b22c) [0x7fd27dc5b22c]
#14: /usr/local/jdk/lib/default/libj9jit29.so(+0x176a2f) [0x7fd27d846a2f]
#15: /usr/local/jdk/lib/default/libj9jit29.so(+0x177a61) [0x7fd27d847a61]
#16: /usr/local/jdk/lib/default/libj9prt29.so(+0x2a399) [0x7fd290f88399]
#17: /usr/local/jdk/lib/default/libj9jit29.so(+0x1755a2) [0x7fd27d8455a2]
#18: /usr/local/jdk/lib/default/libj9jit29.so(+0x1758e8) [0x7fd27d8458e8]
#19: /usr/local/jdk/lib/default/libj9jit29.so(+0x17480b) [0x7fd27d84480b]
#20: /usr/local/jdk/lib/default/libj9jit29.so(+0x174b72) [0x7fd27d844b72]
#21: /usr/local/jdk/lib/default/libj9jit29.so(+0x174c22) [0x7fd27d844c22]
#22: /usr/local/jdk/lib/default/libj9prt29.so(+0x2a399) [0x7fd290f88399]
#23: /usr/local/jdk/lib/default/libj9jit29.so(+0x174feb) [0x7fd27d844feb]
#24: /usr/local/jdk/lib/default/libj9thr29.so(+0xb3a3) [0x7fd290f503a3]
#25: /lib64/libpthread.so.0(+0x81cf) [0x7fd290a951cf]
#26: /lib64/libc.so.6(clone+0x43) [0x7fd2904fce73]
Unhandled exception
Type=Segmentation error vmState=0x00053cff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FD28B8367D0 Handler2=00007FD290F87710 InaccessibleAddress=0000000000000008
RDI=00007FD258861260 RSI=0000000000000000 RAX=0000000000000000 RBX=0000000000000000
RCX=00007FD258861600 RDX=0000000000000000 R8=0000000000000000 R9=00007FD27D8E27D0
R10=0000000000050700 R11=00007FD18B060C20 R12=0000000000000000 R13=00007FD258861220
R14=00000000CA4587E7 R15=00007FD27E1A94FC
RIP=00007FD27DCA23B0 GS=0000 FS=0000 RSP=00007FD2588611E8
EFlags=0000000000010206 CS=0033 RBP=00007FD258861220 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 00007fd258866780 (f: 1485203328.000000, d: 6.943668e-310)
xmm1 00007fd18b000000 (f: 2332033024.000000, d: 6.943498e-310)
xmm2 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm3 00007fd18b0216d0 (f: 2332169984.000000, d: 6.943498e-310)
xmm4 00007fd18b021820 (f: 2332170240.000000, d: 6.943498e-310)
xmm5 000001b4000001b4 (f: 436.000000, d: 9.251902e-312)
xmm6 00007fd18b021640 (f: 2332169728.000000, d: 6.943498e-310)
xmm7 00007fd18b021580 (f: 2332169728.000000, d: 6.943498e-310)
xmm8 00007fd18b021790 (f: 2332170240.000000, d: 6.943498e-310)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0305180f02190a43 (f: 35195460.000000, d: 4.128504e-294)
xmm11 0000015200000151 (f: 337.000000, d: 7.172346e-312)
xmm12 0000013d00000140 (f: 320.000000, d: 6.726727e-312)
xmm13 000001380000013f (f: 319.000000, d: 6.620627e-312)
xmm14 0000000008001800 (f: 134223872.000000, d: 6.631540e-316)
xmm15 000001420000013b (f: 315.000000, d: 6.832826e-312)
Module=/usr/local/jdk/lib/default/libj9jit29.so
Module_base_address=00007FD27D6D0000

Method_being_compiled=sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V
Target=2_90_20231017_57 (Linux 4.18.0-425.19.2.el8_7.x86_64)
CPU=amd64 (8 logical CPUs) (0x7b026f000 RAM)
----------- Stack Backtrace -----------
11-Jan-2024 22:46:58.552 WARNING [http-nio-8082-exec-1] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=40775a4e-0d07-4ee8-8ede-449577fd7830; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
11-Jan-2024 22:46:58.617 WARNING [http-nio-8082-exec-1] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=8251c270-ace7-45d8-a919-0d6792054a22; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
11-Jan-2024 22:46:58.617 WARNING [http-nio-8082-exec-1] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=8e62975b-0bc1-465a-be5b-6f0baa587aa0; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
11-Jan-2024 22:46:58.648 WARNING [http-nio-8082-exec-7] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=7f219feb-3669-4adc-bf32-1410debe8b77; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
11-Jan-2024 22:46:58.671 WARNING [http-nio-8082-exec-7] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=2df756ec-edc3-425c-8fe8-724df8581f8c; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
11-Jan-2024 22:46:58.730 WARNING [http-nio-8082-exec-7] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=f32610eb-8aaa-4c91-b3ba-1d00cacb30b9; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
11-Jan-2024 22:46:58.730 WARNING [http-nio-8082-exec-7] org.apache.http.client.protocol.ResponseProcessCookies.processCookies Invalid cookie header: "set-cookie: afs=dcaf7563-c544-4def-8d9c-97375a4d4b6d; Max-Age=311040000; Expires=Sat, 19 Nov 2033 22:46:58 GMT; Path=/; Secure; HttpOnly; SameSite=None; Secure". Invalid 'expires' attribute: Sat, 19 Nov 2033 22:46:58 GMT
_ZNK2TR13NodeChecklist8containsEPNS_4NodeE+0x0 (0x00007FD27DCA23B0 [libj9jit29.so+0x5d23b0])
_ZN2TR20PreorderNodeIterator11stepForwardEv+0x43 (0x00007FD27DCA3A83 [libj9jit29.so+0x5d3a83])
_ZN18TR_ColdBlockMarker11isBlockColdEPN2TR5BlockE+0xb6 (0x00007FD27DE40BD6 [libj9jit29.so+0x770bd6])
_ZN18TR_ColdBlockMarker18identifyColdBlocksEv+0x9a (0x00007FD27DE40F4A [libj9jit29.so+0x770f4a])
_ZN18TR_ColdBlockMarker7performEv+0x3a (0x00007FD27DE4118A [libj9jit29.so+0x77118a])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x855 (0x00007FD27DE69BA5 [libj9jit29.so+0x799ba5])
_ZN3OMR9Optimizer8optimizeEv+0x1b3 (0x00007FD27DE6B893 [libj9jit29.so+0x79b893])
_ZN3OMR20ResolvedMethodSymbol5genILEP11TR_FrontEndPN2TR11CompilationEPNS3_20SymbolReferenceTableERNS3_12IlGenRequestE+0x343 (0x00007FD27DC764B3 [libj9jit29.so+0x5a64b3])
_ZN3OMR11Compilation7compileEv+0x25c (0x00007FD27DC5B22C [libj9jit29.so+0x58b22c])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4bf (0x00007FD27D846A2F [libj9jit29.so+0x176a2f])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x381 (0x00007FD27D847A61 [libj9jit29.so+0x177a61])
omrsig_protect+0x239 (0x00007FD290F88399 [libj9prt29.so+0x2a399])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x372 (0x00007FD27D8455A2 [libj9jit29.so+0x1755a2])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x128 (0x00007FD27D8458E8 [libj9jit29.so+0x1758e8])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x35b (0x00007FD27D84480B [libj9jit29.so+0x17480b])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007FD27D844B72 [libj9jit29.so+0x174b72])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007FD27D844C22 [libj9jit29.so+0x174c22])
omrsig_protect+0x239 (0x00007FD290F88399 [libj9prt29.so+0x2a399])
_Z21compilationThreadProcPv+0x17b (0x00007FD27D844FEB [libj9jit29.so+0x174feb])
thread_wrapper+0x163 (0x00007FD290F503A3 [libj9thr29.so+0xb3a3])
start_thread+0xef (0x00007FD290A951CF [libpthread.so.0+0x81cf])
clone+0x43 (0x00007FD2904FCE73 [libc.so.6+0x39e73])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2024/01/11 22:46:58 - please wait.
JVMDUMP032I JVM requested System dump using '/usr/local/tomcatuwp/temp/core.20240111.224658.381973.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 636251 was not found.  Expected to find core file with name "/var/crash/core.636251"
JVMDUMP032I JVM requested Java dump using '/mnt/dumps/javacore.20240111.224658.381973.0002.txt' in response to an event
JVMDUMP010I Java dump written to /mnt/dumps/javacore.20240111.224658.381973.0002.txt
JVMDUMP032I JVM requested Snap dump using '/usr/local/tomcatuwp/temp/Snap.20240111.224658.381973.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /usr/local/tomcatuwp/temp/Snap.20240111.224658.381973.0003.trc
JVMDUMP032I JVM requested JIT dump using '/usr/local/tomcatuwp/temp/jitdump.20240111.224658.381973.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-001' thread 0x0000000000022400
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP010I JIT dump written to /usr/local/tomcatuwp/temp/jitdump.20240111.224658.381973.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".

Please note that since IBM Semeru 21.0.1 has been officially released, I am going to switch to this version - I will let you know if the GPF issue takes place on it too (but once I'll have moved to 21.0.1 official, I may not be able to switch back easily to 21.0.2-m2 if you need more details to find the root cause & fix it).

Diagnostic files

See attached .dmp file (.core file was nowhere...)

jitdump.20240111.224658.381973.0004.dmp

See attached .trc file (ZIPped):

Snap.20240111.224658.381973.0003.trc.zip

vmState [0x53cff]: {J9VMSTATE_JIT} {coldBlockMarker}

hzongaro commented 10 months ago

Taking a quick look at the jitdump file, it looks like there's a problem with the child of a ResolveCHK node. The IL dump breaks off at that point in the IL for the original compilation and for the jitdump recompilation:

n51n      NULLCHK on n46n [#32]                                                               [0x7fd18a004a30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n50n        PassThrough                                                                       [0x7fd18a0049e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n          aload  cp<parm 2 Lsun/security/util/ConstraintsParameters;>[#423  Parm] [flags 0x40000107 0x0 ]  [0x7fd18a0048a0] bci=[-1,32,-] rc=5 vc=0 vn=- li=- udi=- nc=0
n57n      ZEROCHK [#54]                                                                       [0x7fd18a090190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n        iload  <temp slot 15>[#475  Auto] [flags 0x3 0x0 ]                                [0x7fd18a0900f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n231n     istore  <temp slot 12>[#472  Auto] [flags 0x3 0x0 ]                                 [0x7fd18a0937f0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n        ==>iload
n49n      treetop                                                                             [0x7fd18a004990] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n        acalli  java/lang/Object.toString()Ljava/lang/String;[#435  virtual Method -56] [flags 0x500 0x0 ]  [0x7fd18a004940] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
n47n          aloadi  <vft-symbol>[#346  Shadow] [flags 0x18607 0x0 ]                         [0x7fd18a0048f0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n            ==>aload
n46n          ==>aload
n230n     astore  <temp slot 11>[#471  Auto] [flags 0x7 0x0 ]                                 [0x7fd18a0937a0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n        ==>acalli
n227n     treetop                                                                             [0x7fd18a0936b0] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n46n        ==>aload
n244n     ifacmpeq --> block_27 BBStart at n241n ()                                           [0x7fd18a093c00] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n46n        ==>aload
n243n       aconst NULL (X==0 X>=0 X<=0 )                                                     [0x7fd18a093bb0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n229n     BBEnd </block_3> =====                                                              [0x7fd18a093750] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0

n228n     BBStart <block_25>                                                                  [0x7fd18a093700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n     ResolveCHK [#351]                                                                   [0x7fd18a093e30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>
hzongaro commented 10 months ago

This looks like a duplicate of issue #15730, which we were never been able to reproduce. @avermeer, if you were able to capture a core file, that would help greatly with the investigation.

avermeer commented 10 months ago

Unfortunately, I can't file the core file mentioned in catalina.out:

JVMDUMP032I JVM requested System dump using '/usr/local/tomcatuwp/temp/core.20240111.224658.381973.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 636251 was not found.  Expected to find core file with name "/var/crash/core.636251"

Would the javacore help (I think I can find it) ?

pshipton commented 10 months ago

Maybe this helps to find the core file. https://community.ibm.com/community/user/wasdevops/blogs/kevin-grigorenko1/2022/11/14/lessons-from-the-field-23-linux-core-dumps

hzongaro commented 10 months ago

@dsouzai, may I ask you to take a look at this? In both this case and #15730, the problem seems to arise in an AOT compile at the point of an invokeinterface on a method that comes from java/lang/Object - in the case of this issue, it appears to be a reference to toString():

From the jitdump, at offset 35 we find this:

       29, JBinvokevirtual         34
       32, JBaload2             
       33, JBinvokeinterface2      
       34, JBnop                
       35, JBinvokeinterface        104
       38, JBinvokevirtual         34

and from javap, we see a reference to sun/security/util/ConstraintsParameters.toString() at that point. That interface doesn't include its own definition of toString() - it just gets the one from java/lang/Object.

        29: invokevirtual #52                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        32: aload_2
        33: invokeinterface #275,  1          // InterfaceMethod sun/security/util/ConstraintsParameters.toString:()Ljava/lang/String;
        38: invokevirtual #52                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;

Also in the jitdump, we see this entry for that interface method invocation:

ImproperInterfaceMethodFromCPRecord
    _method=0x00000000000419C8
    _beholder=0x00000000009A1F00
    className=sun/security/util/DisabledAlgorithmConstraints$Constraints
    _cpIndex=104
    kind=97
    id=32

I had tried the following trivial test, in an effort to reproduce issue #15730, but without success - I don't see an ImproperInterfaceMethodFromCPRecord in the trace log:

$ cat Trial.java
public class Trial {
    public static final void main(String[] args) {
        java.util.Collections.unmodifiableCollection(new java.util.ArrayList());
    }
}
$ ./jdk-21.0.1+12/bin/java -Xshareclasses:destroy
$ ./jdk-21.0.1+12/bin/java -Xaot:limit={*Collections.unmodifiableCollection*},{*Collections.unmodifiableCollection*}\(log=play.unmod.log,traceBC,traceILGen,traceFull\),count=0 Trial
dsouzai commented 10 months ago

@hzongaro with export TR_DontDisableSVMDuringStartup=1 I can see the improper interface validation record from the trivial test you provided:

  ============================================================
   3: JBinvokeinterface
ImproperInterfaceMethodFromCPRecord
        _method=0x000000000004C848
        _beholder=0x00000000000BD800
        className=java/util/Collections
        _cpIndex=65
        kind=97
        id=29

considering callNode 00007F53D119D7F0 for java serialization optimization
looking at receiver sig for callNode 00007F53D119D7F0
handling callNode 00007F53D119D7F0, receiver class name Ljava/util/Collection;)Ljava/util/Collection;O
separating null check on n5n from n7n
ClassByNameRecord
        _class=0x0000000000068C00
        className=java/util/Collection
        _classChain=0x00007F53AA7F82C0
        _beholder=0x00000000000BD800
        className=java/util/Collections
        kind=64
        id=30

   /--- trees inserted ------------------------
      n11n     (  0)  NULLCHK on n5n [#32]                                                                 [0x7f53d119d980] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n10n     (  1)    PassThrough                                                                        [0x7f53d119d930] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=1
      n5n      (  3)      aload  c<parm 0 Ljava/util/Collection;>[#421  Parm] [flags 0x40000107 0x0 ]      [0x7f53d119d7a0] bci=[-1,0,-] rc=3 vc=0 vn=- li=- udi=- nc=0
      n15n     (  0)  ZEROCHK [#54]                                                                        [0x7f53d119dac0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n14n     (  1)    instanceof  jitInstanceOf[#87  helper Method] [flags 0x400 0x0 ]                   [0x7f53d119da70] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=2
      n5n      (  3)      ==>aload
      n13n     (  1)      loadaddr  java/util/Collection[#424  Static] [flags 0x18307 0x0 ]                [0x7f53d119da20] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=0
      n7n      (  0)  treetop                                                                              [0x7f53d119d840] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n6n      (  2)    aloadi  <vft-symbol>[#346  Shadow] [flags 0x18607 0x0 ]                            [0x7f53d119d7f0] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=1
      n5n      (  3)      ==>aload
   /--- stack after ------------------------
   @0 n9n      (  0)  aloadi  <javaLangClassFromClass>[#336  Shadow +48] [flags 0x607 0x0 ]                [0x7f53d119d8e0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n6n      (  2)    ==>aloadi

I didn't see a crash though; seems to succeed compiling.

dsouzai commented 10 months ago

I dug around the code but I'm not too familiar with how things are laid out so I can't offer any more insights; as far as I can tell, there's nothing wrong with the validation record generated. My guess is that there's probably some inconsistency associated with the notions of unresolved and AOT. Traditionally we always had this implicit coupling with AOT and unresolvedness, but that isn't true with the SVM enabled so it's possible that we've entered a code path where some structures think something is resolved but others think it's unresolved (for no good reason other than the implicit "is AOT").

avermeer commented 10 months ago

Just to mention that since I switched our pre-production from Semeru 21.0.2.0-m2 to Semeru 21.0.1 final, everything runs smoothly, not only I no more experience the GPF on our TomEE / webapps, but also our Apache Storm topologies are running very smoothly (which wasn't quite the case with 21.0.2.0-m2). All this on Linux x64 / RedHat 8.7 with same code both on web app & on Storm topologies side... I hope Semeru 21.0.2 Final will be as stable as 2.1.0.1 ...

pshipton commented 10 months ago

21.0.2 will be about the same as 21.0.1, OpenJ9 wise. The OpenJDK part will be updated to 21.0.2, which is expected tomorrow. The M2 builds had a problem, SecureRandom wasn't working, plus since M2 we partly fixed an assertion caused by object corruption which showed up in a class loading test. 21.0.2 will have a better fix for that. Presumably the problem that caused this GPF still exists, but it may be a rare occurrence.

avermeer commented 10 months ago

@pshipton thanks for insights on 21.0.2. Will there be a M3 release of 21.0.2 (with updated OpenJDK part) before final ?

pshipton commented 10 months ago

The typical process is to do Release Candidate builds once we have the new OpenJDK material, which aren't usually published since the GA should occur quickly afterwards.

pshipton commented 10 months ago

There will be OpenJ9 open source builds you can take, which are the same code but OpenJDK rather than Mozilla derived cacerts.

pshipton commented 9 months ago

Pls give the following a try. There is still a test failure to resolve on Linux s390 so there may be another change before the release.

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK21_x86-64_linux_Release/12/OpenJ9-JDK21-x86-64_linux-20240118-114213.tar.gz

avermeer commented 9 months ago

Thanks! but I already have all day tomorrow booked on other duties, so I'll keep that for maybe next Monday or Tuesday...

pshipton commented 9 months ago

Moving this forward since there is no fix yet and I don't see an outlook.

hzongaro commented 9 months ago

Given the problems we've had attempting to reproduce this problem, the lack of core files to help with the investigation, and the fact that the problem seems not to have occurred for the reporter recently, I'm going to move this to the 0.45 release.

avermeer commented 6 months ago

Hello,

For your information, we reproduce this issue with openj9-0.43.0.

More precisely:

java -version 
openjdk version "21.0.2" 2024-01-16 LTS 
IBM Semeru Runtime Open Edition 21.0.2.0 (build 21.0.2+13-LTS) 
Eclipse OpenJ9 VM 21.0.2.0 (build openj9-0.43.0, JRE 21 Linux amd64-64-Bit Compressed References 20240116_94 (JIT enabled, AOT enabled) 
OpenJ9 - 2c3d78b48 
OMR - ea8124dbc 
JCL - 78c4500a434 based on jdk-21.0.2+13) 

on RedHat Linux 8.7

With the following stack on STDOUT:

#0: /usr/local/jdk/lib/default/libj9jit29.so(+0x918885) [0x7f525dbf1885]
#1: /usr/local/jdk/lib/default/libj9jit29.so(+0x924c40) [0x7f525dbfdc40]
#2: /usr/local/jdk/lib/default/libj9jit29.so(+0x1606c9) [0x7f525d4396c9]
#3: /usr/local/jdk/lib/default/libj9prt29.so(+0x29948) [0x7f525fbbd948]
#4: /lib64/libpthread.so.0(+0x12cf0) [0x7f525f6d4cf0]
#5: /usr/local/jdk/lib/default/libj9jit29.so(+0x5d22d0) [0x7f525d8ab2d0]
#6: /usr/local/jdk/lib/default/libj9jit29.so(+0x5d39a3) [0x7f525d8ac9a3]
#7: /usr/local/jdk/lib/default/libj9jit29.so(+0x770a16) [0x7f525da49a16]
#8: /usr/local/jdk/lib/default/libj9jit29.so(+0x770d8a) [0x7f525da49d8a]
#9: /usr/local/jdk/lib/default/libj9jit29.so(+0x770fca) [0x7f525da49fca]
#10: /usr/local/jdk/lib/default/libj9jit29.so(+0x7999e5) [0x7f525da729e5]
#11: /usr/local/jdk/lib/default/libj9jit29.so(+0x79b6d3) [0x7f525da746d3]
#12: /usr/local/jdk/lib/default/libj9jit29.so(+0x5a63d3) [0x7f525d87f3d3]
#13: /usr/local/jdk/lib/default/libj9jit29.so(+0x58b14c) [0x7f525d86414c]
#14: /usr/local/jdk/lib/default/libj9jit29.so(+0x176a2f) [0x7f525d44fa2f]
#15: /usr/local/jdk/lib/default/libj9jit29.so(+0x177a61) [0x7f525d450a61]
#16: /usr/local/jdk/lib/default/libj9prt29.so(+0x2a399) [0x7f525fbbe399]
#17: /usr/local/jdk/lib/default/libj9jit29.so(+0x1755a2) [0x7f525d44e5a2]
#18: /usr/local/jdk/lib/default/libj9jit29.so(+0x1758e8) [0x7f525d44e8e8]
#19: /usr/local/jdk/lib/default/libj9jit29.so(+0x17480b) [0x7f525d44d80b]
#20: /usr/local/jdk/lib/default/libj9jit29.so(+0x174b72) [0x7f525d44db72]
#21: /usr/local/jdk/lib/default/libj9jit29.so(+0x174c22) [0x7f525d44dc22]
#22: /usr/local/jdk/lib/default/libj9prt29.so(+0x2a399) [0x7f525fbbe399]
#23: /usr/local/jdk/lib/default/libj9jit29.so(+0x174feb) [0x7f525d44dfeb]
#24: /usr/local/jdk/lib/default/libj9thr29.so(+0xb3a3) [0x7f525fb863a3]
#25: /lib64/libpthread.so.0(+0x81ca) [0x7f525f6ca1ca]
#26: /lib64/libc.so.6(clone+0x43) [0x7f525f132e73]
Unhandled exception
Type=Segmentation error vmState=0x00053cff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F525E59A7D0 Handler2=00007F525FBBD710 InaccessibleAddress=0000000000000008
RDI=00007F522BC19260 RSI=0000000000000000 RAX=0000000000000000 RBX=0000000000000000
RCX=00007F522BC19600 RDX=0000000000000000 R8=0000000000000000 R9=00007F525D4EB7D0
R10=000000000004B000 R11=00007F522AB82C20 R12=0000000000000000 R13=00007F522BC19220
R14=00000000CA4587E7 R15=00007F525DDB25DC
RIP=00007F525D8AB2D0 GS=0000 FS=0000 RSP=00007F522BC191E8
EFlags=0000000000010206 CS=0033 RBP=00007F522BC19220 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 00007f522bc1e780 (f: 734128000.000000, d: 6.916469e-310)
xmm1 00007f522ab22000 (f: 716316672.000000, d: 6.916468e-310)
xmm2 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm3 00007f522ab436d0 (f: 716453568.000000, d: 6.916468e-310)
xmm4 00007f522ab43820 (f: 716453888.000000, d: 6.916468e-310)
xmm5 000001bc000001bc (f: 444.000000, d: 9.421661e-312)
xmm6 00007f522ab43640 (f: 716453440.000000, d: 6.916468e-310)
xmm7 00007f522ab43580 (f: 716453248.000000, d: 6.916468e-310)
xmm8 00007f522ab43790 (f: 716453760.000000, d: 6.916468e-310)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 421f0918152e090b (f: 355338496.000000, d: 3.332414e+10)
xmm11 0000015200000151 (f: 337.000000, d: 7.172346e-312)
xmm12 0000013d00000140 (f: 320.000000, d: 6.726727e-312)
xmm13 000001380000013f (f: 319.000000, d: 6.620627e-312)
xmm14 0000000008001800 (f: 134223872.000000, d: 6.631540e-316)
xmm15 000001420000013b (f: 315.000000, d: 6.832826e-312)
Module=/usr/local/jdk/lib/default/libj9jit29.so
Module_base_address=00007F525D2D9000

Method_being_compiled=sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V
Target=2_90_20240116_94 (Linux 4.18.0-425.19.2.el8_7.x86_64)
CPU=amd64 (2 logical CPUs) (0x1e5108000 RAM)
----------- Stack Backtrace -----------
_ZNK2TR13NodeChecklist8containsEPNS_4NodeE+0x0 (0x00007F525D8AB2D0 [libj9jit29.so+0x5d22d0])
_ZN2TR20PreorderNodeIterator11stepForwardEv+0x43 (0x00007F525D8AC9A3 [libj9jit29.so+0x5d39a3])
_ZN18TR_ColdBlockMarker11isBlockColdEPN2TR5BlockE+0xb6 (0x00007F525DA49A16 [libj9jit29.so+0x770a16])
_ZN18TR_ColdBlockMarker18identifyColdBlocksEv+0x9a (0x00007F525DA49D8A [libj9jit29.so+0x770d8a])
_ZN18TR_ColdBlockMarker7performEv+0x3a (0x00007F525DA49FCA [libj9jit29.so+0x770fca])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x855 (0x00007F525DA729E5 [libj9jit29.so+0x7999e5])
_ZN3OMR9Optimizer8optimizeEv+0x1b3 (0x00007F525DA746D3 [libj9jit29.so+0x79b6d3])
_ZN3OMR20ResolvedMethodSymbol5genILEP11TR_FrontEndPN2TR11CompilationEPNS3_20SymbolReferenceTableERNS3_12IlGenRequestE+0x343 (0x00007F525D87F3D3 [libj9jit29.so+0x5a63d3])
_ZN3OMR11Compilation7compileEv+0x25c (0x00007F525D86414C [libj9jit29.so+0x58b14c])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4bf (0x00007F525D44FA2F [libj9jit29.so+0x176a2f])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x381 (0x00007F525D450A61 [libj9jit29.so+0x177a61])
omrsig_protect+0x239 (0x00007F525FBBE399 [libj9prt29.so+0x2a399])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x372 (0x00007F525D44E5A2 [libj9jit29.so+0x1755a2])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x128 (0x00007F525D44E8E8 [libj9jit29.so+0x1758e8])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x35b (0x00007F525D44D80B [libj9jit29.so+0x17480b])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007F525D44DB72 [libj9jit29.so+0x174b72])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007F525D44DC22 [libj9jit29.so+0x174c22])
omrsig_protect+0x239 (0x00007F525FBBE399 [libj9prt29.so+0x2a399])
_Z21compilationThreadProcPv+0x17b (0x00007F525D44DFEB [libj9jit29.so+0x174feb])
thread_wrapper+0x163 (0x00007F525FB863A3 [libj9thr29.so+0xb3a3])
start_thread+0xea (0x00007F525F6CA1CA [libpthread.so.0+0x81ca])
clone+0x43 (0x00007F525F132E73 [libc.so.6+0x39e73])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2024/05/13 16:46:21 - please wait.
JVMDUMP032I JVM requested System dump using '/home/poduser/core.20240513.164621.95158.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 111387 was not found.  Expected to find core file with name "/var/crash/core.111387"
JVMDUMP032I JVM requested Java dump using '/home/poduser/javacore.20240513.164621.95158.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/poduser/javacore.20240513.164621.95158.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/poduser/Snap.20240513.164621.95158.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/poduser/Snap.20240513.164621.95158.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/poduser/jitdump.20240513.164621.95158.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x0000000000018F00
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP010I JIT dump written to /home/poduser/jitdump.20240513.164621.95158.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".

and attached javacore.20240514.063244.8237.0002.txt, jitdump.20240514.063244.8237.0004.dmp.txt (rename because .dmp extension isn't allowed in this site) javacore.20240514.063244.8237.0002.txt jitdump.20240514.063244.8237.0004.dmp.txt Snap.20240514.063244.8237.0003.trc.txt , and Snap.20240514.063244.8237.0003.trc.txt (added .txt extension because .trc extension is not allowed for upload).

I don't know why the native core file couldn't be written.

I looks like we are reproducing this daily, would these new traces & files help finding out the issue's cause (and a potential by-pass) ?

Thanks, Alex

hzongaro commented 5 months ago

The symptom from jitdump.20240514.063244.8237.0004.dmp.txt looks to be the same as that described in the comment above — that is, the IL dumps for both the original failure and the recompilation break off at a ResolveCHK operation:

n228n     BBStart <block_25>                                                                  [0x7f952598b700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n     ResolveCHK [#351]                                                                   [0x7f952598be30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>

Did you have a chance to look at the blog that @pshipton mentioned in an earlier comment (particularly the section on core_pattern) to see whether that might help with gathering a core file?

If that doesn't help with producing a core file, may I ask you to try adding this option

-Xjit:'{sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}(log=constraints.log,traceILGen,traceBC,traceFull)'

to see whether we can get some additional information from a failing compilation? The drawback is that that will produce an additional log file even if the compile succeeds.

A potential bypass might be to use this option to prevent that method from being compiled at all, but we would really like to resolve this problem for you:

-Xjit:exclude={sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}
avermeer commented 5 months ago

Hello @hzongaro ,

I have checked the limits documented at https://community.ibm.com/community/user/wasdevops/blogs/kevin-grigorenko1/2022/11/14/lessons-from-the-field-23-linux-core-dumps but I'm still not getting code dumps written.

I have added the option which you suggested to get additional logs, I see it produced constraints.log.* files.

Here the largest of these log file attached: constraints.log.1.288317.13970.20240518.063250.288317.zip

Does it help narrowing down the issue?

Thanks, Alex

avermeer commented 5 months ago

Hello again,

Here are the files which I collected from yet another occurrence of this GPF crash, this time I have put in a single ZIP all relevant logs (which include the constraints.log.* files) so that's as comprehensive as possible: gpffiles.zip

On a side note, it would be nice if OpenJ9 could hint about why it cannot write core dump files : all limits which I checked have "nolimits", and file & directory write permissions are OK for all the directories which OpenJ9 tried to use to write the dump... couldn't OpenJ9 be a bit verbose about what prevent it from writing cores?

Thanks, Alex

avermeer commented 5 months ago

And here's yet another collection of logs with IBM Semeru 21.0.3 Milestone 2 (I have the same GPF crash as with 21.0.2)

/mnt/jdk21.0.3m2/bin/java -version 
openjdk version "21.0.3" 2024-04-16 LTS 
IBM Semeru Runtime Open Edition 21.0.3.0-m2 (build 21.0.3+7-LTS) 
Eclipse OpenJ9 VM 21.0.3.0-m2 (build v0.44.0-release-8b079b5c1, JRE 21 Linux amd64-64-Bit Compressed References 20240416_148 (JIT enabled, AOT enabled) 
OpenJ9 - 8b079b5c1 
OMR - 254af5a04 
JCL - 5b5adb35189 based on jdk-21.0.3+7) 

Here are the logs for the GPF: withJDK21.0.3m2.zip

pshipton commented 5 months ago

There usually is some explanation in the console output, do you have the console output from when the crash occurs?

pshipton commented 5 months ago

The console output from earlier was as follows.

JVMDUMP032I JVM requested System dump using '/home/poduser/core.20240513.164621.95158.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 111387 was not found.  Expected to find core file with name "/var/crash/core.111387"

Does the current user have permission to write to /var/crash? Is there enough space there? Any reason why the core couldn't be written to /home/poduser? What is the cat /proc/sys/kernel/core_pattern?

avermeer commented 5 months ago

Hello @pshipton,

I have added a 'chmod 777 /var/crashes', but strangely this time I got core dumped into another directory.

Problem is that core files are ~1,5GB large, and I guess I cannot upload such large files using "attach files" here...

I have tried this:

root@ip-172-31-39-56 temp]# gdb /mnt/jdk21.0.3m2/bin/java core.20240521.082437.1187691.0001.dmp 
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-19.el8 
Copyright (C) 2018 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. 
Type "show copying" and "show warranty" for details. 
This GDB was configured as "x86_64-redhat-linux-gnu". 
Type "show configuration" for configuration details. 
For bug reporting instructions, please see: 
<http://www.gnu.org/software/gdb/bugs/>. 
Find the GDB manual and other documentation resources online at: 
 <http://www.gnu.org/software/gdb/documentation/>. 

For help, type "help". 
Type "apropos word" to search for commands related to "word"... 
Reading symbols from /mnt/jdk21.0.3m2/bin/java...Missing separate debuginfo for /mnt/jdk21.0.3m2/bin/java 
(no debugging symbols found)...done. 

warning: Can't open file (null) during file-backed mapping note processing 

warning: core file may not match specified executable file. 
[New LWP 1620700] 
Missing separate debuginfo for /usr/local/jdk/bin/../lib/libjli.so 
[Thread debugging using libthread_db enabled] 
Using host libthread_db library "/lib64/libthread_db.so.1". 
Missing separate debuginfo for /usr/local/jdk/lib/server/libjvm.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libjvm.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libomrsig.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9vm29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9prt29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9thr29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9hookable29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9zlib29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9dmp29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9trc29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9jit29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9gc29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9vrb29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9shr29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9jvmti29.so 
Missing separate debuginfo for /usr/local/jdk/lib/default/libjclse29.so 
Missing separate debuginfo for /usr/local/jdk/lib/libjimage.so 
Missing separate debuginfo for /usr/local/jdk/lib/libinstrument.so 
Missing separate debuginfo for /usr/local/jdk/lib/libjava.so 
Missing separate debuginfo for /usr/local/jdk/lib/libzip.so 
Missing separate debuginfo for /usr/local/jdk/lib/libnio.so 
Missing separate debuginfo for /usr/local/jdk/lib/libnet.so 
Missing separate debuginfo for /usr/local/jdk/lib/libmanagement.so 
Missing separate debuginfo for /usr/local/jdk/lib/libjncrypto.so 
Missing separate debuginfo for /usr/local/jdk/lib/libextnet.so 
Core was generated by `/usr/local/jdk/bin/java -Djava.util.logging.config.file=/usr/local/tomeeUsage/c'. 
Program terminated with signal SIGSEGV, Segmentation fault. 
#0 0x00007f890a0eda35 in pthread_kill () from /lib64/libpthread.so.0 
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-236.el8_9.12.x86_64 libblkid-2.32.1-43.el8.x86_64 libgcc-8.5.0-16.el8_7.x86_64 libmount-2.32.1-43.el8.x86_64 libselinux-2.9-6.el
.x86_64 libstdc++-8.5.0-16.el8_7.x86_64 libuuid-2.32.1-43.el8.x86_64 openssl-libs-1.1.1k-9.el8_7.x86_64 pcre2-10.32-3.el8_6.x86_64 systemd-libs-239-78.el8.x86_64 zlib-1.2.11-21.el8_7.x86_64 

(gdb) bt 
#0 0x00007f890a0eda35 in pthread_kill () from /lib64/libpthread.so.0 
#1 0x00007f890a5d8992 in omrdump_create () from /usr/local/jdk/lib/default/libj9prt29.so 
#2 0x00007f8908c26b2f in doSystemDump () from /usr/local/jdk/lib/default/libj9dmp29.so 
#3 0x00007f8908c227c5 in protectedDumpFunction () from /usr/local/jdk/lib/default/libj9dmp29.so 
#4 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so 
#5 0x00007f8908c25fb2 in runDumpFunction () from /usr/local/jdk/lib/default/libj9dmp29.so 
#6 0x00007f8908c2613d in runDumpAgent () from /usr/local/jdk/lib/default/libj9dmp29.so 
#7 0x00007f8908c3da75 in triggerDumpAgents () from /usr/local/jdk/lib/default/libj9dmp29.so 
#8 0x00007f8908fb63d2 in generateDiagnosticFiles () from /usr/local/jdk/lib/default/libj9vm29.so 
#9 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so 
#10 0x00007f8908fb655a in vmSignalHandler () from /usr/local/jdk/lib/default/libj9vm29.so 
#11 0x00007f890a5d9948 in mainSynchSignalHandler () from /usr/local/jdk/lib/default/libj9prt29.so 
#12 <signal handler called> 
#13 0x00007f88f69473b0 in TR::NodeChecklist::contains(TR::Node*) const () from /usr/local/jdk/lib/default/libj9jit29.so 
#14 0x00007f88f6948a83 in TR::PreorderNodeIterator::stepForward() () from /usr/local/jdk/lib/default/libj9jit29.so 
#15 0x00007f88f6ae5af6 in TR_ColdBlockMarker::isBlockCold(TR::Block*) () from /usr/local/jdk/lib/default/libj9jit29.so 
#16 0x00007f88f6ae5e6a in TR_ColdBlockMarker::identifyColdBlocks() () from /usr/local/jdk/lib/default/libj9jit29.so 
#17 0x00007f88f6ae60aa in TR_ColdBlockMarker::perform() () from /usr/local/jdk/lib/default/libj9jit29.so 
#18 0x00007f88f6b0eac5 in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) [clone .localalias] () from /usr/local/jdk/lib/default/libj9jit29.so 
#19 0x00007f88f6b107b3 in OMR::Optimizer::optimize() () from /usr/local/jdk/lib/default/libj9jit29.so 
#20 0x00007f88f691b4b3 in OMR::ResolvedMethodSymbol::genIL(TR_FrontEnd*, TR::Compilation*, TR::SymbolReferenceTable*, TR::IlGenRequest&) () from /usr/local/jdk/lib/default/libj9jit29.so 
#21 0x00007f88f690022c in OMR::Compilation::compile() () from /usr/local/jdk/lib/default/libj9jit29.so 
#22 0x00007f88f64eba2f in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&) () 
 from /usr/local/jdk/lib/default/libj9jit29.so 
#23 0x00007f88f64eca61 in TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*) () from /usr/local/jdk/lib/default/libj9jit29.so 
#24 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so 
#25 0x00007f88f64ea5a2 in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR_MethodToBeCompiled*, J9::J9SegmentProvider&) () from /usr/local/jdk/lib/default/libj9jit29.so 
#26 0x00007f88f64ea8e8 in TR::CompilationInfoPerThread::processEntry(TR_MethodToBeCompiled&, J9::J9SegmentProvider&) () from /usr/local/jdk/lib/default/libj9jit29.so 
#27 0x00007f88f64e980b in TR::CompilationInfoPerThread::processEntries() () from /usr/local/jdk/lib/default/libj9jit29.so 
#28 0x00007f88f64e9b72 in TR::CompilationInfoPerThread::run() () from /usr/local/jdk/lib/default/libj9jit29.so 
#29 0x00007f88f64e9c22 in protectedCompilationThreadProc(J9PortLibrary*, TR::CompilationInfoPerThread*) () from /usr/local/jdk/lib/default/libj9jit29.so 
#30 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so 
#31 0x00007f88f64e9feb in compilationThreadProc(void*) () from /usr/local/jdk/lib/default/libj9jit29.so 
#32 0x00007f890a5a23a3 in thread_wrapper () from /usr/local/jdk/lib/default/libj9thr29.so 
#33 0x00007f890a0e61ca in start_thread () from /lib64/libpthread.so.0 
#34 0x00007f8909b4ee73 in clone () from /lib64/libc.so.6 

do you see further gdb commands which I should run against this core file so you can get more insight of where the crash comes from ?

Were the constraints.log.* file useful to anything ?

Thanks, Alexandre

pshipton commented 5 months ago

We can see what Henry needs. Also you could use a file sharing service and then share the link either here or privately.

pshipton commented 5 months ago

The core file should compress quite a bit.

avermeer commented 5 months ago

@pshipton using 'gzip -9' I was able to reduce core file size down to 214 MB, but I don't know where to upload it for sharing it.

But when I asked if constraints.log.* did help you mentioned "We can see what Henry needs" => do it mean that you have enough data to understand the GPF's cause without the core ?

pshipton commented 5 months ago

Can you transfer the core file to me via Slack? Otherwise googling "file sharing" suggests a number of sites that can share files.

I mean we need input from Henry, I don't know what is required, but having the core file is usually beneficial.

avermeer commented 5 months ago

Okay let's see if @hzongaro has enough with contraints.log.* files and otherwise I'll see how to use one of the file sharing.

When you mention Slack for me sending you the gziped core file, is that for example OpenJ9's slack ?

pshipton commented 5 months ago

When you mention Slack for me sending you the gziped core file, is that for example OpenJ9's slack ?

Correct, you should be able to open a private chat with me and drop the file in.

pshipton commented 5 months ago

I've received the core and jpackcore files and shared them on Box with Henry.

hzongaro commented 5 months ago

@avermeer, thank you for gathering those log and core files! I'll start looking at them to determine whether there's anything else we'll need to resolve the problem.

hzongaro commented 5 months ago

@a7ehuo, may I ask you to look at this problem? I've shared with you a link to the Box folder that Peter mentioned above.

a7ehuo commented 5 months ago

I looked at the core.20240521.082437.1187691.0001.dmp with jdk-21.0.3+7 and its debug image. Somehow I don't see the full backtrace.

(gdb) bt
#0  0x00007f890a0eda35 in ?? ()
#1  0x00007f890a689480 in newPath () from /usr/local/jdk/lib/default/libjvm.so
#2  0x00007f88d1fb4869 in ?? ()
#3  0x00007f88d1fb4850 in ?? ()
#4  0x00007f890a5d8992 in omrdump_create (portLibrary=0x7f890a689480 <j9portLibrary>, 
    filename=0xb <error: Cannot access memory at address 0xb>, dumpType=<optimized out>, userData=<optimized out>)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/port/unix/omrosdump.c:182
#5  0x00007f8908c26b2f in enableDumpOnOutOfMemoryError (agentOpts=0x18badc, agentNum=0x18badc)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/openj9/runtime/rasdump/dmpmap.c:541
#6  0x00007f88b0ac3be8 in ?? ()
#7  0x00007f88b288ad78 in ?? ()
#8  0x00007f88b4000000 in ?? ()
#9  0x80102001eca6239b in ?? ()
#10 0x000000000000002b in ?? ()
#11 0x0000000000000000 in ?? ()

@avermeer, thank you for capturing all the logs! One thing I noticed is that constraints.log.* captured the JIT compilation instead of AOT compilation. Could you add the following options to the java command next time to capture both logs along with the core file? The concerned compilation so far is AOT but it's better to capture both just in case.

-Xaot:'{sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}(log=constraints.log,traceILGen,traceBC,traceFull)' -Xjit:'{sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}(log=constraints.log,traceILGen,traceBC,traceFull)' 
avermeer commented 5 months ago

Hello @a7ehuo , please find attached the resulting constraints.log.* files: constraintslog.zip

a7ehuo commented 5 months ago

@hzongaro found the core was actually generated from 21.0.1+12. I thought it was generated from jdk-21.0.3+7 based on the previous comment. I'm able to get the backtrace now. Along with the AOT compiled log, I narrowed down the issue to TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof.

JAVA VM NAME    - Eclipse OpenJ9 VM
JAVA VM VERSION - openj9-0.42.0
JAVA SERVICE LEVEL INFO
JRE 21 Linux amd64-64 (build 21.0.1+12-LTS)
JAVA PRODUCT INFO
IBM Semeru Runtime Open Edition
JAVA VERSION INFO
JRE 21 Linux amd64-64-Bit Compressed References 20231017_77 (JIT enabled, AOT enabled)
OpenJ9   - 874af12ee
OMR      - 11700e64f
JCL      - cb91cd755e3 based on jdk-21.0.1+12

The crash happened in TR_ColdBlockMarker::isBlockCold while it iterates through all the trees in the block [1]. When it processes ResolveCHK node [2], its first child is NULL [3]. The block block_25 that contains the ResolveCHK is created by TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof [4]. The first child of the ResolveCHK is supposed to be a class object node.

https://github.com/eclipse-openj9/openj9/blob/ef127697cd5dc6201b6d6ae39bb3f0d0587ceb1c/runtime/compiler/ilgen/IlGenerator.cpp#L2700-L2705

The newInstanceofTree should be duplicated from n57n ZEROCHK [5], based on the transformation comments, resultStoreNode->getFirstChild()->getSecondChild() should give the class object node. Not sure at the moment why classNode is NULL here. The investigation is ongoing.

   +-<block_$nonnull>-----------------------+               v
   |  ResolveCHK                            |   +-<block_$null>----------+
   |    loadaddr $klass [unresolved Static] |   |  istore $tmp_result    |
   |  istore $tmp_result                    |   |    iconst 0            |
   |    instanceof                          |   |  goto --> block_$merge |
   |      aload $tmp_obj                    |   +------------------------+
   |      =>loadaddr                        |               |
   +----------------------------------------+ 

[1]

(gdb) bt 
#12 <signal handler called>
#13 OMR::Node::getGlobalIndex (this=<optimized out>)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/Checklist.cpp:52
#14 TR::NodeChecklist::contains (this=0x7f88d1fb7260, n=0x0)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/Checklist.cp--Type <RET> for more, q to quit, c to continue without paging--
p:52
#15 0x00007f88f6948a83 in TR::PreorderNodeIterator::stepForward (this=0x7f88d1fb7220)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/ILWalk.cpp:181
#16 0x00007f88f6ae5af6 in TR::PreorderNodeIterator::operator++ (this=0x7f88d1fb7220)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/ILWalk.hpp:175
#17 TR_ColdBlockMarker::isBlockCold (this=0x7f8862a0d220, block=0x7f8862a6f200)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8600
#18 0x00007f88f6ae5e6a in TR_ColdBlockMarker::identifyColdBlocks (this=0x7f8862a0d220)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8558
#19 0x00007f88f6ae60aa in TR_ColdBlockMarker::perform (this=0x7f8862a0d220)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8517
#20 0x00007f88f6b0eac5 in OMR::Optimizer::performOptimization (this=0x7f8862a6fa10, optimization=<optimized out>, 
    firstOptIndex=<optimized out>, lastOptIndex=<optimized out>, doTiming=<optimized out>)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/OMROptimizer.cpp:2057
#21 0x00007f88f6b107b3 in OMR::Optimizer::optimize (this=this@entry=0x7f8862a6fa10)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/OMROptimizer.cpp:1121
#22 0x00007f88f691b4b3 in OMR::ResolvedMethodSymbol::genIL (this=0x7f8862a028e0, fe=0x7f88c4008010, 
    comp=comp@entry=0x7f88629fd000, symRefTab=0x7f88629fed70, customRequest=...)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/il/OMRResolvedMethodSymbol.cpp:1270
#23 0x00007f88f690022c in OMR::Compilation::compile (this=this@entry=0x7f88629fd000)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/compile/OMRCompilation.hpp:324

[2]

n228n     BBStart <block_25> (freq 119)                                                       [0x7fd18b093700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n     ResolveCHK [#351]                                                                   [0x7fd18b093e30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</ilOfCrashedThread>

[3]

(gdb) fr 15
#15 0x00007f88f6948a83 in TR::PreorderNodeIterator::stepForward (this=0x7f88d1fb7220)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/ILWalk.cpp:181
181       if (!alreadyBeenPushed(child))
(gdb) p child
$11 = (TR::Node *) 0x0

(gdb) fr 17
#17 TR_ColdBlockMarker::isBlockCold (this=0x7f8862a0d220, block=0x7f8862a6f200)
    at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8600
8600          // some unresolved symbols in it, then it must be a cold block
(gdb) list -1
8594       //
8595       for (TR::PreorderNodeIterator iter(block->getFirstRealTreeTop(), comp()); iter != block->getExit(); ++iter)
8596          {
8597          TR::Node *node = iter.currentNode();
8598    
8599          // If the block throws an exception, and exceptions are rare in this method, or if the block has
8600          // some unresolved symbols in it, then it must be a cold block
8601          //
8602          // NOTE: return is not considered a cold block: in a loopless methods, a return postdominates
8603          // everything.

(gdb) p block->_pEntry->_pNode->_opCode->_opCode
$1 = TR::BBStart
(gdb) p block->_pEntry->_pNext->_pNode->_opCode->_opCode
$2 = TR::ResolveCHK
(gdb) p block->_pEntry->_pNext->_pNode
$3 = (TR::Node *) 0x7f8862a7e9f0
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_numChildren
$4 = 1
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->__flags
There is no member or method named __flags.
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_flags
$5 = {_flags = 0}
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_opCode->_opCode
$6 = TR::ResolveCHK
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_unionBase._children[0]
$7 = (TR::Node *) 0x0
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_unionBase._children
$8 = {0x0, 0x0}

[4]

expanding unresolved class instanceof n55n in block_3
    create storeNode 00007FC24C6797A0 of tempSymRef #471 (possibly for node uncommoning during opcodeExpansion)
    create storeNode 00007FC24C6797F0 of tempSymRef #472 (possibly for node uncommoning during opcodeExpansion)
    create storeNode 00007FC24C679840 of tempSymRef #473 (possibly for node uncommoning during opcodeExpansion)
    create storeNode 00007FC24C679890 of tempSymRef #474 (possibly for node uncommoning during opcodeExpansion)
    result in temp #475
    block_25: resolve, instanceof
    block_27: false
    block_26: tail of original block
genIL() returned 1
<trees
    title="Pre IlGenOpt Trees"
    method="sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V"
    hotness="warm">

[5]

   /--- trees inserted ------------------------
      n51n     (  0)  NULLCHK on n46n [#32]                                                                [0x7fc24c5eaa30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n50n     (  1)    PassThrough                                                                        [0x7fc24c5ea9e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
      n46n     (  4)      aload  <parm 2 Lsun/security/util/ConstraintsParameters;>[#423  Parm] [flags 0x40000107 0x0 ]  [0x7fc24c5ea8a0] bci=[-1,32,-] rc=4 vc=0 vn=- li=- udi=- nc=0
      n57n     (  0)  ZEROCHK [#54]                                                                        [0x7fc24c676190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n55n     (  2)    instanceof  jitInstanceOf[#87  helper Method] [flags 0x400 0x0 ]                   [0x7fc24c6760f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
      n46n     (  4)      ==>aload
      n53n     (  1)      loadaddr  unknown class object[#436  unresolved Static] [flags 0x18307 0x0 ]     [0x7fc24c676050] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0
avermeer commented 5 months ago

Yes I have reproduced this issue with 21.0.1, 21.0.2 and 21.0.3 milestone ; sorry if I confused you between my various runs/outputs

avermeer commented 5 months ago

Hello, is there a tentative fix which I could test?

a7ehuo commented 5 months ago

Sorry, not yet. Unfortunately, this week I haven't been able to get back to this issue. Hopefully I'll be able to continue the investigation later this week. I'll keep you posted once I have something to try out. Thank you for following up!

a7ehuo commented 5 months ago

I think I found the problem. The following is the generated trees for invokeinterface on java/lang/Object.toString

      n51n     (  0)  NULLCHK on n46n [#32]                                                                [0x7fc24c5eaa30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n50n     (  1)    PassThrough                                                                        [0x7fc24c5ea9e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
      n46n     (  4)      aload  <parm 2 Lsun/security/util/ConstraintsParameters;>[#423  Parm] [flags 0x40000107 0x0 ]  [0x7fc24c5ea8a0] bci=[-1,32,-] rc=4 vc=0 vn=- li=- udi=- nc=0
      n57n     (  0)  ZEROCHK [#54]                                                                        [0x7fc24c676190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n55n     (  2)    instanceof  jitInstanceOf[#87  helper Method] [flags 0x400 0x0 ]                   [0x7fc24c6760f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
      n46n     (  4)      ==>aload
      n53n     (  1)      loadaddr  unknown class object[#436  unresolved Static] [flags 0x18307 0x0 ]     [0x7fc24c676050] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0
      n49n     (  0)  treetop                                                                              [0x7fc24c5ea990] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n48n     (  1)    acalli  java/lang/Object.toString()Ljava/lang/String;[#435  virtual Method -56] [flags 0x500 0x0 ]  [0x7fc24c5ea940] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=2
      n47n     (  1)      aloadi  <vft-symbol>[#346  Shadow] [flags 0x18607 0x0 ]                          [0x7fc24c5ea8f0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
      n46n     (  4)        ==>aload
      n46n     (  4)      ==>aload
      n56n     (  0)  treetop                                                                              [0x7fc24c676140] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
      n55n     (  2)    ==>instanceof

Because the class object for instanceof is unresolved, genInstanceof anchors instanceof (n55n) under a treetop node (n56n) https://github.com/eclipse-openj9/openj9/blob/ef127697cd5dc6201b6d6ae39bb3f0d0587ceb1c/runtime/compiler/ilgen/Walker.cpp#L2341-L2345.

At the end of genInvokeInterface, the treetop (n56n) that anchors the instanceof (n55n) appears after the call tree. The anchored instanceof(n55n) commons the previous instanceof under the ZEROCHK (n57n). This causes a problem for TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof which happens later.

In expandUnresolvedClassInstanceof, it needs to split the block at the treetop (n56n) that anchors the instanceof (n55n). Because the instanceof under the treetop (n56n) is not the first time the node appears, in order to fix commoning, the block split will have to create a temp after ZEROCHK to store the value of instanceof. And instanceof (n55n) would be replaced with an iload from the temp. See an example below.

n15n      ZEROCHK [#54]
n13n        instanceof  jitInstanceOf
n5n           ==>aload
n12n          loadaddr  SomeInterface
n74n      istore  <temp slot 3>[#434  Auto] //<-- fix commoning
n13n        ==>instanceof
n8n       treetop
n7n         acalli  java/lang/Object.toString()Ljava/lang/String;
n6n           aloadi  <vft-symbol>[#346  Shadow]
n5n             ==>aload
n5n           ==>aload
n73n      astore  <temp slot 2>
n7n         ==>acalli
n70n      treetop
n5n         ==>aload
n72n      BBEnd </block_2> =====

n71n      BBStart <block_6>
n14n      treetop
n76n        iload  <temp slot 3>[#434  Auto] // load from the temp on the value of instanceof

expandUnresolvedClassInstanceof doesn't expect this and still thinks it is dealing with the original instanceof node. So when it tries to get the class object node from the instanceof for the ResolveCHK, it tries to get the second child from the iload that has replaced the original instanceof. The iload doesn't have a second child, therefore, it crashes right after ResolveCHK.

a7ehuo commented 5 months ago

@avermeer How to send you a potential fix to test? If you are on OpenJ9 slack channel, maybe I could send a debug build on Slack?

avermeer commented 5 months ago

Hi @a7ehuo yes I am on OpenJ9 slack channel, with "ave" identifier - I already shared the cores with @pshipton using this channel. To clarify, I need Linux x64 (not aarch64, not Windows) so as to efficiently test.

pshipton commented 5 months ago

If you want me to build it using the openj9 jenkins, I can do that. Then the binary will be available for download from artifactory to anyone. Just provide the branches to build.

a7ehuo commented 5 months ago

If you want me to build it using the openj9 jenkins, I can do that.

Here is the branch info. Thank you very much @pshipton!

OPENJ9_REPO: git@github.com:a7ehuo/openj9.git OPENJ9_BRANCH: fix-ilgen-instanceof-2

pshipton commented 5 months ago

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/530/

a7ehuo commented 5 months ago

Thank you @pshipton!

@avermeer Could you give the debug build a try? Thank you!

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK21_x86-64_linux_Personal/159/OpenJ9-JDK21-x86-64_linux-20240530-202345.tar.gz

avermeer commented 5 months ago

Hello @a7ehuo and @pshipton , thank you very much for this debug build. I applied it to my pre-preproduction, after 6 hours of observations, I have no GPF crash on the webapp that used to have crashes within less than 5 hours of execution, so this sounds very promising.

I will keep this setup until at least Monday so as to give definitive answer.

On a side note, I have another GPF on another webapp, that's taking place during another method compilation:

(Method_being_compiled=javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V)

so I first need to review existing OpenJ9 incidents so as to check if that's not an already know issue with Semeru 21.0.3 ; and if not, then I will open a dedicated incident (as it's not impacting the same method as current incident)

a7ehuo commented 5 months ago

I have another GPF on another webapp, that's taking place during another method compilation:> > (Method_being_compiled=javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V)

@avermeer just double checking, for this cash, is this running the debug build from yesterday or the release build?

If it's not running the debug build, is there a jitdump.*.dmp log? Maybe I can take a quick look and see if it is the same issue.

If it's not related to this issue, adding the following option to capture more logs may help further investigation when you open a new issue. Thank you very much!

-Xaot:'{javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V}(log=aotlog,traceILGen,traceBC,traceFull)' -Xjit:'{javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V}(log=jitlog.log,traceILGen,traceBC,traceFull)' 
avermeer commented 5 months ago

@a7ehuo : the new crash on AlgorithmParameterSpec took place with released version of IBM Semeru Community Edition 21.0.3 for Linux x64, on a web application which differs from the one of current ticket (this setup was not based on the recent debug build).

See attached jitdump.*.dmp log file for this new crash so you may check if that's caused by same issue as the one of current incident: jitdump.20240530.162026.7896.0004.dmp

I have taken note of the options you propose to get more traces, but since this crash took place on a web app having 5 days and a half of uptime, I'm afraid it'll take quite some time before I guess results...

a7ehuo commented 5 months ago

@avermeer I took a look at jitdump.20240530.162026.7896.0004.dmp. I believe it is the same issue: It crashed right in processing the first child of ResolveCHK and the previous block block_26 shows same IL tree transformation done by TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof involving a ZEROCHK.

n190n     BBStart <block_26>                                                                  [0x7f59c4d6fb20] bci=[-1,215,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n237n     NULLCHK on n231n [#32]                                                              [0x7f59c4d709d0] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n236n       PassThrough                                                                       [0x7f59c4d70980] bci=[-1,218,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n231n         aload  key<parm 1 Ljava/security/Key;>[#423  Parm] [flags 0x40000107 0x0 ]      [0x7f59c4d707f0] bci=[-1,215,-] rc=4 vc=0 vn=- li=- udi=- nc=0
n244n     ZEROCHK [#54]                                                                       [0x7f59c4d70c00] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n242n       iload  <temp slot 15>[#477  Auto] [flags 0x3 0x0 ]                                [0x7f59c4d70b60] bci=[-1,218,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n364n     istore  <temp slot 14>[#476  Auto] [flags 0x3 0x0 ]                                 [0x7f59c4d73180] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n242n       ==>iload
n233n     treetop                                                                             [0x7f59c4d70890] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n232n       aloadi  <vft-symbol>[#346  Shadow] [flags 0x18607 0x0 ]                           [0x7f59c4d70840] bci=[-1,218,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n231n         ==>aload
n241n     treetop                                                                             [0x7f59c4d70b10] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n235n       aloadi  <javaLangClassFromClass>[#336  Shadow +48] [flags 0x607 0x0 ]             [0x7f59c4d70930] bci=[-1,218,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n232n         ==>aloadi
n363n     astore  <temp slot 13>[#475  Auto] [flags 0x7 0x0 ]                                 [0x7f59c4d73130] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n235n       ==>aloadi
n360n     treetop                                                                             [0x7f59c4d73040] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n231n       ==>aload
n373n     ifacmpeq --> block_45 BBStart at n370n ()                                           [0x7f59c4d73450] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n231n       ==>aload
n372n       aconst NULL (X==0 X>=0 X<=0 )                                                     [0x7f59c4d73400] bci=[-1,218,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n362n     BBEnd </block_26> =====                                                             [0x7f59c4d730e0] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=0

n361n     BBStart <block_43>                                                                  [0x7f59c4d73090] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n380n     ResolveCHK [#351]                                                                   [0x7f59c4d73680] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>

Just for comparison, below is jitdump.20240111.224658.381973.0004.dmp from this issue

n51n      NULLCHK on n46n [#32]                                                               [0x7fd18a004a30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n50n        PassThrough                                                                       [0x7fd18a0049e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n          aload  cp<parm 2 Lsun/security/util/ConstraintsParameters;>[#423  Parm] [flags 0x40000107 0x0 ]  [0x7fd18a0048a0] bci=[-1,32,-] rc=5 vc=0 vn=- li=- udi=- nc=0
n57n      ZEROCHK [#54]                                                                       [0x7fd18a090190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n        iload  <temp slot 15>[#475  Auto] [flags 0x3 0x0 ]                                [0x7fd18a0900f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n231n     istore  <temp slot 12>[#472  Auto] [flags 0x3 0x0 ]                                 [0x7fd18a0937f0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n        ==>iload
n49n      treetop                                                                             [0x7fd18a004990] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n        acalli  java/lang/Object.toString()Ljava/lang/String;[#435  virtual Method -56] [flags 0x500 0x0 ]  [0x7fd18a004940] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
n47n          aloadi  <vft-symbol>[#346  Shadow] [flags 0x18607 0x0 ]                         [0x7fd18a0048f0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n            ==>aload
n46n          ==>aload
n230n     astore  <temp slot 11>[#471  Auto] [flags 0x7 0x0 ]                                 [0x7fd18a0937a0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n        ==>acalli
n227n     treetop                                                                             [0x7fd18a0936b0] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n46n        ==>aload
n244n     ifacmpeq --> block_27 BBStart at n241n ()                                           [0x7fd18a093c00] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n46n        ==>aload
n243n       aconst NULL (X==0 X>=0 X<=0 )                                                     [0x7fd18a093bb0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n229n     BBEnd </block_3> =====                                                              [0x7fd18a093750] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0

n228n     BBStart <block_25>                                                                  [0x7fd18a093700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n     ResolveCHK [#351]                                                                   [0x7fd18a093e30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>