Closed avermeer closed 5 months ago
Taking a quick look at the jitdump file, it looks like there's a problem with the child of a ResolveCHK
node. The IL dump breaks off at that point in the IL for the original compilation and for the jitdump recompilation:
n51n NULLCHK on n46n [#32] [0x7fd18a004a30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n50n PassThrough [0x7fd18a0049e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n aload cp<parm 2 Lsun/security/util/ConstraintsParameters;>[#423 Parm] [flags 0x40000107 0x0 ] [0x7fd18a0048a0] bci=[-1,32,-] rc=5 vc=0 vn=- li=- udi=- nc=0
n57n ZEROCHK [#54] [0x7fd18a090190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n iload <temp slot 15>[#475 Auto] [flags 0x3 0x0 ] [0x7fd18a0900f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n231n istore <temp slot 12>[#472 Auto] [flags 0x3 0x0 ] [0x7fd18a0937f0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n ==>iload
n49n treetop [0x7fd18a004990] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n acalli java/lang/Object.toString()Ljava/lang/String;[#435 virtual Method -56] [flags 0x500 0x0 ] [0x7fd18a004940] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
n47n aloadi <vft-symbol>[#346 Shadow] [flags 0x18607 0x0 ] [0x7fd18a0048f0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n ==>aload
n46n ==>aload
n230n astore <temp slot 11>[#471 Auto] [flags 0x7 0x0 ] [0x7fd18a0937a0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n ==>acalli
n227n treetop [0x7fd18a0936b0] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n46n ==>aload
n244n ifacmpeq --> block_27 BBStart at n241n () [0x7fd18a093c00] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n46n ==>aload
n243n aconst NULL (X==0 X>=0 X<=0 ) [0x7fd18a093bb0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n229n BBEnd </block_3> ===== [0x7fd18a093750] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n228n BBStart <block_25> [0x7fd18a093700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n ResolveCHK [#351] [0x7fd18a093e30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>
This looks like a duplicate of issue #15730, which we were never been able to reproduce. @avermeer, if you were able to capture a core file, that would help greatly with the investigation.
Unfortunately, I can't file the core file mentioned in catalina.out:
JVMDUMP032I JVM requested System dump using '/usr/local/tomcatuwp/temp/core.20240111.224658.381973.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 636251 was not found. Expected to find core file with name "/var/crash/core.636251"
Would the javacore help (I think I can find it) ?
Maybe this helps to find the core file. https://community.ibm.com/community/user/wasdevops/blogs/kevin-grigorenko1/2022/11/14/lessons-from-the-field-23-linux-core-dumps
@dsouzai, may I ask you to take a look at this? In both this case and #15730, the problem seems to arise in an AOT compile at the point of an invokeinterface
on a method that comes from java/lang/Object
- in the case of this issue, it appears to be a reference to toString()
:
From the jitdump, at offset 35 we find this:
29, JBinvokevirtual 34
32, JBaload2
33, JBinvokeinterface2
34, JBnop
35, JBinvokeinterface 104
38, JBinvokevirtual 34
and from javap, we see a reference to sun/security/util/ConstraintsParameters.toString()
at that point. That interface doesn't include its own definition of toString()
- it just gets the one from java/lang/Object
.
29: invokevirtual #52 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
32: aload_2
33: invokeinterface #275, 1 // InterfaceMethod sun/security/util/ConstraintsParameters.toString:()Ljava/lang/String;
38: invokevirtual #52 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
Also in the jitdump, we see this entry for that interface method invocation:
ImproperInterfaceMethodFromCPRecord
_method=0x00000000000419C8
_beholder=0x00000000009A1F00
className=sun/security/util/DisabledAlgorithmConstraints$Constraints
_cpIndex=104
kind=97
id=32
I had tried the following trivial test, in an effort to reproduce issue #15730, but without success - I don't see an ImproperInterfaceMethodFromCPRecord
in the trace log:
$ cat Trial.java
public class Trial {
public static final void main(String[] args) {
java.util.Collections.unmodifiableCollection(new java.util.ArrayList());
}
}
$ ./jdk-21.0.1+12/bin/java -Xshareclasses:destroy
$ ./jdk-21.0.1+12/bin/java -Xaot:limit={*Collections.unmodifiableCollection*},{*Collections.unmodifiableCollection*}\(log=play.unmod.log,traceBC,traceILGen,traceFull\),count=0 Trial
@hzongaro with export TR_DontDisableSVMDuringStartup=1
I can see the improper interface validation record from the trivial test you provided:
============================================================
3: JBinvokeinterface
ImproperInterfaceMethodFromCPRecord
_method=0x000000000004C848
_beholder=0x00000000000BD800
className=java/util/Collections
_cpIndex=65
kind=97
id=29
considering callNode 00007F53D119D7F0 for java serialization optimization
looking at receiver sig for callNode 00007F53D119D7F0
handling callNode 00007F53D119D7F0, receiver class name Ljava/util/Collection;)Ljava/util/Collection;O
separating null check on n5n from n7n
ClassByNameRecord
_class=0x0000000000068C00
className=java/util/Collection
_classChain=0x00007F53AA7F82C0
_beholder=0x00000000000BD800
className=java/util/Collections
kind=64
id=30
/--- trees inserted ------------------------
n11n ( 0) NULLCHK on n5n [#32] [0x7f53d119d980] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n10n ( 1) PassThrough [0x7f53d119d930] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n5n ( 3) aload c<parm 0 Ljava/util/Collection;>[#421 Parm] [flags 0x40000107 0x0 ] [0x7f53d119d7a0] bci=[-1,0,-] rc=3 vc=0 vn=- li=- udi=- nc=0
n15n ( 0) ZEROCHK [#54] [0x7f53d119dac0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n14n ( 1) instanceof jitInstanceOf[#87 helper Method] [flags 0x400 0x0 ] [0x7f53d119da70] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=2
n5n ( 3) ==>aload
n13n ( 1) loadaddr java/util/Collection[#424 Static] [flags 0x18307 0x0 ] [0x7f53d119da20] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=0
n7n ( 0) treetop [0x7f53d119d840] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n6n ( 2) aloadi <vft-symbol>[#346 Shadow] [flags 0x18607 0x0 ] [0x7f53d119d7f0] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n5n ( 3) ==>aload
/--- stack after ------------------------
@0 n9n ( 0) aloadi <javaLangClassFromClass>[#336 Shadow +48] [flags 0x607 0x0 ] [0x7f53d119d8e0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n6n ( 2) ==>aloadi
I didn't see a crash though; seems to succeed compiling.
I dug around the code but I'm not too familiar with how things are laid out so I can't offer any more insights; as far as I can tell, there's nothing wrong with the validation record generated. My guess is that there's probably some inconsistency associated with the notions of unresolved and AOT. Traditionally we always had this implicit coupling with AOT and unresolvedness, but that isn't true with the SVM enabled so it's possible that we've entered a code path where some structures think something is resolved but others think it's unresolved (for no good reason other than the implicit "is AOT").
Just to mention that since I switched our pre-production from Semeru 21.0.2.0-m2 to Semeru 21.0.1 final, everything runs smoothly, not only I no more experience the GPF on our TomEE / webapps, but also our Apache Storm topologies are running very smoothly (which wasn't quite the case with 21.0.2.0-m2). All this on Linux x64 / RedHat 8.7 with same code both on web app & on Storm topologies side... I hope Semeru 21.0.2 Final will be as stable as 2.1.0.1 ...
21.0.2 will be about the same as 21.0.1, OpenJ9 wise. The OpenJDK part will be updated to 21.0.2, which is expected tomorrow. The M2 builds had a problem, SecureRandom wasn't working, plus since M2 we partly fixed an assertion caused by object corruption which showed up in a class loading test. 21.0.2 will have a better fix for that. Presumably the problem that caused this GPF still exists, but it may be a rare occurrence.
@pshipton thanks for insights on 21.0.2. Will there be a M3 release of 21.0.2 (with updated OpenJDK part) before final ?
The typical process is to do Release Candidate builds once we have the new OpenJDK material, which aren't usually published since the GA should occur quickly afterwards.
There will be OpenJ9 open source builds you can take, which are the same code but OpenJDK rather than Mozilla derived cacerts.
Pls give the following a try. There is still a test failure to resolve on Linux s390 so there may be another change before the release.
Thanks! but I already have all day tomorrow booked on other duties, so I'll keep that for maybe next Monday or Tuesday...
Moving this forward since there is no fix yet and I don't see an outlook.
Given the problems we've had attempting to reproduce this problem, the lack of core files to help with the investigation, and the fact that the problem seems not to have occurred for the reporter recently, I'm going to move this to the 0.45 release.
Hello,
For your information, we reproduce this issue with openj9-0.43.0.
More precisely:
java -version
openjdk version "21.0.2" 2024-01-16 LTS
IBM Semeru Runtime Open Edition 21.0.2.0 (build 21.0.2+13-LTS)
Eclipse OpenJ9 VM 21.0.2.0 (build openj9-0.43.0, JRE 21 Linux amd64-64-Bit Compressed References 20240116_94 (JIT enabled, AOT enabled)
OpenJ9 - 2c3d78b48
OMR - ea8124dbc
JCL - 78c4500a434 based on jdk-21.0.2+13)
on RedHat Linux 8.7
With the following stack on STDOUT:
#0: /usr/local/jdk/lib/default/libj9jit29.so(+0x918885) [0x7f525dbf1885]
#1: /usr/local/jdk/lib/default/libj9jit29.so(+0x924c40) [0x7f525dbfdc40]
#2: /usr/local/jdk/lib/default/libj9jit29.so(+0x1606c9) [0x7f525d4396c9]
#3: /usr/local/jdk/lib/default/libj9prt29.so(+0x29948) [0x7f525fbbd948]
#4: /lib64/libpthread.so.0(+0x12cf0) [0x7f525f6d4cf0]
#5: /usr/local/jdk/lib/default/libj9jit29.so(+0x5d22d0) [0x7f525d8ab2d0]
#6: /usr/local/jdk/lib/default/libj9jit29.so(+0x5d39a3) [0x7f525d8ac9a3]
#7: /usr/local/jdk/lib/default/libj9jit29.so(+0x770a16) [0x7f525da49a16]
#8: /usr/local/jdk/lib/default/libj9jit29.so(+0x770d8a) [0x7f525da49d8a]
#9: /usr/local/jdk/lib/default/libj9jit29.so(+0x770fca) [0x7f525da49fca]
#10: /usr/local/jdk/lib/default/libj9jit29.so(+0x7999e5) [0x7f525da729e5]
#11: /usr/local/jdk/lib/default/libj9jit29.so(+0x79b6d3) [0x7f525da746d3]
#12: /usr/local/jdk/lib/default/libj9jit29.so(+0x5a63d3) [0x7f525d87f3d3]
#13: /usr/local/jdk/lib/default/libj9jit29.so(+0x58b14c) [0x7f525d86414c]
#14: /usr/local/jdk/lib/default/libj9jit29.so(+0x176a2f) [0x7f525d44fa2f]
#15: /usr/local/jdk/lib/default/libj9jit29.so(+0x177a61) [0x7f525d450a61]
#16: /usr/local/jdk/lib/default/libj9prt29.so(+0x2a399) [0x7f525fbbe399]
#17: /usr/local/jdk/lib/default/libj9jit29.so(+0x1755a2) [0x7f525d44e5a2]
#18: /usr/local/jdk/lib/default/libj9jit29.so(+0x1758e8) [0x7f525d44e8e8]
#19: /usr/local/jdk/lib/default/libj9jit29.so(+0x17480b) [0x7f525d44d80b]
#20: /usr/local/jdk/lib/default/libj9jit29.so(+0x174b72) [0x7f525d44db72]
#21: /usr/local/jdk/lib/default/libj9jit29.so(+0x174c22) [0x7f525d44dc22]
#22: /usr/local/jdk/lib/default/libj9prt29.so(+0x2a399) [0x7f525fbbe399]
#23: /usr/local/jdk/lib/default/libj9jit29.so(+0x174feb) [0x7f525d44dfeb]
#24: /usr/local/jdk/lib/default/libj9thr29.so(+0xb3a3) [0x7f525fb863a3]
#25: /lib64/libpthread.so.0(+0x81ca) [0x7f525f6ca1ca]
#26: /lib64/libc.so.6(clone+0x43) [0x7f525f132e73]
Unhandled exception
Type=Segmentation error vmState=0x00053cff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F525E59A7D0 Handler2=00007F525FBBD710 InaccessibleAddress=0000000000000008
RDI=00007F522BC19260 RSI=0000000000000000 RAX=0000000000000000 RBX=0000000000000000
RCX=00007F522BC19600 RDX=0000000000000000 R8=0000000000000000 R9=00007F525D4EB7D0
R10=000000000004B000 R11=00007F522AB82C20 R12=0000000000000000 R13=00007F522BC19220
R14=00000000CA4587E7 R15=00007F525DDB25DC
RIP=00007F525D8AB2D0 GS=0000 FS=0000 RSP=00007F522BC191E8
EFlags=0000000000010206 CS=0033 RBP=00007F522BC19220 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 00007f522bc1e780 (f: 734128000.000000, d: 6.916469e-310)
xmm1 00007f522ab22000 (f: 716316672.000000, d: 6.916468e-310)
xmm2 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm3 00007f522ab436d0 (f: 716453568.000000, d: 6.916468e-310)
xmm4 00007f522ab43820 (f: 716453888.000000, d: 6.916468e-310)
xmm5 000001bc000001bc (f: 444.000000, d: 9.421661e-312)
xmm6 00007f522ab43640 (f: 716453440.000000, d: 6.916468e-310)
xmm7 00007f522ab43580 (f: 716453248.000000, d: 6.916468e-310)
xmm8 00007f522ab43790 (f: 716453760.000000, d: 6.916468e-310)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 421f0918152e090b (f: 355338496.000000, d: 3.332414e+10)
xmm11 0000015200000151 (f: 337.000000, d: 7.172346e-312)
xmm12 0000013d00000140 (f: 320.000000, d: 6.726727e-312)
xmm13 000001380000013f (f: 319.000000, d: 6.620627e-312)
xmm14 0000000008001800 (f: 134223872.000000, d: 6.631540e-316)
xmm15 000001420000013b (f: 315.000000, d: 6.832826e-312)
Module=/usr/local/jdk/lib/default/libj9jit29.so
Module_base_address=00007F525D2D9000
Method_being_compiled=sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V
Target=2_90_20240116_94 (Linux 4.18.0-425.19.2.el8_7.x86_64)
CPU=amd64 (2 logical CPUs) (0x1e5108000 RAM)
----------- Stack Backtrace -----------
_ZNK2TR13NodeChecklist8containsEPNS_4NodeE+0x0 (0x00007F525D8AB2D0 [libj9jit29.so+0x5d22d0])
_ZN2TR20PreorderNodeIterator11stepForwardEv+0x43 (0x00007F525D8AC9A3 [libj9jit29.so+0x5d39a3])
_ZN18TR_ColdBlockMarker11isBlockColdEPN2TR5BlockE+0xb6 (0x00007F525DA49A16 [libj9jit29.so+0x770a16])
_ZN18TR_ColdBlockMarker18identifyColdBlocksEv+0x9a (0x00007F525DA49D8A [libj9jit29.so+0x770d8a])
_ZN18TR_ColdBlockMarker7performEv+0x3a (0x00007F525DA49FCA [libj9jit29.so+0x770fca])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x855 (0x00007F525DA729E5 [libj9jit29.so+0x7999e5])
_ZN3OMR9Optimizer8optimizeEv+0x1b3 (0x00007F525DA746D3 [libj9jit29.so+0x79b6d3])
_ZN3OMR20ResolvedMethodSymbol5genILEP11TR_FrontEndPN2TR11CompilationEPNS3_20SymbolReferenceTableERNS3_12IlGenRequestE+0x343 (0x00007F525D87F3D3 [libj9jit29.so+0x5a63d3])
_ZN3OMR11Compilation7compileEv+0x25c (0x00007F525D86414C [libj9jit29.so+0x58b14c])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4bf (0x00007F525D44FA2F [libj9jit29.so+0x176a2f])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x381 (0x00007F525D450A61 [libj9jit29.so+0x177a61])
omrsig_protect+0x239 (0x00007F525FBBE399 [libj9prt29.so+0x2a399])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x372 (0x00007F525D44E5A2 [libj9jit29.so+0x1755a2])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x128 (0x00007F525D44E8E8 [libj9jit29.so+0x1758e8])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x35b (0x00007F525D44D80B [libj9jit29.so+0x17480b])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007F525D44DB72 [libj9jit29.so+0x174b72])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007F525D44DC22 [libj9jit29.so+0x174c22])
omrsig_protect+0x239 (0x00007F525FBBE399 [libj9prt29.so+0x2a399])
_Z21compilationThreadProcPv+0x17b (0x00007F525D44DFEB [libj9jit29.so+0x174feb])
thread_wrapper+0x163 (0x00007F525FB863A3 [libj9thr29.so+0xb3a3])
start_thread+0xea (0x00007F525F6CA1CA [libpthread.so.0+0x81ca])
clone+0x43 (0x00007F525F132E73 [libc.so.6+0x39e73])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2024/05/13 16:46:21 - please wait.
JVMDUMP032I JVM requested System dump using '/home/poduser/core.20240513.164621.95158.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 111387 was not found. Expected to find core file with name "/var/crash/core.111387"
JVMDUMP032I JVM requested Java dump using '/home/poduser/javacore.20240513.164621.95158.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/poduser/javacore.20240513.164621.95158.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/poduser/Snap.20240513.164621.95158.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/poduser/Snap.20240513.164621.95158.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/poduser/jitdump.20240513.164621.95158.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x0000000000018F00
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
JVMDUMP010I JIT dump written to /home/poduser/jitdump.20240513.164621.95158.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
and attached javacore.20240514.063244.8237.0002.txt, jitdump.20240514.063244.8237.0004.dmp.txt (rename because .dmp extension isn't allowed in this site) javacore.20240514.063244.8237.0002.txt jitdump.20240514.063244.8237.0004.dmp.txt Snap.20240514.063244.8237.0003.trc.txt , and Snap.20240514.063244.8237.0003.trc.txt (added .txt extension because .trc extension is not allowed for upload).
I don't know why the native core file couldn't be written.
I looks like we are reproducing this daily, would these new traces & files help finding out the issue's cause (and a potential by-pass) ?
Thanks, Alex
The symptom from jitdump.20240514.063244.8237.0004.dmp.txt looks to be the same as that described in the comment above — that is, the IL dumps for both the original failure and the recompilation break off at a ResolveCHK
operation:
n228n BBStart <block_25> [0x7f952598b700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n ResolveCHK [#351] [0x7f952598be30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>
Did you have a chance to look at the blog that @pshipton mentioned in an earlier comment (particularly the section on core_pattern
) to see whether that might help with gathering a core file?
If that doesn't help with producing a core file, may I ask you to try adding this option
-Xjit:'{sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}(log=constraints.log,traceILGen,traceBC,traceFull)'
to see whether we can get some additional information from a failing compilation? The drawback is that that will produce an additional log file even if the compile succeeds.
A potential bypass might be to use this option to prevent that method from being compiled at all, but we would really like to resolve this problem for you:
-Xjit:exclude={sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}
Hello @hzongaro ,
I have checked the limits documented at https://community.ibm.com/community/user/wasdevops/blogs/kevin-grigorenko1/2022/11/14/lessons-from-the-field-23-linux-core-dumps but I'm still not getting code dumps written.
I have added the option which you suggested to get additional logs, I see it produced constraints.log.* files.
Here the largest of these log file attached: constraints.log.1.288317.13970.20240518.063250.288317.zip
Does it help narrowing down the issue?
Thanks, Alex
Hello again,
Here are the files which I collected from yet another occurrence of this GPF crash, this time I have put in a single ZIP all relevant logs (which include the constraints.log.* files) so that's as comprehensive as possible: gpffiles.zip
On a side note, it would be nice if OpenJ9 could hint about why it cannot write core dump files : all limits which I checked have "nolimits", and file & directory write permissions are OK for all the directories which OpenJ9 tried to use to write the dump... couldn't OpenJ9 be a bit verbose about what prevent it from writing cores?
Thanks, Alex
And here's yet another collection of logs with IBM Semeru 21.0.3 Milestone 2 (I have the same GPF crash as with 21.0.2)
/mnt/jdk21.0.3m2/bin/java -version
openjdk version "21.0.3" 2024-04-16 LTS
IBM Semeru Runtime Open Edition 21.0.3.0-m2 (build 21.0.3+7-LTS)
Eclipse OpenJ9 VM 21.0.3.0-m2 (build v0.44.0-release-8b079b5c1, JRE 21 Linux amd64-64-Bit Compressed References 20240416_148 (JIT enabled, AOT enabled)
OpenJ9 - 8b079b5c1
OMR - 254af5a04
JCL - 5b5adb35189 based on jdk-21.0.3+7)
Here are the logs for the GPF: withJDK21.0.3m2.zip
There usually is some explanation in the console output, do you have the console output from when the crash occurs?
The console output from earlier was as follows.
JVMDUMP032I JVM requested System dump using '/home/poduser/core.20240513.164621.95158.0001.dmp' in response to an event
JVMDUMP012E Error in System dump: The core file created by child process with pid = 111387 was not found. Expected to find core file with name "/var/crash/core.111387"
Does the current user have permission to write to /var/crash? Is there enough space there? Any reason why the core couldn't be written to /home/poduser
?
What is the cat /proc/sys/kernel/core_pattern
?
Hello @pshipton,
I have added a 'chmod 777 /var/crashes', but strangely this time I got core dumped into another directory.
Problem is that core files are ~1,5GB large, and I guess I cannot upload such large files using "attach files" here...
I have tried this:
root@ip-172-31-39-56 temp]# gdb /mnt/jdk21.0.3m2/bin/java core.20240521.082437.1187691.0001.dmp
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-19.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mnt/jdk21.0.3m2/bin/java...Missing separate debuginfo for /mnt/jdk21.0.3m2/bin/java
(no debugging symbols found)...done.
warning: Can't open file (null) during file-backed mapping note processing
warning: core file may not match specified executable file.
[New LWP 1620700]
Missing separate debuginfo for /usr/local/jdk/bin/../lib/libjli.so
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /usr/local/jdk/lib/server/libjvm.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libjvm.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libomrsig.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9vm29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9prt29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9thr29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9hookable29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9zlib29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9dmp29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9trc29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9jit29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9gc29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9vrb29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9shr29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libj9jvmti29.so
Missing separate debuginfo for /usr/local/jdk/lib/default/libjclse29.so
Missing separate debuginfo for /usr/local/jdk/lib/libjimage.so
Missing separate debuginfo for /usr/local/jdk/lib/libinstrument.so
Missing separate debuginfo for /usr/local/jdk/lib/libjava.so
Missing separate debuginfo for /usr/local/jdk/lib/libzip.so
Missing separate debuginfo for /usr/local/jdk/lib/libnio.so
Missing separate debuginfo for /usr/local/jdk/lib/libnet.so
Missing separate debuginfo for /usr/local/jdk/lib/libmanagement.so
Missing separate debuginfo for /usr/local/jdk/lib/libjncrypto.so
Missing separate debuginfo for /usr/local/jdk/lib/libextnet.so
Core was generated by `/usr/local/jdk/bin/java -Djava.util.logging.config.file=/usr/local/tomeeUsage/c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f890a0eda35 in pthread_kill () from /lib64/libpthread.so.0
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-236.el8_9.12.x86_64 libblkid-2.32.1-43.el8.x86_64 libgcc-8.5.0-16.el8_7.x86_64 libmount-2.32.1-43.el8.x86_64 libselinux-2.9-6.el
.x86_64 libstdc++-8.5.0-16.el8_7.x86_64 libuuid-2.32.1-43.el8.x86_64 openssl-libs-1.1.1k-9.el8_7.x86_64 pcre2-10.32-3.el8_6.x86_64 systemd-libs-239-78.el8.x86_64 zlib-1.2.11-21.el8_7.x86_64
(gdb) bt
#0 0x00007f890a0eda35 in pthread_kill () from /lib64/libpthread.so.0
#1 0x00007f890a5d8992 in omrdump_create () from /usr/local/jdk/lib/default/libj9prt29.so
#2 0x00007f8908c26b2f in doSystemDump () from /usr/local/jdk/lib/default/libj9dmp29.so
#3 0x00007f8908c227c5 in protectedDumpFunction () from /usr/local/jdk/lib/default/libj9dmp29.so
#4 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so
#5 0x00007f8908c25fb2 in runDumpFunction () from /usr/local/jdk/lib/default/libj9dmp29.so
#6 0x00007f8908c2613d in runDumpAgent () from /usr/local/jdk/lib/default/libj9dmp29.so
#7 0x00007f8908c3da75 in triggerDumpAgents () from /usr/local/jdk/lib/default/libj9dmp29.so
#8 0x00007f8908fb63d2 in generateDiagnosticFiles () from /usr/local/jdk/lib/default/libj9vm29.so
#9 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so
#10 0x00007f8908fb655a in vmSignalHandler () from /usr/local/jdk/lib/default/libj9vm29.so
#11 0x00007f890a5d9948 in mainSynchSignalHandler () from /usr/local/jdk/lib/default/libj9prt29.so
#12 <signal handler called>
#13 0x00007f88f69473b0 in TR::NodeChecklist::contains(TR::Node*) const () from /usr/local/jdk/lib/default/libj9jit29.so
#14 0x00007f88f6948a83 in TR::PreorderNodeIterator::stepForward() () from /usr/local/jdk/lib/default/libj9jit29.so
#15 0x00007f88f6ae5af6 in TR_ColdBlockMarker::isBlockCold(TR::Block*) () from /usr/local/jdk/lib/default/libj9jit29.so
#16 0x00007f88f6ae5e6a in TR_ColdBlockMarker::identifyColdBlocks() () from /usr/local/jdk/lib/default/libj9jit29.so
#17 0x00007f88f6ae60aa in TR_ColdBlockMarker::perform() () from /usr/local/jdk/lib/default/libj9jit29.so
#18 0x00007f88f6b0eac5 in OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int) [clone .localalias] () from /usr/local/jdk/lib/default/libj9jit29.so
#19 0x00007f88f6b107b3 in OMR::Optimizer::optimize() () from /usr/local/jdk/lib/default/libj9jit29.so
#20 0x00007f88f691b4b3 in OMR::ResolvedMethodSymbol::genIL(TR_FrontEnd*, TR::Compilation*, TR::SymbolReferenceTable*, TR::IlGenRequest&) () from /usr/local/jdk/lib/default/libj9jit29.so
#21 0x00007f88f690022c in OMR::Compilation::compile() () from /usr/local/jdk/lib/default/libj9jit29.so
#22 0x00007f88f64eba2f in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&) ()
from /usr/local/jdk/lib/default/libj9jit29.so
#23 0x00007f88f64eca61 in TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*) () from /usr/local/jdk/lib/default/libj9jit29.so
#24 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so
#25 0x00007f88f64ea5a2 in TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR_MethodToBeCompiled*, J9::J9SegmentProvider&) () from /usr/local/jdk/lib/default/libj9jit29.so
#26 0x00007f88f64ea8e8 in TR::CompilationInfoPerThread::processEntry(TR_MethodToBeCompiled&, J9::J9SegmentProvider&) () from /usr/local/jdk/lib/default/libj9jit29.so
#27 0x00007f88f64e980b in TR::CompilationInfoPerThread::processEntries() () from /usr/local/jdk/lib/default/libj9jit29.so
#28 0x00007f88f64e9b72 in TR::CompilationInfoPerThread::run() () from /usr/local/jdk/lib/default/libj9jit29.so
#29 0x00007f88f64e9c22 in protectedCompilationThreadProc(J9PortLibrary*, TR::CompilationInfoPerThread*) () from /usr/local/jdk/lib/default/libj9jit29.so
#30 0x00007f890a5da399 in omrsig_protect () from /usr/local/jdk/lib/default/libj9prt29.so
#31 0x00007f88f64e9feb in compilationThreadProc(void*) () from /usr/local/jdk/lib/default/libj9jit29.so
#32 0x00007f890a5a23a3 in thread_wrapper () from /usr/local/jdk/lib/default/libj9thr29.so
#33 0x00007f890a0e61ca in start_thread () from /lib64/libpthread.so.0
#34 0x00007f8909b4ee73 in clone () from /lib64/libc.so.6
do you see further gdb commands which I should run against this core file so you can get more insight of where the crash comes from ?
Were the constraints.log.* file useful to anything ?
Thanks, Alexandre
We can see what Henry needs. Also you could use a file sharing service and then share the link either here or privately.
The core file should compress quite a bit.
@pshipton using 'gzip -9' I was able to reduce core file size down to 214 MB, but I don't know where to upload it for sharing it.
But when I asked if constraints.log.* did help you mentioned "We can see what Henry needs" => do it mean that you have enough data to understand the GPF's cause without the core ?
Can you transfer the core file to me via Slack? Otherwise googling "file sharing" suggests a number of sites that can share files.
I mean we need input from Henry, I don't know what is required, but having the core file is usually beneficial.
Okay let's see if @hzongaro has enough with contraints.log.* files and otherwise I'll see how to use one of the file sharing.
When you mention Slack for me sending you the gziped core file, is that for example OpenJ9's slack ?
When you mention Slack for me sending you the gziped core file, is that for example OpenJ9's slack ?
Correct, you should be able to open a private chat with me and drop the file in.
I've received the core and jpackcore files and shared them on Box with Henry.
@avermeer, thank you for gathering those log and core files! I'll start looking at them to determine whether there's anything else we'll need to resolve the problem.
@a7ehuo, may I ask you to look at this problem? I've shared with you a link to the Box folder that Peter mentioned above.
I looked at the core.20240521.082437.1187691.0001.dmp
with jdk-21.0.3+7
and its debug image. Somehow I don't see the full backtrace.
(gdb) bt
#0 0x00007f890a0eda35 in ?? ()
#1 0x00007f890a689480 in newPath () from /usr/local/jdk/lib/default/libjvm.so
#2 0x00007f88d1fb4869 in ?? ()
#3 0x00007f88d1fb4850 in ?? ()
#4 0x00007f890a5d8992 in omrdump_create (portLibrary=0x7f890a689480 <j9portLibrary>,
filename=0xb <error: Cannot access memory at address 0xb>, dumpType=<optimized out>, userData=<optimized out>)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/port/unix/omrosdump.c:182
#5 0x00007f8908c26b2f in enableDumpOnOutOfMemoryError (agentOpts=0x18badc, agentNum=0x18badc)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/openj9/runtime/rasdump/dmpmap.c:541
#6 0x00007f88b0ac3be8 in ?? ()
#7 0x00007f88b288ad78 in ?? ()
#8 0x00007f88b4000000 in ?? ()
#9 0x80102001eca6239b in ?? ()
#10 0x000000000000002b in ?? ()
#11 0x0000000000000000 in ?? ()
@avermeer, thank you for capturing all the logs! One thing I noticed is that constraints.log.*
captured the JIT compilation instead of AOT compilation. Could you add the following options to the java command next time to capture both logs along with the core file? The concerned compilation so far is AOT but it's better to capture both just in case.
-Xaot:'{sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}(log=constraints.log,traceILGen,traceBC,traceFull)' -Xjit:'{sun/security/util/DisabledAlgorithmConstraints?Constraints.permits*}(log=constraints.log,traceILGen,traceBC,traceFull)'
Hello @a7ehuo , please find attached the resulting constraints.log.* files: constraintslog.zip
@hzongaro found the core was actually generated from 21.0.1+12
. I thought it was generated from jdk-21.0.3+7
based on the previous comment. I'm able to get the backtrace now. Along with the AOT compiled log, I narrowed down the issue to TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof.
JAVA VM NAME - Eclipse OpenJ9 VM
JAVA VM VERSION - openj9-0.42.0
JAVA SERVICE LEVEL INFO
JRE 21 Linux amd64-64 (build 21.0.1+12-LTS)
JAVA PRODUCT INFO
IBM Semeru Runtime Open Edition
JAVA VERSION INFO
JRE 21 Linux amd64-64-Bit Compressed References 20231017_77 (JIT enabled, AOT enabled)
OpenJ9 - 874af12ee
OMR - 11700e64f
JCL - cb91cd755e3 based on jdk-21.0.1+12
The crash happened in TR_ColdBlockMarker::isBlockCold while it iterates through all the trees in the block [1]. When it processes ResolveCHK
node [2], its first child is NULL [3]. The block block_25
that contains the ResolveCHK
is created by TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof [4]. The first child of the ResolveCHK
is supposed to be a class object node.
The newInstanceofTree
should be duplicated from n57n ZEROCHK
[5], based on the transformation comments, resultStoreNode->getFirstChild()->getSecondChild()
should give the class object node. Not sure at the moment why classNode
is NULL here. The investigation is ongoing.
+-<block_$nonnull>-----------------------+ v
| ResolveCHK | +-<block_$null>----------+
| loadaddr $klass [unresolved Static] | | istore $tmp_result |
| istore $tmp_result | | iconst 0 |
| instanceof | | goto --> block_$merge |
| aload $tmp_obj | +------------------------+
| =>loadaddr | |
+----------------------------------------+
[1]
(gdb) bt
#12 <signal handler called>
#13 OMR::Node::getGlobalIndex (this=<optimized out>)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/Checklist.cpp:52
#14 TR::NodeChecklist::contains (this=0x7f88d1fb7260, n=0x0)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/Checklist.cp--Type <RET> for more, q to quit, c to continue without paging--
p:52
#15 0x00007f88f6948a83 in TR::PreorderNodeIterator::stepForward (this=0x7f88d1fb7220)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/ILWalk.cpp:181
#16 0x00007f88f6ae5af6 in TR::PreorderNodeIterator::operator++ (this=0x7f88d1fb7220)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/ILWalk.hpp:175
#17 TR_ColdBlockMarker::isBlockCold (this=0x7f8862a0d220, block=0x7f8862a6f200)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8600
#18 0x00007f88f6ae5e6a in TR_ColdBlockMarker::identifyColdBlocks (this=0x7f8862a0d220)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8558
#19 0x00007f88f6ae60aa in TR_ColdBlockMarker::perform (this=0x7f8862a0d220)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8517
#20 0x00007f88f6b0eac5 in OMR::Optimizer::performOptimization (this=0x7f8862a6fa10, optimization=<optimized out>,
firstOptIndex=<optimized out>, lastOptIndex=<optimized out>, doTiming=<optimized out>)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/OMROptimizer.cpp:2057
#21 0x00007f88f6b107b3 in OMR::Optimizer::optimize (this=this@entry=0x7f8862a6fa10)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/OMROptimizer.cpp:1121
#22 0x00007f88f691b4b3 in OMR::ResolvedMethodSymbol::genIL (this=0x7f8862a028e0, fe=0x7f88c4008010,
comp=comp@entry=0x7f88629fd000, symRefTab=0x7f88629fed70, customRequest=...)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/il/OMRResolvedMethodSymbol.cpp:1270
#23 0x00007f88f690022c in OMR::Compilation::compile (this=this@entry=0x7f88629fd000)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/compile/OMRCompilation.hpp:324
[2]
n228n BBStart <block_25> (freq 119) [0x7fd18b093700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n ResolveCHK [#351] [0x7fd18b093e30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</ilOfCrashedThread>
[3]
(gdb) fr 15
#15 0x00007f88f6948a83 in TR::PreorderNodeIterator::stepForward (this=0x7f88d1fb7220)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/infra/ILWalk.cpp:181
181 if (!alreadyBeenPushed(child))
(gdb) p child
$11 = (TR::Node *) 0x0
(gdb) fr 17
#17 TR_ColdBlockMarker::isBlockCold (this=0x7f8862a0d220, block=0x7f8862a6f200)
at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-x64-openj9/workspace/build/src/omr/compiler/optimizer/LocalOpts.cpp:8600
8600 // some unresolved symbols in it, then it must be a cold block
(gdb) list -1
8594 //
8595 for (TR::PreorderNodeIterator iter(block->getFirstRealTreeTop(), comp()); iter != block->getExit(); ++iter)
8596 {
8597 TR::Node *node = iter.currentNode();
8598
8599 // If the block throws an exception, and exceptions are rare in this method, or if the block has
8600 // some unresolved symbols in it, then it must be a cold block
8601 //
8602 // NOTE: return is not considered a cold block: in a loopless methods, a return postdominates
8603 // everything.
(gdb) p block->_pEntry->_pNode->_opCode->_opCode
$1 = TR::BBStart
(gdb) p block->_pEntry->_pNext->_pNode->_opCode->_opCode
$2 = TR::ResolveCHK
(gdb) p block->_pEntry->_pNext->_pNode
$3 = (TR::Node *) 0x7f8862a7e9f0
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_numChildren
$4 = 1
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->__flags
There is no member or method named __flags.
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_flags
$5 = {_flags = 0}
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_opCode->_opCode
$6 = TR::ResolveCHK
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_unionBase._children[0]
$7 = (TR::Node *) 0x0
(gdb) p ((TR::Node *) 0x7f8862a7e9f0)->_unionBase._children
$8 = {0x0, 0x0}
[4]
expanding unresolved class instanceof n55n in block_3
create storeNode 00007FC24C6797A0 of tempSymRef #471 (possibly for node uncommoning during opcodeExpansion)
create storeNode 00007FC24C6797F0 of tempSymRef #472 (possibly for node uncommoning during opcodeExpansion)
create storeNode 00007FC24C679840 of tempSymRef #473 (possibly for node uncommoning during opcodeExpansion)
create storeNode 00007FC24C679890 of tempSymRef #474 (possibly for node uncommoning during opcodeExpansion)
result in temp #475
block_25: resolve, instanceof
block_27: false
block_26: tail of original block
genIL() returned 1
<trees
title="Pre IlGenOpt Trees"
method="sun/security/util/DisabledAlgorithmConstraints$Constraints.permits(Ljava/lang/String;Lsun/security/util/ConstraintsParameters;Z)V"
hotness="warm">
[5]
/--- trees inserted ------------------------
n51n ( 0) NULLCHK on n46n [#32] [0x7fc24c5eaa30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n50n ( 1) PassThrough [0x7fc24c5ea9e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n ( 4) aload <parm 2 Lsun/security/util/ConstraintsParameters;>[#423 Parm] [flags 0x40000107 0x0 ] [0x7fc24c5ea8a0] bci=[-1,32,-] rc=4 vc=0 vn=- li=- udi=- nc=0
n57n ( 0) ZEROCHK [#54] [0x7fc24c676190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n ( 2) instanceof jitInstanceOf[#87 helper Method] [flags 0x400 0x0 ] [0x7fc24c6760f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
n46n ( 4) ==>aload
n53n ( 1) loadaddr unknown class object[#436 unresolved Static] [flags 0x18307 0x0 ] [0x7fc24c676050] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0
Yes I have reproduced this issue with 21.0.1, 21.0.2 and 21.0.3 milestone ; sorry if I confused you between my various runs/outputs
Hello, is there a tentative fix which I could test?
Sorry, not yet. Unfortunately, this week I haven't been able to get back to this issue. Hopefully I'll be able to continue the investigation later this week. I'll keep you posted once I have something to try out. Thank you for following up!
I think I found the problem. The following is the generated trees for invokeinterface
on java/lang/Object.toString
n51n ( 0) NULLCHK on n46n [#32] [0x7fc24c5eaa30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n50n ( 1) PassThrough [0x7fc24c5ea9e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n ( 4) aload <parm 2 Lsun/security/util/ConstraintsParameters;>[#423 Parm] [flags 0x40000107 0x0 ] [0x7fc24c5ea8a0] bci=[-1,32,-] rc=4 vc=0 vn=- li=- udi=- nc=0
n57n ( 0) ZEROCHK [#54] [0x7fc24c676190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n ( 2) instanceof jitInstanceOf[#87 helper Method] [flags 0x400 0x0 ] [0x7fc24c6760f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
n46n ( 4) ==>aload
n53n ( 1) loadaddr unknown class object[#436 unresolved Static] [flags 0x18307 0x0 ] [0x7fc24c676050] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0
n49n ( 0) treetop [0x7fc24c5ea990] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n ( 1) acalli java/lang/Object.toString()Ljava/lang/String;[#435 virtual Method -56] [flags 0x500 0x0 ] [0x7fc24c5ea940] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=2
n47n ( 1) aloadi <vft-symbol>[#346 Shadow] [flags 0x18607 0x0 ] [0x7fc24c5ea8f0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n ( 4) ==>aload
n46n ( 4) ==>aload
n56n ( 0) treetop [0x7fc24c676140] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n ( 2) ==>instanceof
Because the class object for instanceof
is unresolved, genInstanceof
anchors instanceof
(n55n
) under a treetop
node (n56n
)
https://github.com/eclipse-openj9/openj9/blob/ef127697cd5dc6201b6d6ae39bb3f0d0587ceb1c/runtime/compiler/ilgen/Walker.cpp#L2341-L2345.
At the end of genInvokeInterface
, the treetop (n56n
) that anchors the instanceof
(n55n
) appears after the call tree. The anchored instanceof
(n55n
) commons the previous instanceof
under the ZEROCHK
(n57n
). This causes a problem for TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof which happens later.
In expandUnresolvedClassInstanceof
, it needs to split the block at the treetop (n56n
) that anchors the instanceof
(n55n
). Because the instanceof
under the treetop (n56n
) is not the first time the node appears, in order to fix commoning, the block split will have to create a temp after ZEROCHK
to store the value of instanceof
. And instanceof
(n55n
) would be replaced with an iload
from the temp. See an example below.
n15n ZEROCHK [#54]
n13n instanceof jitInstanceOf
n5n ==>aload
n12n loadaddr SomeInterface
n74n istore <temp slot 3>[#434 Auto] //<-- fix commoning
n13n ==>instanceof
n8n treetop
n7n acalli java/lang/Object.toString()Ljava/lang/String;
n6n aloadi <vft-symbol>[#346 Shadow]
n5n ==>aload
n5n ==>aload
n73n astore <temp slot 2>
n7n ==>acalli
n70n treetop
n5n ==>aload
n72n BBEnd </block_2> =====
n71n BBStart <block_6>
n14n treetop
n76n iload <temp slot 3>[#434 Auto] // load from the temp on the value of instanceof
expandUnresolvedClassInstanceof
doesn't expect this and still thinks it is dealing with the original instanceof
node. So when it tries to get the class object node from the instanceof
for the ResolveCHK
, it tries to get the second child from the iload
that has replaced the original instanceof
. The iload
doesn't have a second child, therefore, it crashes right after ResolveCHK
.
@avermeer How to send you a potential fix to test? If you are on OpenJ9 slack channel, maybe I could send a debug build on Slack?
Hi @a7ehuo yes I am on OpenJ9 slack channel, with "ave" identifier - I already shared the cores with @pshipton using this channel. To clarify, I need Linux x64 (not aarch64, not Windows) so as to efficiently test.
If you want me to build it using the openj9 jenkins, I can do that. Then the binary will be available for download from artifactory to anyone. Just provide the branches to build.
If you want me to build it using the openj9 jenkins, I can do that.
Here is the branch info. Thank you very much @pshipton!
OPENJ9_REPO: git@github.com:a7ehuo/openj9.git
OPENJ9_BRANCH: fix-ilgen-instanceof-2
Thank you @pshipton!
@avermeer Could you give the debug build a try? Thank you!
Hello @a7ehuo and @pshipton , thank you very much for this debug build. I applied it to my pre-preproduction, after 6 hours of observations, I have no GPF crash on the webapp that used to have crashes within less than 5 hours of execution, so this sounds very promising.
I will keep this setup until at least Monday so as to give definitive answer.
On a side note, I have another GPF on another webapp, that's taking place during another method compilation:
(Method_being_compiled=javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V)
so I first need to review existing OpenJ9 incidents so as to check if that's not an already know issue with Semeru 21.0.3 ; and if not, then I will open a dedicated incident (as it's not impacting the same method as current incident)
I have another GPF on another webapp, that's taking place during another method compilation:> > (Method_being_compiled=javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V)
@avermeer just double checking, for this cash, is this running the debug build from yesterday or the release build?
If it's not running the debug build, is there a jitdump.*.dmp log? Maybe I can take a quick look and see if it is the same issue.
If it's not related to this issue, adding the following option to capture more logs may help further investigation when you open a new issue. Thank you very much!
-Xaot:'{javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V}(log=aotlog,traceILGen,traceBC,traceFull)' -Xjit:'{javax/crypto/Mac.chooseProvider(Ljava/security/Key;Ljava/security/spec/AlgorithmParameterSpec;)V}(log=jitlog.log,traceILGen,traceBC,traceFull)'
@a7ehuo : the new crash on AlgorithmParameterSpec took place with released version of IBM Semeru Community Edition 21.0.3 for Linux x64, on a web application which differs from the one of current ticket (this setup was not based on the recent debug build).
See attached jitdump.*.dmp log file for this new crash so you may check if that's caused by same issue as the one of current incident: jitdump.20240530.162026.7896.0004.dmp
I have taken note of the options you propose to get more traces, but since this crash took place on a web app having 5 days and a half of uptime, I'm afraid it'll take quite some time before I guess results...
@avermeer I took a look at jitdump.20240530.162026.7896.0004.dmp. I believe it is the same issue: It crashed right in processing the first child of ResolveCHK
and the previous block block_26
shows same IL tree transformation done by TR_J9ByteCodeIlGenerator::expandUnresolvedClassInstanceof involving a ZEROCHK
.
n190n BBStart <block_26> [0x7f59c4d6fb20] bci=[-1,215,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n237n NULLCHK on n231n [#32] [0x7f59c4d709d0] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n236n PassThrough [0x7f59c4d70980] bci=[-1,218,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n231n aload key<parm 1 Ljava/security/Key;>[#423 Parm] [flags 0x40000107 0x0 ] [0x7f59c4d707f0] bci=[-1,215,-] rc=4 vc=0 vn=- li=- udi=- nc=0
n244n ZEROCHK [#54] [0x7f59c4d70c00] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n242n iload <temp slot 15>[#477 Auto] [flags 0x3 0x0 ] [0x7f59c4d70b60] bci=[-1,218,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n364n istore <temp slot 14>[#476 Auto] [flags 0x3 0x0 ] [0x7f59c4d73180] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n242n ==>iload
n233n treetop [0x7f59c4d70890] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n232n aloadi <vft-symbol>[#346 Shadow] [flags 0x18607 0x0 ] [0x7f59c4d70840] bci=[-1,218,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n231n ==>aload
n241n treetop [0x7f59c4d70b10] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n235n aloadi <javaLangClassFromClass>[#336 Shadow +48] [flags 0x607 0x0 ] [0x7f59c4d70930] bci=[-1,218,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n232n ==>aloadi
n363n astore <temp slot 13>[#475 Auto] [flags 0x7 0x0 ] [0x7f59c4d73130] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n235n ==>aloadi
n360n treetop [0x7f59c4d73040] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n231n ==>aload
n373n ifacmpeq --> block_45 BBStart at n370n () [0x7f59c4d73450] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n231n ==>aload
n372n aconst NULL (X==0 X>=0 X<=0 ) [0x7f59c4d73400] bci=[-1,218,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n362n BBEnd </block_26> ===== [0x7f59c4d730e0] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n361n BBStart <block_43> [0x7f59c4d73090] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n380n ResolveCHK [#351] [0x7f59c4d73680] bci=[-1,218,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>
Just for comparison, below is jitdump.20240111.224658.381973.0004.dmp from this issue
n51n NULLCHK on n46n [#32] [0x7fd18a004a30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n50n PassThrough [0x7fd18a0049e0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n aload cp<parm 2 Lsun/security/util/ConstraintsParameters;>[#423 Parm] [flags 0x40000107 0x0 ] [0x7fd18a0048a0] bci=[-1,32,-] rc=5 vc=0 vn=- li=- udi=- nc=0
n57n ZEROCHK [#54] [0x7fd18a090190] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n iload <temp slot 15>[#475 Auto] [flags 0x3 0x0 ] [0x7fd18a0900f0] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n231n istore <temp slot 12>[#472 Auto] [flags 0x3 0x0 ] [0x7fd18a0937f0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n55n ==>iload
n49n treetop [0x7fd18a004990] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n acalli java/lang/Object.toString()Ljava/lang/String;[#435 virtual Method -56] [flags 0x500 0x0 ] [0x7fd18a004940] bci=[-1,35,-] rc=2 vc=0 vn=- li=- udi=- nc=2
n47n aloadi <vft-symbol>[#346 Shadow] [flags 0x18607 0x0 ] [0x7fd18a0048f0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n46n ==>aload
n46n ==>aload
n230n astore <temp slot 11>[#471 Auto] [flags 0x7 0x0 ] [0x7fd18a0937a0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n48n ==>acalli
n227n treetop [0x7fd18a0936b0] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n46n ==>aload
n244n ifacmpeq --> block_27 BBStart at n241n () [0x7fd18a093c00] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n46n ==>aload
n243n aconst NULL (X==0 X>=0 X<=0 ) [0x7fd18a093bb0] bci=[-1,35,-] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x302
n229n BBEnd </block_3> ===== [0x7fd18a093750] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n228n BBStart <block_25> [0x7fd18a093700] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n251n ResolveCHK [#351] [0x7fd18a093e30] bci=[-1,35,-] rc=0 vc=0 vn=- li=- udi=- nc=1
</recompilation rc=0 queued=1>
</jitDump>
Java -version output
openjdk version "21.0.1" 2023-10-17 LTS IBM Semeru Runtime Open Edition 21.0.2.0-m2 (build 21.0.1+12-LTS) Eclipse OpenJ9 VM 21.0.2.0-m2 (build v0.43.0-release-54becab46, JRE 21 Linux amd64-64-Bit Compressed References 20231017_57 (JIT enabled, AOT enabled) OpenJ9 - 54becab46 OMR - d810fcb98 JCL - 10a4ed68a77 based on jdk-21.0.1+12)
Summary of problem
We have a GPF crash in our preproduction web application (using on top of TomEE 9.1.2 on Redhat Linux 8.7 on Linux x64) with Semeru Runtime Open Edition 21.0.2.0-m2. While we don't know the exact scenario to reproduce, the GPF crash took place on the two different VMs running our web application last night, and the night before, with same type of dumps.
The issue seems to take place when JIT compiler is trying compile sun/security/util/DisabledAlgorithmConstraints$Constraints.permits, given this output which we found in catalina.out:
Please note that since IBM Semeru 21.0.1 has been officially released, I am going to switch to this version - I will let you know if the GPF issue takes place on it too (but once I'll have moved to 21.0.1 official, I may not be able to switch back easily to 21.0.2-m2 if you need more details to find the root cause & fix it).
Diagnostic files
See attached .dmp file (.core file was nowhere...)
jitdump.20240111.224658.381973.0004.dmp
See attached .trc file (ZIPped):
Snap.20240111.224658.381973.0003.trc.zip
vmState [0x53cff]: {J9VMSTATE_JIT} {coldBlockMarker}