eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

The JDK11 of J9 reports crashes in test cases #15061

Closed JavaTailor closed 2 years ago

JavaTailor commented 2 years ago

Affected versions

We found a test case that crashes the JVM. To facilitate analysis, we simplified the test case; the simplified class file can be found in the attachment.

Windows 10:

Microsoft Windows 10 Professional
10.0.19044 Build 19044
AMD Ryzen 5 5600G with Radeon Graphics 3.90 GHz
Memory 32 GB

Java -version output under Windows 10

openjdk version "1.8.0_332-internal"
OpenJDK Runtime Environment (build 1.8.0_332-internal-_2022_05_01_00_39-b00)
Eclipse OpenJ9 VM (build master-d5f1557, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20220501_000000 (JIT enabled, AOT enabled)
OpenJ9   - d5f1557
OMR      - 81b7940
JCL      - 16aeb67 based on jdk8u332-b09)

openjdk version "11.0.15-internal" 2022-04-19
OpenJDK Runtime Environment (build 11.0.15-internal+0-adhoc..openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build master-56fe4ec, JRE 11 Linux amd64-64-Bit Compressed References 20220424_000000 (JIT enabled, AOT enabled)
OpenJ9   - 56fe4ec
OMR      - db8e957
JCL      - 3a97e78 based on jdk-11.0.15+10)

Problem summary

We found that a test case which should run normally crashes when executed with J9-JDK11, but runs without reporting anything when executed with J9-JDK8. The output from the J9-JDK11 run is shown below, and our test case can be found in the attachment. To further verify this problem, we compiled OpenJ9 ourselves with the #6423 modification.

Unhandled exception
Type=Segmentation error vmState=0x00050fff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FADE1B94770 Handler2=00007FADE11DD7D0 InaccessibleAddress=0000000000000010
RDI=0000000000000000 RSI=00007FADBCE27FB0 RAX=0000000000000000 RBX=00007FADBCC95020
RCX=0000000000000000 RDX=00007FADBCBA9550 R8=0000000000000001 R9=0000000000000031
R10=0000000000008000 R11=00007FADBCDF0920 R12=00007FADBCD9DF60 R13=00007FADBCE27FB0
R14=00007FADBCE96DF0 R15=00007FADBC945020
RIP=00007FADDB6C34DD GS=0000 FS=0000 RSP=00007FADBDA18000
EFlags=0000000000010202 CS=0033 RBP=00007FADBCE27FB0 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000010
xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2 00000004a0010000 (f: 2684420096.000000, d: 9.814263e-314)
xmm3 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm4 4000038580000000 (f: 2147483648.000000, d: 2.001719e+00)
xmm5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6 00000004a0010000 (f: 2684420096.000000, d: 9.814263e-314)
xmm7 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 00007fadbcb65cf0 (f: 3166067968.000000, d: 6.935900e-310)
xmm14 00007fadbcb664e0 (f: 3166070016.000000, d: 6.935900e-310)
xmm15 00007fadbcb67270 (f: 3166073344.000000, d: 6.935900e-310)
Module=/home/ningmo/Openj9/jdk11/lib/default/libj9jit29.so
Module_base_address=00007FADDB0E3000

Method_being_compiled=TestCase4.test()V
Target=2_90_20220424_000000 (Linux 4.15.0-142-generic)
CPU=amd64 (4 logical CPUs) (0x2f2bc9000 RAM)
----------- Stack Backtrace -----------
_ZN16TR_LoopVersioner35detectCanonicalizedPredictableLoopsEP12TR_StructurePP12TR_BitVectori+0x5fd (0x00007FADDB6C34DD [libj9jit29.so+0x5e04dd])
_ZN16TR_LoopVersioner24performWithoutDominatorsEv+0xbe2 (0x00007FADDB6D5F92 [libj9jit29.so+0x5f2f92])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii+0x767 (0x00007FADDB71A9A7 [libj9jit29.so+0x6379a7])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii+0xcf9 (0x00007FADDB71AF39 [libj9jit29.so+0x637f39])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii+0xcf9 (0x00007FADDB71AF39 [libj9jit29.so+0x637f39])
_ZN3OMR9Optimizer8optimizeEv+0x1db (0x00007FADDB71C2EB [libj9jit29.so+0x6392eb])
_ZN3OMR11Compilation7compileEv+0x925 (0x00007FADDB510BB5 [libj9jit29.so+0x42dbb5])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4fa (0x00007FADDB2049CA [libj9jit29.so+0x1219ca])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x323 (0x00007FADDB2057D3 [libj9jit29.so+0x1227d3])
omrsig_protect+0x1e3 (0x00007FADE11DE533 [libj9prt29.so+0x29533])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x309 (0x00007FADDB2035C9 [libj9jit29.so+0x1205c9])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x207 (0x00007FADDB203C37 [libj9jit29.so+0x120c37])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x38b (0x00007FADDB20290B [libj9jit29.so+0x11f90b])
_ZN2TR24CompilationInfoPerThread3runEv+0x2a (0x00007FADDB202BFA [libj9jit29.so+0x11fbfa])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007FADDB202CC2 [libj9jit29.so+0x11fcc2])
omrsig_protect+0x1e3 (0x00007FADE11DE533 [libj9prt29.so+0x29533])
_Z21compilationThreadProcPv+0x1d2 (0x00007FADDB203102 [libj9jit29.so+0x120102])
thread_wrapper+0x162 (0x00007FADE1948322 [libj9thr29.so+0xe322])
start_thread+0xca (0x00007FADE25C06BA [libpthread.so.0+0x76ba])
clone+0x6d (0x00007FADE2CFB51D [libc.so.6+0x10751d])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/05/13 16:51:58 - please wait.
JVMDUMP032I JVM requested System dump using '/home/ningmo/myFile/TestCase/core.20220513.165158.6025.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.6048.

JVMDUMP010I System dump written to /home/ningmo/myFile/TestCase/core.20220513.165158.6025.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/ningmo/myFile/TestCase/javacore.20220513.165158.6025.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/ningmo/myFile/TestCase/javacore.20220513.165158.6025.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/ningmo/myFile/TestCase/Snap.20220513.165158.6025.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/ningmo/myFile/TestCase/Snap.20220513.165158.6025.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/ningmo/myFile/TestCase/jitdump.20220513.165158.6025.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x000000000001AC00
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP010I JIT dump written to /home/ningmo/myFile/TestCase/jitdump.20220513.165158.6025.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
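
As a sanity check on the log above, the faulting RIP minus Module_base_address should equal the offset printed in the first backtrace frame (libj9jit29.so+0x5e04dd). The following is a minimal sketch of that arithmetic; the helper name moduleOffset and the constant names are hypothetical and are not part of any OpenJ9 tooling.

```cpp
#include <cstdint>

// Hypothetical helper: offset of an absolute address inside a loaded module.
constexpr std::uint64_t moduleOffset(std::uint64_t address, std::uint64_t moduleBase) {
    return address - moduleBase;
}

// Values copied from the crash log above.
constexpr std::uint64_t kRip        = 0x00007FADDB6C34DDULL; // RIP
constexpr std::uint64_t kModuleBase = 0x00007FADDB0E3000ULL; // Module_base_address

// Offset of the faulting instruction within libj9jit29.so.
constexpr std::uint64_t kCrashOffset = moduleOffset(kRip, kModuleBase);
```

The result matches the +0x5e04dd shown for TR_LoopVersioner::detectCanonicalizedPredictableLoops, which confirms that the faulting instruction is inside that JIT function. The InaccessibleAddress of 0x10 is also what one would expect from reading a field at a small offset from a null pointer.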

Attachment

TestCase4.zip

pshipton commented 2 years ago

Reproduced with a recent nightly xlinux build. @hzongaro can someone take a look please? I've added it to the 0.33 milestone plan.

hzongaro commented 2 years ago

It looks like the failure isn't exposed with the Java 8 build because of differences in the amount of inlining that happens.

The crash happens in TR_LoopVersioner::detectCanonicalizedPredictableLoops because _loopTestTree is NULL. The call to TR_LoopCanonicalizer::checkLoopForPredictability ends up not setting _loopTestTree because the region that it's looking at is no longer a loop due to an earlier transformation.

               86 [0x7fffd29afc90] Acyclic region
                  Subgraph: (* = exit edge)
                        (0x7fffd29afd60:0x7fffd29add70)86 --> 150(0x7fffd29b7b00)
                        (0x7fffd29afdf0:0x7fffd29af840)125 --> 84(0x7fffd29afed0)*
                        (0x7fffd29b7b00:0x7fffd29b7aa0)150 --> 125(0x7fffd29afdf0)
                     Exit edges:
                        (0x7fffd29afdf0)125 -->84
                  86 [0x7fffd29add70] Block
                  125 [0x7fffd29af840] Natural loop
                     Subgraph: (* = exit edge)
                           (0x7fffd29af910:0x7fffd29ad890)125 --> 83(0x7fffd29af9a0)
                           (0x7fffd29af9a0:0x7fffd29ade90)83 --> 84(0x7fffd29afbb0)* 125(0x7fffd29af910)
                        Exit edges:
                           (0x7fffd29af9a0)83 -->84
                     125 [0x7fffd29ad890] Block
                     83 [0x7fffd29ade90] Block
                  150 [0x7fffd29b7aa0] Block
               84 [0x7fffd29ade30] Block

I'm not sure whether the right thing to do is add an additional test of whether _loopTestTree is non-NULL, or avoid calling TR_LoopVersioner::detectCanonicalizedPredictableLoops for something that is not a loop, or have TR_LoopCanonicalizer::checkLoopForPredictability handle non-loops by not returning a positive result, or maybe something else.
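
To make the first two options concrete, here is a minimal sketch of a guard that refuses to proceed when the region is not a loop or when no loop test tree was found. All types and names here (TreeTop, LoopInfo, canVersion) are hypothetical stand-ins, not the OMR classes, and this is not necessarily what the eventual fix does.

```cpp
// Hypothetical stand-in for a JIT tree; NOT the OMR TR::TreeTop class.
struct TreeTop {
    int nodeId;
};

// Hypothetical summary of what the predictability check found for a region.
struct LoopInfo {
    bool isNaturalLoop;          // false for an acyclic region
    const TreeTop *loopTestTree; // stays null when no loop test was identified
};

// Returns true only when it is safe to go on and dereference loopTestTree.
// Combines two of the options discussed above: skip non-loops, and check
// the pointer for null before use.
constexpr bool canVersion(const LoopInfo &info) {
    return info.isNaturalLoop && info.loopTestTree != nullptr;
}
```

A caller would test canVersion(info) first and skip the region when it returns false, instead of reading through a null loopTestTree as the crashing code path does.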

In any event, Devin @jdmpapin, may I ask you to investigate this?

jdmpapin commented 2 years ago

Opened eclipse/omr#6531 with a fix

pshipton commented 2 years ago

Keep in mind it will need to be double delivered, if we think this is appropriate, to get into the 0.33 release.

jdmpapin commented 2 years ago

Opened eclipse-openj9/openj9-omr#147 for 0.33

pshipton commented 2 years ago

Closing as the fixes are merged.