eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

JDK17 ppc64le_linux : j9mm.107 ** ASSERTION FAILED ** at ../omr/gc/base/MemoryPoolAddressOrderedList.cpp:633: ((false && (freeEntrySize >= _minimumFreeEntrySize))) #13574

Open JasonFengJ9 opened 2 years ago

JasonFengJ9 commented 2 years ago

Failure link

From an internal build job/Test_openjdk17_j9_extended.openjdk_ppc64le_linux/20/ (ub20lert-1-7):

openjdk version "17" 2021-09-14
IBM Semeru Runtime Open Edition 17.0.0.0-m1 (build 17+35)
Eclipse OpenJ9 VM 17.0.0.0-m1 (build master-d6256c10f, JRE 17 Linux ppc64le-64-Bit Compressed References 20210925_42 (JIT enabled, AOT enabled)
OpenJ9   - d6256c10f
OMR      - 82faf49a1
JCL      - ff6a3e831b based on jdk-17+35)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2021-09-25T21:56:30.357Z] Running test jdk_vector_1 ...
[2021-09-25T21:56:30.357Z] ===============================================
[2021-09-25T21:56:30.357Z] jdk_vector_1 Start Time: Sat Sep 25 14:56:27 2021 Epoch Time (ms): 1632606987158
[2021-09-25T21:56:30.357Z] "/home/jenkins/workspace/Test_openjdk17_j9_extended.openjdk_ppc64le_linux/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk17_j9_extended.openjdk_ppc64le_linux/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
[2021-09-25T21:56:30.357Z] JVMSHRC005I No shared class caches available
[2021-09-25T21:56:30.357Z] JVMSHRC005I No shared class caches available
[2021-09-25T21:56:30.357Z] cache cleanup done
[2021-09-25T21:56:30.357Z] variation: Mode650
[2021-09-25T21:56:30.357Z] JVM_OPTIONS:  -XX:-UseCompressedOops 

[2021-09-25T22:26:18.658Z] --------------------------------------------------
[2021-09-25T22:26:18.658Z] TEST: jdk/incubator/vector/Float64VectorLoadStoreTests.java

[2021-09-25T22:26:18.660Z] STDERR:
[2021-09-25T22:26:18.660Z] WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
[2021-09-25T22:26:18.660Z] 22:26:06.530 0x239400    j9mm.107    *   ** ASSERTION FAILED ** at ../../../../../../omr/gc/base/MemoryPoolAddressOrderedList.cpp:633: ((false && (freeEntrySize >= _minimumFreeEntrySize)))
[2021-09-25T22:26:18.660Z] JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/09/25 15:26:06 - please wait.

[2021-09-25T23:28:25.513Z] jdk_vector_1_FAILED

fyi @dmitripivkine

dmitripivkine commented 2 years ago

@JasonFengJ9 Do you know are Artifacts stored for this failure? I see link but it returns 404... May be it its wrong link

JasonFengJ9 commented 2 years ago

Jenkins Artifacts link is broken (runtimes/backlog/issues/594), use the link within the console output instead.

dmitripivkine commented 2 years ago

From Nursery subspace !MM_MemorySubSpaceGeneric 0x7276980aad60 Memory Pool !mm_memorypooladdressorderedlist 0x00007276980A5AF0 _heapFreeList = !mm_heaplinkedfreeheader 0x00000000FF8DA238

0xFF8DA200 :  00630065 006f0074 004c0072 0061006f [ e.c.t.o.r.L.o.a. ]
0xFF8DA210 :  00530064 006f0074 00650072 00650054 [ d.S.t.o.r.e.T.e. ]
0xFF8DA220 :  00740073 00400073 00000000 00000000 [ s.t.s.@......... ]
0xFF8DA230 :  00000000 00000000 00000001 00000000 [ ................ ] <---- Linked Free Header size 0
0xFF8DA240 :  00000000 00000000 00392b00 00000008 [ .........+9..... ]
0xFF8DA250 :  ff8da230 00000000 0012a000 00000000 [ 0............... ]
0xFF8DA260 :  00000000 00000000 00000000 00000000 [ ................ ]
0xFF8DA270 :  000b6c00 00000002 ff8da280 00000000 [ .l.............. ]
dmitripivkine commented 2 years ago

@0xdaryl FYI Looks like this is heap corruption. It seems has no relation (not applicable) to recent changes in Memory Pool code. So I have no idea how this zeros were written to the heap

dmitripivkine commented 2 years ago

So again, an assertion has been triggered because there is zero at 0xFF8DA240. This location happened to be Linked Free Header Size slot. This is not valid value There are two possibilities:

For first case I can not see how 0 can be written as a size in GC code. Minimum size to be written for Multi Slot Hole (0x1 at 0xFF8DA238) is 0x10. The size slot is protected and operated by MM_HeapLinkedFreeHeader functions only. It is plausible this is HW failure and real value has not been written.

For second case heap corruption there is not enough information to investigate, this is hopeless realistically.

The problem might manifest differently if it happen next time. If we see the same assertion again it might help to understand problem better

dmitripivkine commented 2 years ago

10 test iterations grinder /job/Grinder/18584/

dmitripivkine commented 2 years ago

10 test iterations grinder /job/Grinder/18584/

summary for grinder 10 test iterations on 5 machines (50 total):

0xdaryl commented 2 years ago

As this problem has only shown up on PPC LE (and seems readily reproducible), @zl-wang could you assign this for investigation please?

dmitripivkine commented 2 years ago

for Iteration 4 for the grinder /job/Grinder_iteration_4/398/consoleFull I have results downloaded to /team/Dmitri/13574/iteration4/. Summary of system core included:

./aqa-tests/TKG/output_16335049886092/jdk_vector_1/work/jdk/incubator/vector/Long64VectorLoadStoreTests/core.20211006.011318.27564.0001.dmp <-- assertion
./aqa-tests/TKG/output_16334994679967/jdk_vector_1/work/jdk/incubator/vector/Double64VectorLoadStoreTests/core.20211005.231026.18690.0001.dmp <-- assertion
./aqa-tests/TKG/output_16334663781698/jdk_vector_1/work/jdk/incubator/vector/Double128VectorLoadStoreTests/core.20211005.135420.12035.0001.dmp <-- vmState=0x00040000 Compiled_method=java/util/ArrayList.addAll(Ljava/util/Collection;)Z
./aqa-tests/TKG/output_16334774767826/jdk_vector_1/work/jdk/incubator/vector/Short64VectorLoadStoreTests/core.20211005.174549.28520.0001.dmp <-- vmState=0x00000000
./aqa-tests/TKG/output_16334884518017/jdk_vector_1/work/jdk/incubator/vector/Int64VectorLoadStoreTests/core.20211005.202807.7386.0001.dmp <-- vmState=0x00000000
./aqa-tests/TKG/output_16334828868042/jdk_vector_1/work/jdk/incubator/vector/Long64VectorLoadStoreTests/core.20211005.190642.1687.0001.dmp <-- assertion

I am going to look to assertions in GC cores

@pshipton @tajila @0xdaryl @zl-wang FYI There are multiple failures in jdk_vector_1 test on Linux PPC LE. I am not sure all these failures have the same root or occur on PPC LE only. There are a few cores to look at if you have resources. There are more results are available for download - please let me know if you need them.

zl-wang commented 2 years ago

@mnalam-p please take a look from JIT perspective

pshipton commented 2 years ago

This problem was reported on the head stream. Based on the grinder results, I'm setting as a blocker, but moving to the 0.30 milestone. ~@dmitripivkine can you restart your grinder with a 0.29 JVM to check if the problem occurs there. The latest open build is https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK17_ppc64le_linux_Nightly/44/OpenJ9-JDK17-ppc64le_linux-20211005-215708.tar.gz~

pshipton commented 2 years ago

can you restart your grinder with a 0.29 JVM to check if the problem occurs there

Actually, this is the vector incubator, which we're not supporting in 0.29 so it doesn't matter.

@gita-omr fyi this problem.

JasonFengJ9 commented 2 years ago

Another occurrence at JDK17 head stream at job/Test_openjdk17_j9_extended.openjdk_ppc64le_linux/21/consoleFull (ubu18le-rt1-4)

11:30:44  openjdk version "17" 2021-09-14
11:30:44  IBM Semeru Runtime Open Edition 17+35 (build 17+35)
11:30:44  Eclipse OpenJ9 VM 17+35 (build master-0fe704468, JRE 17 Linux ppc64le-64-Bit Compressed References 20211009_11 (JIT enabled, AOT enabled)
11:30:44  OpenJ9   - 0fe704468
11:30:44  OMR      - 8021d5725
11:30:44  JCL      - 92442f0e28 based on jdk-17+35)

16:56:49  STDERR:
16:56:49  WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
16:56:49  20:56:33.610 0x239600    j9mm.107    *   ** ASSERTION FAILED ** at ../../../../../../omr/gc/base/MemoryPoolAddressOrderedList.cpp:633: ((false && (freeEntrySize >= _minimumFreeEntrySize)))
16:56:49  JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/10/09 13:56:33 - please wait.
JasonFengJ9 commented 2 years ago

Observed at job/Test_openjdk17_j9_extended.openjdk_aarch64_linux/30/ (cent7-aarch64-4)

openjdk version "17.0.1" 2021-10-19
IBM Semeru Runtime Certified Edition 17.0.1+12 (build 17.0.1+12)
Eclipse OpenJ9 VM 17.0.1+12 (build master-5afc033b2, JRE 17 Linux aarch64-64-Bit Compressed References 20211204_49 (JIT enabled, AOT enabled)
OpenJ9   - 5afc033b2
OMR      - 5e9418dcf
JCL      - a60fb4eda99 based on jdk-17.0.1+12)

[2021-12-04T19:17:39.026Z] variation: Mode650
[2021-12-04T19:17:39.026Z] JVM_OPTIONS:  -XX:-UseCompressedOops 

[2021-12-04T19:24:00.246Z] --------------------------------------------------
[2021-12-04T19:24:00.246Z] TEST: jdk/incubator/vector/Double64VectorLoadStoreTests.java

[2021-12-04T19:24:00.250Z] STDERR:
[2021-12-04T19:24:00.250Z] WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
[2021-12-04T19:24:00.250Z] 19:23:48.036 0xffff9c3f6100    j9mm.107    *   ** ASSERTION FAILED ** at ../../../../../../omr/gc/base/MemoryPoolAddressOrderedList.cpp:633: ((false && (freeEntrySize >= _minimumFreeEntrySize)))
[2021-12-04T19:24:00.250Z] JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/12/04 14:23:48 - please wait.

[2021-12-04T19:46:35.923Z] jdk_vector_1_FAILED
zl-wang commented 2 years ago

@IBMJimmyk please take a look

IBMJimmyk commented 2 years ago

Okay, I will take a look.