eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 720 forks source link

JDK16 aarch64_linux : j9mm.479 * ** ASSERTION FAILED ** at openj9/runtime/gc_glue_java/MarkingSchemeRootMarker.cpp:53: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, object, stackLocation, walkState).validate(_env))) #13017

Closed JasonFengJ9 closed 3 years ago

JasonFengJ9 commented 3 years ago

Failure link

From an internal build Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/4/ (cent7-aarch64-2):

openjdk version "16.0.1" 2021-04-20
OpenJDK Runtime Environment Temurin-16.0.1+9 (build 16.0.1+9)
Eclipse OpenJ9 VM Temurin-16.0.1+9 (build master-0be9a6f3f, JRE 16 Linux aarch64-64-Bit Compressed References 20210619_49 (JIT enabled, AOT enabled)
OpenJ9   - 0be9a6f3f
OMR      - b7eb96336
JCL      - 831f892085 based on jdk-16.0.1+9)

Rerun in Grinder

Optional info

Failure output (captured from console output)

[2021-06-19T15:27:05.922Z] Running test LambdaLoadTest_CS_5m_0 ...
[2021-06-19T15:27:05.922Z] ===============================================
[2021-06-19T15:27:05.922Z] LambdaLoadTest_CS_5m_0 Start Time: Sat Jun 19 11:27:03 2021 Epoch Time (ms): 1624116423953
[2021-06-19T15:27:05.922Z] "/home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
[2021-06-19T15:27:05.922Z] JVMSHRC005I No shared class caches available
[2021-06-19T15:27:05.922Z] JVMSHRC005I No shared class caches available
[2021-06-19T15:27:05.922Z] cache cleanup done
[2021-06-19T15:27:05.922Z] variation: -Xgcpolicy:gencon -Xgc:concurrentScavenge
[2021-06-19T15:27:05.922Z] JVM_OPTIONS:  -Xgcpolicy:gencon -Xgc:concurrentScavenge 

[2021-06-19T15:32:07.700Z] STF 11:32:07.304 - Heartbeat: Process LT  is still running
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: Object neither in heap nor stack-allocated in thread load-9
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: O-Slot=0000FFFF6A0ADF88
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: O-Slot value=0000000000000004
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: PC=0000FFFF6AD30F01
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: framesWalked=3
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: arg0EA=0000000000B98D98
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: walkSP=0000000000B98CF8
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: literals=0000000000000010
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: jitInfo=0000FFFF69F16178
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: method=00000000007037E8 (net/adoptopenjdk/test/lambda/TestLambdasUserDefinedInterface$$Lambda$395/0x00000000d4031930.concatenate(Ljava/lang/String;)Ljava/lang/String;) (JIT)
[2021-06-19T15:32:10.155Z] LT  stderr 0000000000521700: stack=0000000000B02B18-0000000000B998E0
[2021-06-19T15:32:10.155Z] LT  stderr 15:32:07.936 0x168500    j9mm.479    *   ** ASSERTION FAILED ** at ../../../../../../openj9/runtime/gc_glue_java/MarkingSchemeRootMarker.cpp:53: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, object, stackLocation, walkState).validate(_env)))
[2021-06-19T15:32:10.155Z] LT  stderr JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/06/19 11:32:07 - please wait.
[2021-06-19T15:32:10.155Z] LT  stderr JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/core.20210619.113207.31306.0001.dmp' in response to an event
[2021-06-19T15:32:10.155Z] LT  stderr JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/core.20210619.113207.31306.0001.dmp
[2021-06-19T15:32:10.155Z] LT  stderr JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/javacore.20210619.113207.31306.0002.txt' in response to an event
[2021-06-19T15:32:10.155Z] STF 11:32:09.310 - Found dump at: /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/javacore.20210619.113207.31306.0002.txt
[2021-06-19T15:32:10.155Z] STF 11:32:09.311 - Found dump at: /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/core.20210619.113207.31306.0001.dmp
[2021-06-19T15:32:10.155Z] LT  stderr javacore file generated - /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/javacore.20210619.113207.31306.0002.txt
[2021-06-19T15:32:10.155Z] LT  stderr core file generated - /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/core.20210619.113207.31306.0001.dmp
[2021-06-19T15:32:12.439Z] LT  stderr JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/javacore.20210619.113207.31306.0002.txt
[2021-06-19T15:32:12.439Z] LT  stderr JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/Snap.20210619.113207.31306.0003.trc' in response to an event
[2021-06-19T15:32:12.439Z] LT  stderr JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/Snap.20210619.113207.31306.0003.trc
[2021-06-19T15:32:12.439Z] LT  stderr JVMDUMP013I Processed dump event "traceassert", detail "".
[2021-06-19T15:32:12.439Z] STF 11:32:11.318 - Found dump at: /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/Snap.20210619.113207.31306.0003.trc
[2021-06-19T15:32:12.439Z] STF 11:32:11.318 - **FAILED** Process LT  ended with exit code (255) and not the expected exit code/s (0)
[2021-06-19T15:32:12.439Z] LT  stderr Snap file generated - /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/results/Snap.20210619.113207.31306.0003.trc
[2021-06-19T15:32:12.439Z] STF 11:32:11.318 - Monitoring Report Summary:
[2021-06-19T15:32:12.439Z] STF 11:32:11.318 -   o Process LT  has crashed unexpectedly
[2021-06-19T15:32:12.439Z] STF 11:32:11.319 - Killing processes: LT 
[2021-06-19T15:32:12.439Z] STF 11:32:11.319 -   o Process LT  pid 31306 is not running
[2021-06-19T15:32:12.439Z] **FAILED** at step 1 (Run lambda and stream load test). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/../TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/execute.pl line 95.
[2021-06-19T15:32:12.439Z] STF 11:32:11.543 - **FAILED** execute script failed. Expected return value=0 Actual=1
[2021-06-19T15:32:12.439Z] STF 11:32:11.543 - 
[2021-06-19T15:32:12.439Z] STF 11:32:11.543 - ====================   T E A R D O W N   ====================
[2021-06-19T15:32:12.439Z] STF 11:32:11.543 - Running teardown: perl /home/jenkins/workspace/Test_openjdk16_j9_sanity.system_aarch64_linux_testList_0/aqa-tests/TKG/../TKG/output_162411048214/LambdaLoadTest_CS_5m_0/20210619-112704-LambdaLoadTest/tearDown.pl
[2021-06-19T15:32:12.439Z] STF 11:32:11.631 - TEARDOWN stage completed
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 - 
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 - =====================   R E S U L T S   =====================
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 - Stage results:
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 -   setUp:     pass
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 -   execute:  *fail*
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 -   teardown:  pass
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 - 
[2021-06-19T15:32:12.439Z] STF 11:32:11.638 - Overall result: **FAILED**
[2021-06-19T15:32:12.439Z] 
[2021-06-19T15:32:12.439Z] LambdaLoadTest_CS_5m_0_FAILED

fyi @knn-k

Akira1Saitoh commented 3 years ago

Ran 10x Grinder on internal Jenkins job/Grinder/16464/, but it did not fail.

Akira1Saitoh commented 3 years ago

I managed to produce it with 20x Grinder on internal Jenkins (job/Grinder/16465/).

Akira1Saitoh commented 3 years ago

This issue is also reproducible with the following version of OpenJ9 (see job/Grinder/16468):

[2021-06-22T05:34:52.114Z] openjdk version "16.0.1-beta" 2021-04-20 [2021-06-22T05:34:52.114Z] OpenJDK Runtime Environment Temurin-16.0.1+9-202106150403 (build 16.0.1-beta+9-202106150403) [2021-06-22T05:34:52.114Z] Eclipse OpenJ9 VM Temurin-16.0.1+9-202106150403 (build master-6ebada739, JRE 16 Linux aarch64-64-Bit Compressed References 20210615_47 (JIT enabled, AOT enabled) [2021-06-22T05:34:52.114Z] OpenJ9 - 6ebada739 [2021-06-22T05:34:52.114Z] OMR - 4a8c53cf5 [2021-06-22T05:34:52.114Z] JCL - 4af707cd87 based on jdk-16.0.1+9)

Akira1Saitoh commented 3 years ago

Looks like a GC map is added to incorrect instruction for implicit null check. addimmx GPR_0032, [&GPR_0016, 4] is chosen as an instruction throwing implicit NPE, but it should be the next instruction ldrimmw &GPR_0035, [GPR_0032, 0].

 n749n    (  0)  NULLCHK on n1089n [#32]                                                              [    0xfffe6502d9d0] bci=[0,0,589] rc=0 vc=895 vn=- li=53 udi=- nc=1
 n417n    (  3)    l2a (in &GPR_0035) (X!=0 sharedMemory )                                            [    0xfffe65027210] bci=[2,18,2338] rc=3 vc=895 vn=- li=53 udi=12688 nc=1 flg=0x4
 n1089n   (  0)      iu2l (in &GPR_0035) (X!=0 )                                                      [    0xfffe65154420] bci=[2,18,2338] rc=0 vc=895 vn=- li=- udi=12688 nc=1 flg=0x4
 n1088n   (  0)        irdbari  java/lang/String.value [B[#370  final Shadow +4] [flags 0x400a0607 0x0 ] (in &GPR_0035) (X!=0 )  [    0xfffe651543d0] bci=[2,18,2338] rc=0 vc=895 vn=- li=- udi=12688 nc=1 flg=0x4
 n891n    (  0)          ==>aRegLoad (in &GPR_0016) (X!=0 X>=0 SeenRealReference sharedMemory )
------------------------------

 [    0xfffe651033f0]   18      addimmx         GPR_0032, [&GPR_0016, 4]                # SymRef  java/lang/String.value [B[#370  final Shadow +4] [flags 0x400a0607 0x0 ]       ; Throws Implicit Null Pointer Exception
 [    0xfffe65103510]   18      ldrimmw         &GPR_0035, [GPR_0032, 0]        
 [

After register assignment, GPR_0032 and GPR_0016 happens to be assigned the same register (x1), and GPR_0016 is a collected reference register, x1 is incorrectly added to the GC map.

 \\ net/adoptopenjdk/test/lambda/TestLambdasUserDefinedInterface$$Lambda$321/0x00000000e402ef98.concatenate(Ljava/lang/String;)Ljava/lang/String;
 \\    1 JBinvokestatic 4 net/adoptopenjdk/test/lambda/TestLambdasUserDefinedInterface.lambda$testLambdaExceptionHandling$38(Ljava/lang/String;)Ljava/lang/String;
 \\       4 JBinvokevirtual 144 java/lang/String.concat(Ljava/lang/String;)Ljava/lang/String;
 \\          5 JBinvokevirtual 24 java/lang/String.lengthInternal()I
 \\            18 JBgetfield 19 java/lang/String.value [B

    0xffff5c99dd1c 00000038 [    0xfffe651033f0] 91001021 18    addimmx         w1, [w1, 4]             # SymRef  java/lang/String.value [B[#370  final Shadow +4] [flags 0x400a0607 0x0 ]       ; Throws Implicit Null Pointer Exception
    0xffff5c99dd20 0000003c [    0xfffe65103510] b9400022 18    ldrimmw         w2, [w1, 0]     
Akira1Saitoh commented 3 years ago

This issue will be fixed by https://github.com/eclipse-openj9/openj9/pull/13047

knn-k commented 3 years ago

I merged #13047.

JasonFengJ9 commented 3 years ago

Same assertion error occurred hence re-opening.

From an internal build Test_openjdk16_j9_sanity.openjdk_aarch64_linux/50/ (cent7-aarch64-4)

openjdk version "16.0.2-ea" 2021-07-20
IBM Semeru Runtime Open Edition 16.0.2.0 (build 16.0.2-ea+7)
Eclipse OpenJ9 VM 16.0.2.0 (build master-9af2e37a0, JRE 16 Linux aarch64-64-Bit Compressed References 20210805_70 (JIT enabled, AOT enabled)
OpenJ9   - 9af2e37a0
OMR      - 45440608c
JCL      - ee030f1f04 based on jdk-16.0.2+7)

Rerun in Grinder

[2021-08-05T04:55:59.781Z] Running test jdk_lang_1 ...
[2021-08-05T04:55:59.781Z] ===============================================
[2021-08-05T04:55:59.781Z] jdk_lang_1 Start Time: Thu Aug  5 00:55:57 2021 Epoch Time (ms): 1628139357015
[2021-08-05T04:55:59.781Z] "/home/jenkins/workspace/Test_openjdk16_j9_sanity.openjdk_aarch64_linux/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk16_j9_sanity.openjdk_aarch64_linux/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
[2021-08-05T04:55:59.781Z] JVMSHRC005I No shared class caches available
[2021-08-05T04:55:59.781Z] JVMSHRC005I No shared class caches available
[2021-08-05T04:55:59.781Z] cache cleanup done
[2021-08-05T04:55:59.781Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode650
[2021-08-05T04:55:59.781Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-UseCompressedOops 

[2021-08-05T05:14:47.690Z] --------------------------------------------------
[2021-08-05T05:14:47.690Z] TEST: java/lang/ref/EnqueuePollRace.java

[2021-08-05T05:14:47.691Z] STDERR:
[2021-08-05T05:14:47.691Z] 00000000001A8800: Object neither in heap nor stack-allocated in thread Finalizer thread
[2021-08-05T05:14:47.691Z] 00000000001A8800:    O-Slot=00000000001A8590
[2021-08-05T05:14:47.691Z] 00000000001A8800:    O-Slot value=000000000000001E
[2021-08-05T05:14:47.691Z] 00000000001A8800:    PC=0000FFFF8037CEFD
[2021-08-05T05:14:47.691Z] 00000000001A8800:    framesWalked=4
[2021-08-05T05:14:47.691Z] 00000000001A8800:    arg0EA=00000000001A8598
[2021-08-05T05:14:47.691Z] 00000000001A8800:    walkSP=00000000001A8488
[2021-08-05T05:14:47.691Z] 00000000001A8800:    literals=0000000000000010
[2021-08-05T05:14:47.691Z] 00000000001A8800:    jitInfo=0000FFFF7BE77E38
[2021-08-05T05:14:47.691Z] 00000000001A8800:    method=00000000000A0000 (java/util/concurrent/ConcurrentHashMap.get(Ljava/lang/Object;)Ljava/lang/Object;) (JIT)
[2021-08-05T05:14:47.691Z] 00000000001A8800:    stack=00000000001A6F90-00000000001A8770
[2021-08-05T05:14:47.691Z] 05:14:32.732 0x148600    j9mm.479    *   ** ASSERTION FAILED ** at ../../../../../../openj9/runtime/gc_glue_java/MarkingSchemeRootMarker.cpp:53: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, object, stackLocation, walkState).validate(_env)))
[2021-08-05T05:14:47.691Z] JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/08/05 01:14:32 - please wait.

[2021-08-05T05:14:47.691Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 255]
[2021-08-05T05:14:47.691Z] --------------------------------------------------
[2021-08-05T05:18:12.786Z] Test results: passed: 748; failed: 1
[2021-08-05T05:18:33.059Z] Report written to /home/jenkins/workspace/Test_openjdk16_j9_sanity.openjdk_aarch64_linux/jvmtest/openjdk/report/html/report.html
[2021-08-05T05:18:33.059Z] Results written to /home/jenkins/workspace/Test_openjdk16_j9_sanity.openjdk_aarch64_linux/aqa-tests/TKG/output_16281379951769/jdk_lang_1/work
[2021-08-05T05:18:33.059Z] Error: Some tests failed or other problems occurred.
[2021-08-05T05:18:33.059Z] 
[2021-08-05T05:18:33.059Z] jdk_lang_1_FAILED

fyi @knn-k

JasonFengJ9 commented 3 years ago

JDK16 Windows build has same assertion error at Test_openjdk16_j9_special.system_x86-64_windows_Personal/6/tapResults/

openjdk version "16.0.2-internal" 2021-07-20
OpenJDK Runtime Environment (build 16.0.2-internal+0-adhoc.jenkins.buildjdk16x86-64windowspersonal)
Eclipse OpenJ9 VM (build master-2ba48511f18, JRE 16 Windows Server 2016 amd64-64-Bit Compressed References 20210822_24 (JIT enabled, AOT enabled)
OpenJ9   - 2ba48511f18
OMR      - 6eec759cd2d
JCL      - 44316ee515b based on jdk-16.0.2+7)

===============================================
Running test LambdaLoadTest_special_J9_5m_1 ...
===============================================
LambdaLoadTest_special_J9_5m_1 Start Time: Sun Aug 22 22:36:32 2021 Epoch Time (ms): 1629696992224
variation: Mode103
JVM_OPTIONS:  -Xgcpolicy:optthruput -Xjit:count=0 -Xnocompressedrefs 

LT  22:37:24.802 - Test failed. Details recorded in execution log.
LT  stderr 0000021794419D00: Object neither in heap nor stack-allocated in thread load-5
LT  stderr 0000021794419D00:    O-Slot=000000274B41F2B8
LT  stderr 0000021794419D00:    O-Slot value=0074007300650074
LT  stderr 0000021794419D00:    PC=00007FFD6F6E1C32
LT  stderr 0000021794419D00:    framesWalked=4
LT  stderr 0000021794419D00:    arg0EA=0000021794A39258
LT  stderr 0000021794419D00:    walkSP=0000021794A39138
LT  stderr 0000021794419D00:    literals=0000000000000010
LT  stderr 0000021794419D00:    jitInfo=0000021795A7EC68
LT  stderr 0000021794419D00:    method=0000021794500420 (net/adoptopenjdk/loadTest/adaptors/JUnitAdaptor$1.addResult(Ljava/lang/String;Ljava/lang/String;Lorg/junit/runner/Description;Ljava/lang/String;Ljava/lang/Throwable;)V) (JIT)
LT  stderr 0000021794419D00:    stack=0000021794A328A0-0000021794A39930
LT  stderr 05:37:24.846 0x21793de9400    j9mm.479    *   ** ASSERTION FAILED ** at c:\users\jenkins\workspace\build_jdk16_x86-64_windows_personal\openj9\runtime\gc_glue_java\MarkingSchemeRootMarker.cpp:53: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, object, stackLocation, walkState).validate(_env)))
LT  stderr JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/08/22 22:37:24 - please wait.
LT  stderr JVMDUMP032I JVM requested System dump using 'C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\core.20210822.223724.5288.0001.dmp' in response to an event
STF 22:37:25.228 - Found dump at: C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\core.20210822.223724.5288.0001.dmp
LT  stderr core file generated - C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\core.20210822.223724.5288.0001.dmp
LT  stderr 88.0001.dmp
LT  stderr JVMDUMP032I JVM requested Java dump using 'C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\javacore.20210822.223724.5288.0002.txt' in response to an event
STF 22:37:32.218 - Found dump at: C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\javacore.20210822.223724.5288.0002.txt
LT  stderr javacore file generated - C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\javacore.20210822.223724.5288.0002.txt
LT  stderr 2.txt
LT  stderr JVMDUMP032I JVM requested Snap dump using 'C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\Snap.20210822.223724.5288.0003.trc' in response to an event
LT  stderr JVMDUMP010I Snap dump written to C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\Snap.20210822.223724.5288.0003.trc
LT  stderr JVMDUMP013I Processed dump event "traceassert", detail "".
STF 22:37:35.342 - Found dump at: C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\Snap.20210822.223724.5288.0003.trc
LT  stderr Snap file generated - C:\Users\jenkins\workspace\Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2\aqa-tests\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1\20210822-223635-LambdaLoadTest\results\Snap.20210822.223724.5288.0003.trc
STF 22:37:35.907 - **FAILED** Process LT  ended with exit code (4294967295) and not the expected exit code/s (0)
STF 22:37:35.908 - Monitoring Report Summary:
STF 22:37:35.908 -   o Process LT  has crashed unexpectedly
STF 22:37:35.909 - Killing processes: LT 
STF 22:37:35.909 -   o Process LT  pid 5288 is not running
**FAILED** at step 1 (Run lambda and stream load test). Expected return value=0 Actual=1 at C:/Users/jenkins/workspace/Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2/aqa-tests/\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1/20210822-223635-LambdaLoadTest/execute.pl line 97.
STF 22:37:36.435 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 22:37:36.435 - 
STF 22:37:36.435 - ====================   T E A R D O W N   ====================
STF 22:37:36.435 - Running teardown: perl C:/Users/jenkins/workspace/Test_openjdk16_j9_special.system_x86-64_windows_Personal_testList_2/aqa-tests/\TKG\output_16296893109699\LambdaLoadTest_special_J9_5m_1/20210822-223635-LambdaLoadTest/tearDown.pl
STF 22:37:36.752 - TEARDOWN stage completed
STF 22:37:36.782 - 
STF 22:37:36.782 - =====================   R E S U L T S   =====================
STF 22:37:36.782 - Stage results:
STF 22:37:36.782 -   setUp:     pass
STF 22:37:36.782 -   execute:  *fail*
STF 22:37:36.782 -   teardown:  pass
STF 22:37:36.782 - 
STF 22:37:36.782 - Overall result: **FAILED**

LambdaLoadTest_special_J9_5m_1_FAILED

Re-run the Grinder

Edit: assigning this to 0.28 release since JDK16 head stream w/ ojdk_mh is used for JDK17 test.

dmitripivkine commented 3 years ago

bogus o-slot value O-Slot value=0074007300650074 is a part of char string obviously. I guess it is possible this is a stall value because of GC happen on background and some object has been reallocated. Please let me know if you need my help with this investigation (technically this is just reporting of the problem in GC) @0xdaryl FYI

0xdaryl commented 3 years ago

@knn-k @Akira1Saitoh : would one of you mind picking up the investigation on this please?

Akira1Saitoh commented 3 years ago

I will look into this.

Akira1Saitoh commented 3 years ago

In the failure described in https://github.com/eclipse-openj9/openj9/issues/13017#issuecomment-894511954, the stacktrace of finalizer thread is odd.

3XMTHREADINFO      "Finalizer thread" J9VMThread:0x00000000001A8800, omrthread_t:0x0000FFFF18000A20, java/lang/Thread:0x00000000FFE9BBD0, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x1A, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x110D, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMTHREADINFO2            (native stack address range from:0x0000FFFF7B7E0000, to:0x0000FFFF7B820000, size:0x40000)
3XMCPUTIME               CPU usage total: 0.005432805 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/util/concurrent/ConcurrentHashMap.get(ConcurrentHashMap.java:936(Compiled Code))
4XESTACKTRACE                at java/lang/ClassLoader$ClassNameLockRef.run(ClassLoader.java:139(Compiled Code))
5XESTACKTRACE                   (entered lock: java/lang/Class$ReflectCache@0x00000000FFE03BA8, entry count: 1)
4XESTACKTRACE                at java/lang/ref/ReferenceQueue.enqueue(ReferenceQueue.java:176(Compiled Code))
4XESTACKTRACE                at java/lang/ref/Reference.enqueueImpl(Reference.java:160(Compiled Code))
5XESTACKTRACE                   (entered lock: java/lang/Class$ReflectRef@0x00000000FFE854E8, entry count: 1)

At java/lang/ref/Reference.enqueueImpl, the lock of the reference is taken and it says the class of the reference is java/lang/Class$ReflectRef.

However, java/lang/ClassLoader$ClassNameLockRef.run is called from java/lang/ref/ReferenceQueue.enqueue, which happens when reference is the instance of java/lang/ClassLoader$ClassNameLockRef. https://github.com/eclipse-openj9/openj9/blob/f29e6446a54643666a96b01ad54094fd619d077f/jcl/src/java.base/share/classes/java/lang/ref/ReferenceQueue.java#L183-L190

In the jitted code of java/lang/ref/ReferenceQueue.enqueue, it tests if the class of the reference is java/lang/Class$ReflectRef(0x00000000000D0000 shared) and if not, it branches to 0xffff803daa18. But, it looks like the instructions from 0xffff803daa18 is the inlined code for java/lang/Class$ReflectRef.run. So, b.ne at 0xffff803d9ee0 looks wrong. As I don't have a trace file, I am not sure why the wrong guard is generated.

When the problem occurred, the branch to 0xffff803daa5c at 0xffff803d9f20 was taken. The instructions from 0xffff803daa5c looks like the inlined version of java/lang/ClassLoader$ClassNameLockRef.run.

   0xffff803d9ed0:      str     x1, [x20, #264]
   0xffff803d9ed4:      ldr     w2, [x1]
   0xffff803d9ed8:      and     w2, w2, #0xffffff00
   0xffff803d9edc:      cmp     x2, #0xd0, lsl #12 // <- java/lang/Class$ReflectRef(0x00000000000D0000 shared)
   0xffff803d9ee0:      b.ne    0xffff803daa18  // b.any
   0xffff803d9ee4:      nop
   0xffff803d9ee8:      ldr     w2, [x1, #28]
   0xffff803d9eec:      str     x2, [x20, #168]
   0xffff803d9ef0:      cbz     x2, 0xffff803dad50
   0xffff803d9ef4:      mov     x0, x2
   0xffff803d9ef8:      bl      0xffff8056fbb0
   0xffff803d9efc:      str     x2, [x20, #144]
   0xffff803d9f00:      ldr     w0, [x1, #28]  // java/lang/ClassLoader$ClassNameLockRef::classNameLockHT
   0xffff803d9f04:      ldr     w1, [x1, #24]  // java/lang/ClassLoader$ClassNameLockRef::key
   0xffff803d9f08:      ldr     w3, [x0]
   0xffff803d9f0c:      and     w3, w3, #0xffffff00
   0xffff803d9f10:      str     x0, [x20, #240]
   0xffff803d9f14:      str     x1, [x20, #248]
   0xffff803d9f18:      ldr     x2, 0xffff803db258  // <-- java/util/Hashtable(0x0000000000092C00 shared)
   0xffff803d9f1c:      cmp     x3, x2
   0xffff803d9f20:      b.ne    0xffff803daa5c  // b.any        <---- this branch was taken 
...
   0xffff803daa18:      ldr     x1, [x20, #304]
   0xffff803daa1c:      mov     x0, x1
   0xffff803daa20:      ldr     w21, [x1]
   0xffff803daa24:      and     w21, w21, #0xffffff00
   0xffff803daa28:      b       0xffff803db0c4
   0xffff803daa2c:      mov     w0, #0x1                        // #1
   0xffff803daa30:      ldr     x21, [x20, #40]
   0xffff803daa34:      ldr     x22, [x20, #48]
   0xffff803daa38:      ldr     x23, [x20, #56]
   0xffff803daa3c:      ldr     x24, [x20, #64]
   0xffff803daa40:      ldr     x25, [x20, #72]
   0xffff803daa44:      ldr     x26, [x20, #80]
   0xffff803daa48:      ldr     x27, [x20, #88]
   0xffff803daa4c:      ldr     x28, [x20, #96]
   0xffff803daa50:      add     x20, x20, #0x130
   0xffff803daa54:      ldur    x30, [x20, #-8]
   0xffff803daa58:      ret
   0xffff803daa5c:      ldr     x21, [x20, #264]
   0xffff803daa60:      ldr     w2, [x21, #28]
   0xffff803daa64:      ldr     w1, [x21, #24]
   0xffff803daa68:      mov     x0, x2
   0xffff803daa6c:      ldr     w22, [x2]
   0xffff803daa70:      and     w22, w22, #0xffffff00
   0xffff803daa74:      mov     x9, #0xffa0                     // #65440
   0xffff803daa78:      movk    x9, #0xffff, lsl #16
   0xffff803daa7c:      sxtw    x9, w9
   0xffff803daa80:      ldr     x9, [x22, x9]
   0xffff803daa84:      blr     x9                       <--- call to ConcurrentHashMap::get
   0xffff803daa88:      cmp     x0, x21
   0xffff803daa8c:      b.ne    0xffff803da1b4  // b.any
   0xffff803daa90:      ldr     w21, [x21, #28]
   0xffff803daa94:      cbz     x21, 0xffff803db12c
   0xffff803daa98:      b       0xffff803da254
   0xffff803daa9c:      ldr     x1, [x19, #64]
   0xffff803daaa0:      ldr     x0, [x20, #280]
   0xffff803daaa4:      add     x3, x0, #0x4
   0xffff803daaa8:      ldr     w2, [x3]
   0xffff803daaac:      cmp     x2, x19
   0xffff803daab0:      b.ne    0xffff803db130  // b.any
   0xffff803daab4:      stlr    wzr, [x3]
   0xffff803daab8:      mov     x0, x1
   0xffff803daabc:      bl      0xffff8056f630
...
Akira1Saitoh commented 3 years ago

I am not sure if the crash in windows (https://github.com/eclipse-openj9/openj9/issues/13017#issuecomment-903706842) is the same problem as aarch64 ones.

knn-k commented 3 years ago

@JasonFengJ9 I think the Windows JDK 16 case should be handled in another issue.

Akira1Saitoh commented 3 years ago

Launching 20x internal Grinder job for jdk_lang_1 with personal build of JDK16 from the recent source tree. job/Grinder/17647/

[2021-08-24T08:17:52.819Z] Eclipse OpenJ9 VM (build master-2ba48511f18, JRE 16 Linux aarch64-64-Bit Compressed References 20210822_11 (JIT enabled, AOT enabled)
[2021-08-24T08:17:52.819Z] OpenJ9   - 2ba48511f18
[2021-08-24T08:17:52.819Z] OMR      - 6eec759cd2d
[2021-08-24T08:17:52.819Z] JCL      - 44316ee515b based on jdk-16.0.2+7)
JasonFengJ9 commented 3 years ago

I think the Windows JDK 16 case should be handled in another issue.

Created https://github.com/eclipse-openj9/openj9/issues/13369

Akira1Saitoh commented 3 years ago

I could not reproduce the aarch64 crash with 20x Grinder. However, I think that relocation record for class constant is missing in the code I pasted in https://github.com/eclipse-openj9/openj9/issues/13017#issuecomment-904351614. Other class constants are loaded with 4 instructions, which means it is AOT compiled code, but java/lang/Class$ReflectRef is not. cmp instruction is generated in https://github.com/eclipse/omr/blob/ab70342615ad03459e8a2513615139503294d87d/compiler/aarch64/codegen/ControlFlowEvaluator.cpp#L185, we need to handle AOT compilation case here. It happens only when the constant can be encoded into 12bit immediate field.

   0xffff803d9ed4:      ldr     w2, [x1]
   0xffff803d9ed8:      and     w2, w2, #0xffffff00
   0xffff803d9edc:      cmp     x2, #0xd0, lsl #12 // <- java/lang/Class$ReflectRef(0x00000000000D0000 shared)
   0xffff803d9ee0:      b.ne    0xffff803daa18  // b.any
0xdaryl commented 3 years ago

With the various fixes mentioned above applied the original problem reported in this issue has not been seen since, despite being reproducible before. Closing.