Closed: hzongaro closed this issue 1 year ago
possibly duplicated by https://github.com/eclipse-openj9/openj9/issues/14717 (which has more discussion)
By limiting the compiled method to FlattenedLine2D.makeValueGeneric(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;, I can reproduce a similar intermittent crash in GC. FlattenedLine2D.makeValueGeneric is compiled at warm and has one inlined method:
CalleeIndex CallerIndex ByteCodeIndex CalleeMethod
0 -1 8 FlattenedLine2D.makeValue(QPoint2D;QPoint2D;)QFlattenedLine2D;
In my reproduced case, the test crashed in GC at MM_StackSlotValidator::validate [1] because the slot value (0x85a1c238) points to the middle of a java/lang/Integer object (0x85a1c230) [2]. The Integer at 0x85a1c230 has value 0xCCDDCCDD. It is likely an array element in defaultTrianglePositions.
The crash happened while testValueWithSingleAlignmentGCScanning was running [3]. I looked at the compiled FlattenedLine2D.makeValueGeneric but haven't found anything abnormal so far. The investigation is ongoing.
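The address arithmetic behind this observation can be sketched as follows. The addresses are copied from the dump in [2]; the 4-byte object header (as with compressed references) and the reading of the printed field offsets as counted after that header are assumptions made for illustration, not something stated in this thread.

```python
# Values copied from the dump in [2].
SLOT_VALUE = 0x85A1C238             # O-Slot value reported by the validator
OBJECT_START = 0x85A1C230           # java/lang/Integer instance found by `whatis`
SEGMENT = (0x83690000, 0x86200000)  # heap segment bounds printed by `whatis`

# Assumption (not confirmed in the thread): a 4-byte object header, with the
# field offsets printed by !j9object counted from the end of that header.
ASSUMED_HEADER_BYTES = 4
FIELDS_AFTER_HEADER = {0: "lockword", 4: "value"}


def describe_slot(slot, obj_start, segment, header_bytes, fields):
    """Return (inside segment?, byte offset into object, field name or None)."""
    in_segment = segment[0] <= slot < segment[1]
    offset = slot - obj_start
    field = fields.get(offset - header_bytes)
    return in_segment, offset, field


in_segment, offset, field = describe_slot(
    SLOT_VALUE, OBJECT_START, SEGMENT, ASSUMED_HEADER_BYTES, FIELDS_AFTER_HEADER
)
print(in_segment, offset, field)
```

The offset of 8 matches what `whatis` reports. Under the header assumption, the slot would point at the Integer's value field, so the validator would read 0xCCDDCCDD where it expects a class pointer. That interpretation is a guess and is not verified here.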
[1]
----------- Stack Backtrace -----------
_ZN26MM_MarkingSchemeRootMarker11doStackSlotEPP8J9ObjectPvPKv+0x314 (0x0000FFFFB063A6F4 [libj9gc29.so+0x18b6f4])
jitWalkFrame+0x158 (0x0000FFFFB0F25D7C [libj9jit29.so+0x7c2d7c])
jitWalkStackFrames+0xb90 (0x0000FFFFB0F26EC4 [libj9jit29.so+0x7c3ec4])
walkStackFrames+0xc0 (0x0000FFFFB1C63890 [libj9vm29.so+0x74890])
_ZN28GC_VMThreadStackSlotIterator9scanSlotsEP10J9VMThreadS1_PvPFvP8J9JavaVMPP8J9ObjectS2_P16J9StackWalkStatePKvEbb+0x44 (0x0000FFFFB04F3994 [libj9gc29.so+0x44994])
_ZN14MM_RootScanner13scanOneThreadEP18MM_EnvironmentBaseP10J9VMThreadPv+0xec (0x0000FFFFB04EAC8C [libj9gc29.so+0x3bc8c])
_ZN14MM_RootScanner11scanThreadsEP18MM_EnvironmentBase+0xc0 (0x0000FFFFB04EA170 [libj9gc29.so+0x3b170])
_ZN14MM_RootScanner9scanRootsEP18MM_EnvironmentBase+0x54 (0x0000FFFFB04ED0C8 [libj9gc29.so+0x3e0c8])
_ZN18MM_MarkingDelegate9scanRootsEP18MM_EnvironmentBaseb+0xe4 (0x0000FFFFB06352D4 [libj9gc29.so+0x1862d4])
_ZN19MM_ParallelMarkTask3runEP18MM_EnvironmentBase+0x84 (0x0000FFFFB065B1D8 [libj9gc29.so+0x1ac1d8])
_ZN21MM_ParallelDispatcher16workerEntryPointEP18MM_EnvironmentBase+0x218 (0x0000FFFFB05CFFC8 [libj9gc29.so+0x120fc8])
_Z23dispatcher_thread_proc2P14OMRPortLibraryPv+0x11c (0x0000FFFFB05CF7A0 [libj9gc29.so+0x1207a0])
omrsig_protect+0x21c (0x0000FFFFB1B8974C [libj9prt29.so+0x2874c])
dispatcher_thread_proc+0x38 (0x0000FFFFB05CEF08 [libj9gc29.so+0x11ff08])
thread_wrapper+0xcc (0x0000FFFFB1B373BC [libj9thr29.so+0x73bc])
start_thread+0xb0 (0x0000FFFFB2361088 [libpthread.so.0+0x7088])
---------------------------------------
...
0000000000011200: Unhandled exception while validating object in stack frame in thread main
0000000000011200: O-Slot=0000FFFFB21B91D0
0000000000011200: O-Slot value=0000000085A1C238
0000000000011200: PC=0000FFFF90A02268
0000000000011200: framesWalked=1
0000000000011200: arg0EA=0000000000106F48
0000000000011200: walkSP=0000000000106EC8
0000000000011200: literals=0000000000000000
0000000000011200: jitInfo=0000FFFF8A1DA038
0000000000011200: method=00000000006075D0 (FlattenedLine2D.makeValueGeneric(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;) (JIT)
0000000000011200: stack=0000000000101A80-0000000000107990
-----------------------------------
ValueTypeTestsJIT_1_FAILED
-----------------------------------
#12 <signal handler called>
#13 0x0000ffffb063a6f4 in MM_StackSlotValidator::validate (env=0xffff38001cc8, this=0xffff8a7d4ef0)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/gc_base/StackSlotValidator.hpp:122
#14 MM_MarkingSchemeRootMarker::doStackSlot (this=0xffff8a7d5530, slotPtr=<optimized out>, walkState=0xffff8a7d5118,
stackLocation=0xffffb21b91d0)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/gc_glue_java/MarkingSchemeRootMarker.cpp:48
#15 0x0000ffffb0f25d7c in jitWalkRegisterMap (gcStackAtlas=<optimized out>, stackMap=<optimized out>, walkState=0xffff8a7d5118)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/codert_vm/jswalk.c:733
#16 jitWalkFrame (walkState=walkState@entry=0xffff8a7d5118, walkLocals=walkLocals@entry=1, stackMap=<optimized out>)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/codert_vm/jswalk.c:582
#17 0x0000ffffb0f26ec4 in jitWalkStackFrames (walkState=0xffff8a7d5118)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/codert_vm/jswalk.c:243
#18 0x0000ffffb1c63890 in walkStackFrames (currentThread=0x1a4800, walkState=0xffff8a7d5118)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/vm/swalk.c:384
#19 0x0000ffffb04f3994 in GC_VMThreadStackSlotIterator::scanSlots (vmThread=vmThread@entry=0x1a4800,
walkThread=walkThread@entry=0x11200, userData=userData@entry=0xffff8a7d54d0, oSlotIterator=<optimized out>,
includeStackFrameClassReferences=<optimized out>, trackVisibleFrameDepth=<optimized out>)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/gc_structs/VMThreadStackSlotIterator.cpp:129
....
[2]
(gdb) fr 13
#13 0x0000ffffb063a6f4 in MM_StackSlotValidator::validate (env=0xffff38001cc8, this=0xffff8a7d4ef0)
at /home/jenkins/workspace/Build_JDKnext_aarch64_linux_valhalla_Personal/openj9/runtime/gc_base/StackSlotValidator.hpp:122
warning: Source file is more recent than executable.
122 } else if (!couldBeForwarded && ((UDATA)0x99669966 != J9GC_J9OBJECT_CLAZZ(_slotValue, env)->eyecatcher)) {
(gdb) p _slotValue
$1 = (J9Object * const) 0x85a1c238
> whatis 0x85a1c238
heap #1 - name: Flat@ffffac098280
0x85a1c238 is within heap segment: 83690000 -- 86200000
0x85a1c238 is within an object on the heap:
offset 8 within java/lang/Integer instance @ 0x85a1c230
> !j9object 0x85a1c230
!J9Object 0x0000000085A1C230 {
struct J9Class* clazz = !j9class 0x71600 // java/lang/Integer
Object flags = 0x00000000;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
I value = 0xCCDDCCDD (offset = 4) (java/lang/Integer)
}
[3]
!stack 0x00011200
<11200> known but unhandled frame type com.ibm.j9ddr.vm29.pointer.U8Pointer @ 0x00000005
FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT FAULT
<11200> !j9method 0x00000000006075D0 FlattenedLine2D.makeValueGeneric(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x00000000006322D8 java/lang/invoke/LambdaForm$DMH/0x00000000ac634dc0.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x00000000005C6DB8 java/lang/invoke/LambdaForm$MH/0x00000000ac5c8e70.invoke_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x000000000029F8B8 org/openj9/test/lworld/ValueTypeTests.createTriangle2D([[[I)Ljava/lang/Object;
<11200> !j9method 0x000000000029F938 org/openj9/test/lworld/ValueTypeTests.createAssorted(Ljava/lang/invoke/MethodHandle;[Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x000000000029F918 org/openj9/test/lworld/ValueTypeTests.createAssorted(Ljava/lang/invoke/MethodHandle;[Ljava/lang/String;)Ljava/lang/Object;
<11200> !j9method 0x000000000029F338 org/openj9/test/lworld/ValueTypeTests.testValueWithSingleAlignmentGCScanning()V
<11200> !j9method 0x00000000003DD0A8 java/lang/invoke/LambdaForm$DMH/0x00000000ac3d3270.invokeStatic(Ljava/lang/Object;)V
<11200> !j9method 0x00000000003DDF98 java/lang/invoke/LambdaForm$MH/0x00000000ac3d5ac0.invoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x00000000001B0BB8 java/lang/invoke/LambdaForm$MH/0x00000000ac2310c0.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x00000000001AD998 jdk/internal/reflect/DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x00000000001AD958 jdk/internal/reflect/DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x0000000000059F80 java/lang/reflect/Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x00000000003D1C48 org/testng/internal/MethodInvocationHelper.invokeMethod(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
<11200> !j9method 0x0000000000385150 org/testng/internal/Invoker.invokeMethod(Ljava/lang/Object;Lorg/testng/ITestNGMethod;[Ljava/lang/Object;ILorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/ITestClass;[Lorg/testng/ITestNGMethod;[Lorg/testng/ITestNGMethod;Lorg/testng/internal/ConfigurationGroupMethods;Lorg/testng/internal/Invoker$FailureContext;)Lorg/testng/ITestResult;
<11200> !j9method 0x00000000003851B0 org/testng/internal/Invoker.invokeTestMethod(Ljava/lang/Object;Lorg/testng/ITestNGMethod;[Ljava/lang/Object;ILorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/ITestClass;[Lorg/testng/ITestNGMethod;[Lorg/testng/ITestNGMethod;Lorg/testng/internal/ConfigurationGroupMethods;Lorg/testng/internal/Invoker$FailureContext;)Lorg/testng/ITestResult;
<11200> !j9method 0x0000000000385230 org/testng/internal/Invoker.invokeTestMethods(Lorg/testng/ITestNGMethod;Lorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/internal/ConfigurationGroupMethods;Ljava/lang/Object;Lorg/testng/ITestContext;)Ljava/util/List;
<11200> !j9method 0x00000000003C93A0 org/testng/internal/TestMethodWorker.invokeTestMethods(Lorg/testng/ITestNGMethod;Ljava/lang/Object;Lorg/testng/ITestContext;)V
<11200> !j9method 0x00000000003C9380 org/testng/internal/TestMethodWorker.run()V
<11200> !j9method 0x0000000000374908 org/testng/TestRunner.privateRun(Lorg/testng/xml/XmlTest;)V
<11200> !j9method 0x00000000003748A8 org/testng/TestRunner.run()V
<11200> !j9method 0x000000000036F5B0 org/testng/SuiteRunner.runTest(Lorg/testng/TestRunner;)V
<11200> !j9method 0x000000000036F590 org/testng/SuiteRunner.runSequentially()V
<11200> !j9method 0x000000000036F510 org/testng/SuiteRunner.privateRun()V
<11200> !j9method 0x000000000036F4F0 org/testng/SuiteRunner.run()V
<11200> !j9method 0x00000000003B8818 org/testng/SuiteRunnerWorker.runSuite(Lorg/testng/internal/SuiteRunnerMap;Lorg/testng/xml/XmlSuite;)V
<11200> !j9method 0x00000000003B8838 org/testng/SuiteRunnerWorker.run()V
<11200> !j9method 0x00000000001EEEE8 org/testng/TestNG.runSuitesSequentially(Lorg/testng/xml/XmlSuite;Lorg/testng/internal/SuiteRunnerMap;ILjava/lang/String;)V
<11200> !j9method 0x00000000001EEE88 org/testng/TestNG.runSuitesLocally()Ljava/util/List;
<11200> !j9method 0x00000000001EEDA8 org/testng/TestNG.runSuites()Ljava/util/List;
<11200> !j9method 0x00000000001EED88 org/testng/TestNG.run()V
<11200> !j9method 0x00000000001EEFA8 org/testng/TestNG.privateMain([Ljava/lang/String;Lorg/testng/ITestListener;)Lorg/testng/TestNG;
<11200> !j9method 0x00000000001EEF88 org/testng/TestNG.main([Ljava/lang/String;)V
<11200> JNI call-in frame
<11200> Native method frame
I'm able to reproduce *** Invalid JIT return address 0000000000000000 in 0000FFFF96B323D8 by compiling FlattenedLine2D.makeValueGeneric at cold. It shows that currentThread->jitReturnAddress is NULL [1]. It happened when jitNewValue was invoked.
I looked at arm64nathelp.m4 and noticed that jitNewValue is declared as NEW_DUAL_MODE_HELPER. x, p, and z declare it as DUAL_MODE_ALLOCATION_HELPER or DUAL_MODE_HELPER. Also, on x, p, and z, jitNewValue and jitNewObject are declared the same way.
On aarch64, it looks to me that one difference between NEW_DUAL_MODE_HELPER and OLD_DUAL_MODE_HELPER is that NEW_DUAL_MODE_HELPER doesn't do str x30,[J9VMTHREAD,{#}J9TR_VMThread_jitReturnAddress] before calling the VM helper. My understanding is that this instruction updates currentThread->jitReturnAddress.
Attached: arm64nathelp.s.txt
As a comparison, jitNewObject on aarch64 is declared as OLD_DUAL_MODE_HELPER and it has str x30,[x19,#248] before calling the helper [2] (248 (0xf8) is the offset of currentThread->jitReturnAddress). However, jitNewValue doesn't [3].
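The comparison between the two listings can be automated with a small text check. The sketch below is only an illustration: the listings are abbreviated excerpts of [2] and [3] (the trailing comment on the store line is dropped), and the function name and regular expression are hypothetical helpers, not part of any OpenJ9 tooling.

```python
import re

# Abbreviated excerpts of listings [2] and [3]; only the lines needed for
# the check are kept.
JIT_NEW_OBJECT = """\
stp x29,x30,[sp,#392]
str x20,[x19,#32]
str x30,[x19,#248]
mov x0,x19
bl old_fast_jitNewObject
"""

JIT_NEW_VALUE = """\
stp x29,x30,[sp,#392]
bl fast_jitNewValue
cbz x0,.L_done_jitNewValue
str x20,[x19,#32]
mov x1,x0
mov x0,x19
blr x1
"""

JIT_RETURN_ADDRESS_OFFSET = 0xF8  # 248, per the annotation in listing [2]


def saves_return_address_before(listing, call, offset=JIT_RETURN_ADDRESS_OFFSET):
    """True if x30 is stored at [x19,#offset] somewhere before the `call` line."""
    store = re.compile(r"str\s+x30,\s*\[x19,\s*#%d\]" % offset)
    lines = [line.strip() for line in listing.splitlines()]
    before_call = lines[: lines.index(call)]
    return any(store.fullmatch(line) for line in before_call)


print(saves_return_address_before(JIT_NEW_OBJECT, "bl old_fast_jitNewObject"))
print(saves_return_address_before(JIT_NEW_VALUE, "blr x1"))
```

The first call reports that the store is present in the jitNewObject excerpt, and the second reports that it is absent before blr x1 in the jitNewValue excerpt, which is the difference described above.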
As a reference, the implementation of NEW_DUAL_MODE_HELPER on X86 also updates J9TR_VMThread_jitReturnAddress before calling the helper. It looks to me that the implementation of NEW_DUAL_MODE_HELPER on aarch64 might be missing the update of currentThread->jitReturnAddress.
Konno-san @knn-k, could you help take a look and see if NEW_DUAL_MODE_HELPER might have an issue, such as failing to update currentThread->jitReturnAddress? #8161 changed jitNewValue from DUAL_MODE_HELPER to NEW_DUAL_MODE_HELPER. jitNewValue is the only helper that is currently declared as NEW_DUAL_MODE_HELPER.
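As a toy model of the suspected problem, the sketch below contrasts glue code that records the link register in the thread with glue code that does not. Every name in it is hypothetical: the classes and functions only stand in for the assembly glue and for whichever VM check prints the "Invalid JIT return address" message, and the link register value is an arbitrary illustrative number.

```python
class InvalidJitReturnAddress(Exception):
    """Stand-in for the VM reporting '*** Invalid JIT return address'."""


class VMThread:
    def __init__(self):
        # NULL until some glue code stores the JIT return address.
        self.jit_return_address = 0


def slow_helper(thread):
    # Stand-in for a slow path that needs a valid JIT return address.
    if thread.jit_return_address == 0:
        raise InvalidJitReturnAddress(
            "Invalid JIT return address %016X" % thread.jit_return_address
        )
    return "allocated"


def old_style_glue(thread, link_register):
    # Models: str x30,[J9VMTHREAD,J9TR_VMThread_jitReturnAddress]
    thread.jit_return_address = link_register
    return slow_helper(thread)


def new_style_glue(thread, link_register):
    # Models listing [3]: the helper is called without storing x30 first.
    return slow_helper(thread)


ok_thread = VMThread()
print(old_style_glue(ok_thread, 0x1000))

bad_thread = VMThread()
try:
    new_style_glue(bad_thread, 0x1000)
except InvalidJitReturnAddress as err:
    print(err)
```

In this model the old-style path succeeds because the return address is recorded before the helper runs, while the new-style path fails with a zero address, mirroring the NULL jitReturnAddress seen in [1].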
[1]
*** Invalid JIT return address 0000000000000000 in 0000FFFF96B323D8
...
#35 0x0000ffff951923a0 in J9AllocateObject (vmThread=0x11200, clazz=0x618700, allocateFlags=1)
at /home/jenkins/tmpa7ehuo/testvalhalla/src/openj9-openjdk-jdk.valuetypes/openj9/runtime/gc_modron_startup/mgcalloc.cpp:416
#36 0x0000ffff95b5da10 in slow_jitNewValueImpl (nonZeroTLH=false, checkClassInit=true, currentThread=0x11200)
at /home/jenkins/tmpa7ehuo/testvalhalla/src/openj9-openjdk-jdk.valuetypes/openj9/runtime/codert_vm/cnathelp.cpp:613
#37 old_slow_jitNewValue (currentThread=0x11200)
at /home/jenkins/tmpa7ehuo/testvalhalla/src/openj9-openjdk-jdk.valuetypes/openj9/runtime/codert_vm/cnathelp.cpp:639
#38 0x0000ffff95b72f0c in jitNewValue ()
at /home/jenkins/tmpa7ehuo/testvalhalla/src/openj9-openjdk-jdk.valuetypes/build/linux-aarch64-server-release/vm/runtime/codert_vm/arm64nathelp.s:974
#39 0x0000ffff74db30e8 in ?? ()
(gdb) fr 38
#38 0x0000ffff95b72f0c in jitNewValue ()
at /home/jenkins/tmpa7ehuo/testvalhalla/src/openj9-openjdk-jdk.valuetypes/build/linux-aarch64-server-release/vm/runtime/codert_vm/arm64nathelp.s:974
974 blr x1
0x0000ffff95b72f08 <jitNewValue+48>: 20 00 3f d6 blr x1
(gdb) p currentThread->jitReturnAddress
No symbol "currentThread" in current context.
(gdb) down
#37 old_slow_jitNewValue (currentThread=0x11200)
at /home/jenkins/tmpa7ehuo/testvalhalla/src/openj9-openjdk-jdk.valuetypes/openj9/runtime/codert_vm/cnathelp.cpp:639
639 return slow_jitNewValueImpl(currentThread, true, false);
(gdb) p currentThread->jitReturnAddress
$5 = (void *) 0x0
[2]
aarch64
jitNewObject:
stp x29,x30,[sp,#392]
str x20,[x19,#32]
stp x0,x1,[sp,#160]
stp x2,x3,[sp,#176]
stp x4,x5,[sp,#192]
stp x6,x7,[sp,#208]
stp x8,x9,[sp,#224]
stp x10,x11,[sp,#240]
stp x12,x13,[sp,#256]
stp x14,x15,[sp,#272]
stp x16,x17,[sp,#288]
str x18,[sp,#304]
add x15, sp, 416
st1 {v0.4s, v1.4s, v2.4s, v3.4s}, [x15], #64
st1 {v4.4s, v5.4s, v6.4s, v7.4s}, [x15], #64
st1 {v8.4s, v9.4s, v10.4s, v11.4s}, [x15], #64
st1 {v12.4s, v13.4s, v14.4s, v15.4s}, [x15], #64
st1 {v16.4s, v17.4s, v18.4s, v19.4s}, [x15], #64
st1 {v20.4s, v21.4s, v22.4s, v23.4s}, [x15], #64
st1 {v24.4s, v25.4s, v26.4s, v27.4s}, [x15], #64
st1 {v28.4s, v29.4s, v30.4s, v31.4s}, [x15]
str x30,[x19,#248] //<=== store x30 to currentThread->jitReturnAddress ? (0xf8: void* jitReturnAddress = !j9x 0x0000000000000000)
mov x0,x19
bl old_fast_jitNewObject
...
[3]
aarch64
jitNewValue:
stp x29,x30,[sp,#392]
bl fast_jitNewValue
cbz x0,.L_done_jitNewValue
str x20,[x19,#32]
str x19,[sp,#312]
stp x20,x21,[sp,#320]
stp x22,x23,[sp,#336]
stp x24,x25,[sp,#352]
stp x26,x27,[sp,#368]
stp x28,x29,[sp,#384]
mov x1,x0
mov x0,x19
blr x1
cbz x0,.L_old_slow_jitNewValue
ret x0
...
NEW_DUAL_MODE_HELPER is not enabled on AArch64 yet. See the following lines turning off the flag:
I had to change the declaration of jitNewValue from DUAL_MODE_HELPER to NEW_DUAL_MODE_HELPER in #8161 to avoid build errors, because there is no definition of old_fast_jitNewValue in cnathelp.cpp for the old dual mode.
And yes, NEW_DUAL_MODE_HELPER lacks the step for saving the return address in VMThread->jitReturnAddress, while pnathelp.m4 does that.
I opened PR #17100 as the fix for NEW_DUAL_MODE_HELPER.
How can I run ValueTypeTestsJIT_1 with it?
You can run the functional Valhalla tests in Jenkins. They include ValueTypeTestsJIT_1:
- platform: aarch64_linux_valhalla
- TESTS_TARGETS: sanity.functional,extended.functional
- Javanext
- Your own openj9 repo/branch
- OPENJDKnext_REPO: git@github.com:ibmruntimes/openj9-openjdk-jdk.valuetypes.git
An example of running a personal functional Valhalla test in Jenkins is: Pipeline-Build-Test-Personal/16101
It is best to run all functional Valhalla tests (sanity.functional,extended.functional) on aarch64, since I remember that ValueTypeTestsJIT_1 is not the only test that fails: jitNewValue is exercised in other tests as well.
Thank you. I started a job as Pipeline-Build-Test-Personal/16142/.
Pipeline-Build-Test-Personal/16142/ failed. It looks to me related to infrastructure.
I started Pipeline-Build-Test-Personal/16159/ with a branch test-knn-k-dualmodehelper.
Pipeline-Build-Test-Personal/16159/ shows that all functional Valhalla tests now pass on aarch64.
Failure running functional/Valhalla test ValueTypeTestsJIT_1 aarch64_linux. Failure can be found in internal server Pipeline-Build-Test-Personal build #11817.
Grinder run failed 10/10.