JasonFengJ9 closed this 2 years ago.
The grinder failed 100% of the time:
vmState [0x5ff04]: {J9VMSTATE_JIT} {InstructionSelection}
@r30shah: please assign this for investigation.
I looked at the core dump from the internal Jenkins build that @JasonFengJ9 mentioned in the description. The segmentation fault occurs in the function at [1]; here is the disassembly of the failing function:
(gdb) disassemble 0x000003FF86015CFA
Dump of assembler code for function TR_J9VMBase::getResolvedVirtualMethod(TR_OpaqueClassBlock*, int, bool):
0x000003ff86015c78 <+0>: stmg %r8,%r15,64(%r15)
0x000003ff86015c7e <+6>: lg %r1,0(%r2)
0x000003ff86015c84 <+12>: lay %r15,-160(%r15)
0x000003ff86015c8a <+18>: lgr %r11,%r2
0x000003ff86015c8e <+22>: lgr %r8,%r4
0x000003ff86015c92 <+26>: lgr %r9,%r5
0x000003ff86015c96 <+30>: lgr %r10,%r3
0x000003ff86015c9a <+34>: lg %r1,464(%r1)
0x000003ff86015ca0 <+40>: basr %r14,%r1
0x000003ff86015ca2 <+42>: ltr %r2,%r2
0x000003ff86015ca4 <+44>: jne 0x3ff86015d1a <TR_J9VMBase::getResolvedVirtualMethod(TR_OpaqueClassBlock*, int, bool)+162>
0x000003ff86015ca8 <+48>: larl %r1,0x3ff86b58d60
0x000003ff86015cae <+54>: lgr %r3,%r10
0x000003ff86015cb2 <+58>: lg %r1,0(%r1)
0x000003ff86015cb8 <+64>: lg %r2,0(%r1)
0x000003ff86015cbe <+70>: aghi %r2,236
0x000003ff86015cc2 <+74>: brasl %r14,0x3ff860016f0 <J9::ClassEnv::convertClassOffsetToClassPtr(TR_OpaqueClassBlock*)>
0x000003ff86015cc8 <+80>: lg %r1,0(%r11)
0x000003ff86015cce <+86>: lgr %r10,%r2
0x000003ff86015cd2 <+90>: llgfr %r3,%r8
0x000003ff86015cd6 <+94>: lgr %r2,%r11
0x000003ff86015cda <+98>: lg %r1,2288(%r1)
0x000003ff86015ce0 <+104>: basr %r14,%r1
0x000003ff86015ce2 <+106>: ltg %r2,0(%r2,%r10) // R2 = callOffsetFromVTable R10 = J9Class.
0x000003ff86015ce8 <+112>: je 0x3ff86015d1a <TR_J9VMBase::getResolvedVirtualMethod(TR_OpaqueClassBlock*, int, bool)+162>
0x000003ff86015cec <+116>: lg %r1,24(%r11)
0x000003ff86015cf2 <+122>: tm 1309(%r1),8
0x000003ff86015cf6 <+126>: jne 0x3ff86015d12 <TR_J9VMBase::getResolvedVirtualMethod(TR_OpaqueClassBlock*, int, bool)+154>
0x000003ff86015cfa <+130>: ltg %r1,0(%r2) // IT FAILS HERE TRYING TO DEREFERENCE R2
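It fails while trying to fetch the J9Method of the virtual method from the J9Class at the vtable offset of the virtual call. The slot value loaded at +106 is non-null (so the `je` at +112 is not taken), and dereferencing it at +130 faults.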
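The faulting load can be modelled roughly as follows. This is a sketch under stated assumptions, not the VM's code: `FakeMethod` and `loadVirtualMethod` are hypothetical names, the class is a raw byte buffer rather than a real J9Class, and `vtableSlotOffset` is assumed to be the byte offset already derived from the virtual call offset. It only mirrors the access pattern in the disassembly: load a pointer-sized slot, treat zero as "no method", and otherwise trust the value as a method pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for J9Method; the real layout is much larger.
struct FakeMethod { int id; };

// Reads the pointer-sized slot at `vtableSlotOffset` bytes from `classBase`,
// mirroring `ltg %r2,0(%r2,%r10)`. A zero slot is returned as nullptr
// (the `je` branch). A non-zero slot is trusted as a method pointer; in the
// core dump this is the value whose dereference at +130 faults.
inline FakeMethod *loadVirtualMethod(const unsigned char *classBase,
                                     std::intptr_t vtableSlotOffset)
{
   std::uintptr_t slot = 0;
   std::memcpy(&slot, classBase + vtableSlotOffset, sizeof(slot));
   if (slot == 0)
      return nullptr;
   return reinterpret_cast<FakeMethod *>(slot);
}
```

Because a non-zero slot is used without further checks, an offset that does not correspond to a real vtable entry of the class yields whatever happens to be stored there.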
The call chain that ends up calling this getResolvedVirtualMethod is as follows:
TR_J9VMBase::getResolvedVirtualMethod(TR_OpaqueClassBlock*, int, bool)
TR_ResolvedJ9Method::getResolvedVirtualMethod(TR::Compilation*, TR_OpaqueClassBlock*, int, bool)
J9::Z::PrivateLinkage::buildVirtualDispatch(TR::Node*, TR::RegisterDependencyConditions*, TR::Register*, unsigned int)
Looking at the point where we call getResolvedVirtualMethod in `buildVirtualDispatch`:
0x3ff8634079e +13838 ~ c0e5ffe296bd brasl %r14, 0x3ff85f93518 ^{libj9jit29.so}{_ZN2J913CodeGenerator36isProfiledClassAndCallSiteCompatibleEP19TR_OpaqueClassBlockS2_} +0
0x3ff863407a4 +13844 ~ 4220f14f stc %r2, 0x14f(%r15)
0x3ff863407a8 +13848 ~ 1222 ltr %r2, %r2
0x3ff863407aa +13850 ~ a784f19d je 0x3ff8633eae4 C>> ^+6484
0x3ff863407ae +13854 ~ e31070080004 lg %r1, 8(%r7)
0x3ff863407b4 +13860 ~ b9040028 lgr %r2, %r8
0x3ff863407b8 +13864 ~ a7690001 lghi %r6, 1
0x3ff863407bc +13868 ~ e33010080004 lg %r3, 8(%r1)
0x3ff863407c2 +13874 ~ c0e500058c6f brasl %r14, 0x3ff863f20a0 {libj9jit29.so}{_ZN3OMR15SymbolReference15getOwningMethodEPN2TR11CompilationE} +0
0x3ff863407c8 +13880 ~ e31020000004 lg %r1, 0(%r2)
0x3ff863407ce +13886 ~ e33070080004 lg %r3, 8(%r7)
0x3ff863407d4 +13892 ~ e350801c0014 lgf %r5, 0x1c(%r8)
0x3ff863407da +13898 ~ e340f1400004 lg %r4, 0x140(%r15)
0x3ff863407e0 +13904 ~ e31014000004 lg %r1, 0x400(%r1)
0x3ff863407e6 +13910 ~ e33030080004 lg %r3, 8(%r3)
0x3ff863407ec +13916 ~ 0de1 basr %r14, %r1
0x3ff863407ee +13918 ~ b9020012 ltgr %r1, %r2
It seems the call is coming from [2], where it tries to get the virtual method from the profiled class. We recently made a change in this area [3] that enabled profile-directed devirtualization on Z, so this appears to be a new failure caused by [3]. @Spencer-Comin let's take a look at this failure and see whether there is an issue with the profiled class or something else.
[1]. https://github.com/eclipse-openj9/openj9/blob/4b548aa58c694b4b103c7e8cefcc6782f4e34b7d/runtime/compiler/env/j9method.cpp#L7183-L7205
[2]. https://github.com/eclipse-openj9/openj9/blob/4b548aa58c694b4b103c7e8cefcc6782f4e34b7d/runtime/compiler/z/codegen/S390PrivateLinkage.cpp#L1832
[3]. https://github.com/eclipse-openj9/openj9/commit/6e05d8c18463f07d928883218b97103e272fcc44
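For orientation, the profile-directed path seen in the disassembly (a compatibility check on the profiled class, then a method lookup in that class's vtable) can be modelled at a very high level. Everything here is hypothetical: `ModelClass`, `isCompatible`, `chooseDispatch` and the subclass-based compatibility rule are simplifications standing in for isProfiledClassAndCallSiteCompatible and getResolvedVirtualMethod, and a class is reduced to a list of method names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a class is its name, its superclass and its vtable.
struct ModelClass {
   std::string name;
   const ModelClass *super;
   std::vector<std::string> vtable;
};

// Assumed compatibility rule: the profiled class must be the call-site class
// or one of its subclasses.
inline bool isCompatible(const ModelClass *profiled, const ModelClass *callSite)
{
   for (const ModelClass *c = profiled; c; c = c->super)
      if (c == callSite)
         return true;
   return false;
}

enum class Dispatch { GuardedDirect, Virtual };

struct Decision {
   Dispatch kind;
   std::string target;   // only meaningful for GuardedDirect
};

// If the profiled class is compatible and has an entry in the slot,
// devirtualize to that entry behind a guard; otherwise dispatch virtually.
inline Decision chooseDispatch(const ModelClass *profiled,
                               const ModelClass *callSite, std::size_t slot)
{
   if (profiled && isCompatible(profiled, callSite) && slot < profiled->vtable.size())
      return {Dispatch::GuardedDirect, profiled->vtable[slot]};
   return {Dispatch::Virtual, ""};
}
```

Unlike this model, which bounds-checks the slot and falls back to virtual dispatch, the disassembly of TR_J9VMBase::getResolvedVirtualMethod above shows no range check before the slot value is dereferenced.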
Comparing the profile-directed devirtualization in J9::Z::PrivateLinkage::buildVirtualDispatch [1] with the equivalent x86 codegen code [2], x86 has a check for `(callNode->getSymbolReference() != comp()->getSymRefTab()->findObjectNewInstanceImplSymbol())` that is missing on Z. I'm not exactly sure why that check is there on x86, but adding it to the Z code [3] fixes this failure.
[1]. https://github.com/eclipse-openj9/openj9/blob/d4df04f898bf16503d0e404e3b11acb9ff893dd8/runtime/compiler/z/codegen/S390PrivateLinkage.cpp#L1801-L1805
[2]. https://github.com/eclipse-openj9/openj9/blob/d4df04f898bf16503d0e404e3b11acb9ff893dd8/runtime/compiler/x/codegen/X86PrivateLinkage.cpp#L1320-L1323
[3]. https://github.com/Spencer-Comin/openj9/commit/2e6f21645f84a21b58b0272905e2dc6d59e32072#diff-ad78dee10efc18355c205b7949223951d627fe5d678979e0bd25b1eb32d3c2de
diff --git a/runtime/compiler/z/codegen/S390PrivateLinkage.cpp b/runtime/compiler/z/codegen/S390PrivateLinkage.cpp
index ff2af31bc29..9ed597e3e11 100644
--- a/runtime/compiler/z/codegen/S390PrivateLinkage.cpp
+++ b/runtime/compiler/z/codegen/S390PrivateLinkage.cpp
@@ -1800,6 +1800,7 @@ J9::Z::PrivateLinkage::buildVirtualDispatch(TR::Node * callNode, TR::RegisterDep
if (!performGuardedDevirtualization &&
!comp()->getOption(TR_DisableInterpreterProfiling) &&
+ (callNode->getSymbolReference() != comp()->getSymRefTab()->findObjectNewInstanceImplSymbol()) &&
TR_ValueProfileInfoManager::get(comp()) && resolvedMethod
)
{
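The condition after the fix can be read as a pure predicate. The sketch below flattens each term of the `if` in the diff into a boolean field; `DevirtInputs` and `tryProfileDirectedDevirtualization` are hypothetical, illustrative names, not OpenJ9 API, and the real code evaluates these terms through comp(), the symbol reference table and TR_ValueProfileInfoManager.

```cpp
#include <cassert>

// Hypothetical flattening of the guard in buildVirtualDispatch after the fix.
// Each field stands in for one term of the real condition.
struct DevirtInputs {
   bool performGuardedDevirtualization;  // existing performGuardedDevirtualization flag
   bool disableInterpreterProfiling;     // TR_DisableInterpreterProfiling option is set
   bool isObjectNewInstanceImpl;         // call symref == findObjectNewInstanceImplSymbol()
   bool haveValueProfileInfoManager;     // TR_ValueProfileInfoManager::get(comp()) is non-null
   bool haveResolvedMethod;              // resolvedMethod is non-null
};

inline bool tryProfileDirectedDevirtualization(const DevirtInputs &in)
{
   return !in.performGuardedDevirtualization &&
          !in.disableInterpreterProfiling &&
          !in.isObjectNewInstanceImpl &&     // the term added by the fix
          in.haveValueProfileInfoManager &&
          in.haveResolvedMethod;
}
```

With the new term, a call whose symbol reference is the one returned by findObjectNewInstanceImplSymbol() never reaches the vtable lookup, regardless of what the profile says.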
@Spencer-Comin Looking at the code that we generate for Class.newInstance [1] and at other places in codegen, I do see that the JIT treats this method specially.
Looking at the newInstanceImpl call in that method [2], it is actually a static call to a native method. We do have to treat this call specially; it would be incorrect to look this method up in the vtable of the class.
If the grinders and builds pass, I'd say let's go ahead with the fix, with the commit message documenting this special case.
[1]. https://github.com/eclipse-openj9/openj9/blob/d2e1f22d76429d35ef3420c200a36eb87a977b9a/runtime/compiler/ilgen/Walker.cpp#L4305-L4323
[2]. https://github.com/eclipse-openj9/openj9/blob/d2e1f22d76429d35ef3420c200a36eb87a977b9a/jcl/src/java.base/share/classes/java/lang/Class.java#L2672
Re-opening until the fix is added to the 0.33 release.
Failure link
From an internal build: job/Test_openjdk8_j9_extended.openjdk_s390x_linux/66/ (ub18s390xrt-1-6): Rerun in Grinder - Change TARGET to run only the failed test targets.
Optional info
Failure output (captured from console output)
50x grinder: job/Grinder/24535/