Open pshipton opened 4 years ago
@DanHeidinga fyi
@gacholio can you take a look at this?
Crash is here:
frame #15: 0x000000000b62fc68 libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(J9VMThread*) [inlined] VM_BytecodeInterpreterCompressed::invokevirtualLogic(this=0x00000000b007f908, fromBytecode=true) at BytecodeInterpreter.hpp:6795:18 [opt]
frame #16: 0x000000000b62fc0c libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(J9VMThread*) [inlined] VM_BytecodeInterpreterCompressed::invokevirtual(unsigned long*&, unsigned char*&) at BytecodeInterpreter.hpp:6807 [opt]
frame #17: 0x000000000b62fc0c libj9vm29.dylib`VM_BytecodeInterpreterCompressed::run(this=0x00000000b007f908, vmThread=<unavailable>) at BytecodeInterpreter.hpp:10263 [opt]
Assuming we just recently entered the interpreter, the reason is:
0x118: UDATA returnValue = 0x0000000000000011 (17)
#define J9_BCLOOP_RETURN_FROM_JIT 0x11
So it seems likely a bad object was returned from a JIT call.
The stack is not useful (vmThread roots have not been updated). Looking through the registers, R9 maps to a bytecode PC:
!findmethodfrompc 0x00000000C00DB2F2
Searching for PC=0x00000000C00DB2F2 in VM=0x00007F943C021620...
!j9method 0x000000000D534DC0 java/lang/Object.toString()Ljava/lang/String;
Bytecode PC offset = 0x000000000000000E
> !bytecodes 0x000000000D534DC0
Name: toString
Signature: ()Ljava/lang/String;
Access Flags (50001): public
Max Stack: 2
Argument Count: 1
Temp Count: 0
0 newdup 3 java/lang/StringBuilder
3 dup
4 invokespecial 4 java/lang/StringBuilder.<init>()V
7 aload0
8 invokespecial 5 java/lang/Object.getClass()Ljava/lang/Class;
11 invokevirtual 6 java/lang/Class.getName()Ljava/lang/String;
14 invokevirtual 7 java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
17 bipush 64
19 invokevirtual 8 java/lang/StringBuilder.append(C)Ljava/lang/StringBuilder;
22 aload0
23 invokevirtual 9 java/lang/Object.hashCode()I
26 invokestatic 10 java/lang/Integer.toHexString(I)Ljava/lang/String;
29 invokevirtual 7 java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
32 invokevirtual 11 java/lang/StringBuilder.toString()Ljava/lang/String;
35 return1
Debug Info:
Line Number Table (1):
Line: 164 PC: 0
Variables (1):
Slot: 0
Visibility Start: 0
Visibility End: 36
Visibility Length: 36
Name: this
Signature: Ljava/lang/Object;
Generic Signature: None
My best guess at the stack (from the registers and hand-examining the memory) is:
> !stackslots 0xd51d700,0x0D5FC218,0x0D5FC240,0x00000000C00DB2F2,0x000000000D534DC0
<d51d700> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x000000000D51D700 ***
<d51d700> ITERATE_O_SLOTS
<d51d700> RECORD_BYTECODE_PC_OFFSET
<d51d700> Initial values: walkSP = 0x000000000D5FC218, PC = 0x00000000C00DB2F2, literals = 0x000000000D534DC0, A0 = 0x000000000D5FC240, j2iFrame = 0x0000000000000000, ELS = 0x00000000B007FAD8, decomp = 0x0000000000000000
<d51d700> Bytecode frame: bp = 0x000000000D5FC238, sp = 0x000000000D5FC218, pc = 0x00000000C00DB2F2, cp = 0x000000000D534E60, arg0EA = 0x000000000D5FC240, flags = 0x0000000000000000
<d51d700> Method: java/lang/Object.toString()Ljava/lang/String; !j9method 0x000000000D534DC0
<d51d700> Bytecode index = 14
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC240 for 0x0000000000000001 slots
<d51d700> O-Slot: a0[0x000000000D5FC240] = 0x0000000030685868
<d51d700> Pending stack starting at 0x000000000D5FC220 for UDATA(0x0000000000000002) slots
<d51d700> O-Slot: p0[0x000000000D5FC220] = 0x0000000030787720
<d51d700> O-Slot: p1[0x000000000D5FC218] = 0x00000000300B9050
<d51d700> Bytecode frame: bp = 0x000000000D5FC258, sp = 0x000000000D5FC248, pc = 0x000000002624FDD3, cp = 0x000000000D87DB90, arg0EA = 0x000000000D5FC2B8, flags = 0x0000000000000000
<d51d700> Method: org/testng/internal/TestResult.init(Lorg/testng/IClass;Ljava/lang/Object;Lorg/testng/ITestNGMethod;Ljava/lang/Throwable;JJLorg/testng/ITestContext;)V !j9method 0x000000000D87E2E8
<d51d700> Bytecode index = 143
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC2B8 for 0x000000000000000C slots
<d51d700> O-Slot: a0[0x000000000D5FC2B8] = 0x0000000030787690
<d51d700> I-Slot: a1[0x000000000D5FC2B0] = 0x00000000301833D8
<d51d700> I-Slot: a2[0x000000000D5FC2A8] = 0x0000000030685868
<d51d700> I-Slot: a3[0x000000000D5FC2A0] = 0x00000000306CD228
<d51d700> I-Slot: a4[0x000000000D5FC298] = 0x0000000030787580
<d51d700> I-Slot: a5[0x000000000D5FC290] = 0x000000000D87E2C8
<d51d700> I-Slot: a6[0x000000000D5FC288] = 0x0000017278D7A005
<d51d700> I-Slot: a7[0x000000000D5FC280] = 0x00000000307876F8
<d51d700> I-Slot: a8[0x000000000D5FC278] = 0x0000017278D7F65D
<d51d700> I-Slot: a9[0x000000000D5FC270] = 0x000000003016EED0
<d51d700> I-Slot: t10[0x000000000D5FC268] = 0x00000000307876F8
<d51d700> I-Slot: t11[0x000000000D5FC260] = 0x00000000307876F8
<d51d700> Bytecode frame: bp = 0x000000000D5FC2D0, sp = 0x000000000D5FC2C0, pc = 0x000000002624FD0E, cp = 0x000000000D87DB90, arg0EA = 0x000000000D5FC320, flags = 0x0000000000000000
<d51d700> Method: org/testng/internal/TestResult.<init>(Lorg/testng/IClass;Ljava/lang/Object;Lorg/testng/ITestNGMethod;Ljava/lang/Throwable;JJLorg/testng/ITestContext;)V !j9method 0x000000000D87E2C8
<d51d700> Bytecode index = 70
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC320 for 0x000000000000000A slots
<d51d700> I-Slot: a0[0x000000000D5FC320] = 0x0000000030787690
<d51d700> I-Slot: a1[0x000000000D5FC318] = 0x00000000301833D8
<d51d700> I-Slot: a2[0x000000000D5FC310] = 0x0000000030685868
<d51d700> I-Slot: a3[0x000000000D5FC308] = 0x00000000306CD228
<d51d700> I-Slot: a4[0x000000000D5FC300] = 0x0000000030787580
<d51d700> I-Slot: a5[0x000000000D5FC2F8] = 0x00000000301833D8
<d51d700> I-Slot: a6[0x000000000D5FC2F0] = 0x0000017278D7A005
<d51d700> I-Slot: a7[0x000000000D5FC2E8] = 0x000000000D84E010
<d51d700> I-Slot: a8[0x000000000D5FC2E0] = 0x0000017278D7F65D
<d51d700> I-Slot: a9[0x000000000D5FC2D8] = 0x000000003016EED0
<d51d700> Bytecode frame: bp = 0x000000000D5FC340, sp = 0x000000000D5FC328, pc = 0x000000002621648C, cp = 0x000000000D84C370, arg0EA = 0x000000000D5FC458, flags = 0x0000000000000000
<d51d700> Method: org/testng/internal/Invoker.invokeTestMethods(Lorg/testng/ITestNGMethod;Lorg/testng/xml/XmlSuite;Ljava/util/Map;Lorg/testng/internal/ConfigurationGroupMethods;Ljava/lang/Object;Lorg/testng/ITestContext;)Ljava/util/List; !j9method 0x000000000D84E010
<d51d700> Bytecode index = 1,124
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC458 for 0x0000000000000023 slots
<d51d700> O-Slot: a0[0x000000000D5FC458] = 0x0000000030173178
<d51d700> O-Slot: a1[0x000000000D5FC450] = 0x00000000306CD228
<d51d700> O-Slot: a2[0x000000000D5FC448] = 0x0000000030135AA0
<d51d700> I-Slot: a3[0x000000000D5FC440] = 0x0000000030800CE0
<d51d700> O-Slot: a4[0x000000000D5FC438] = 0x000000003076B280
<d51d700> O-Slot: a5[0x000000000D5FC430] = 0x0000000030685868
<d51d700> O-Slot: a6[0x000000000D5FC428] = 0x000000003016EED0
<d51d700> I-Slot: t7[0x000000000D5FC420] = 0x0000000000000000
<d51d700> O-Slot: t8[0x000000000D5FC418] = 0x00000000306D2648
<d51d700> I-Slot: t9[0x000000000D5FC410] = 0x000000000D87CFC8
<d51d700> I-Slot: t10[0x000000000D5FC408] = 0x0000000000000000
<d51d700> I-Slot: t11[0x000000000D5FC400] = 0x0000000000000000
<d51d700> I-Slot: t12[0x000000000D5FC3F8] = 0x0000000000000000
<d51d700> O-Slot: t13[0x000000000D5FC3F0] = 0x00000000306D26A0
<d51d700> O-Slot: t14[0x000000000D5FC3E8] = 0x00000000301833D8
<d51d700> O-Slot: t15[0x000000000D5FC3E0] = 0x00000000306D2728
<d51d700> O-Slot: t16[0x000000000D5FC3D8] = 0x00000000306D2740
<d51d700> O-Slot: t17[0x000000000D5FC3D0] = 0x00000000306D2780
<d51d700> O-Slot: t18[0x000000000D5FC3C8] = 0x00000000306D27A8
<d51d700> I-Slot: t19[0x000000000D5FC3C0] = 0x000000000D8715F8
<d51d700> I-Slot: t20[0x000000000D5FC3B8] = 0x0000017278D7A005
<d51d700> I-Slot: t21[0x000000000D5FC3B0] = 0x00000000306D27B8
<d51d700> I-Slot: t22[0x000000000D5FC3A8] = 0x00000000306D27E0
<d51d700> I-Slot: t23[0x000000000D5FC3A0] = 0x00000000306D2A20
<d51d700> I-Slot: t24[0x000000000D5FC398] = 0x00000000306D29F8
<d51d700> I-Slot: t25[0x000000000D5FC390] = 0x0000000000000000
<d51d700> I-Slot: t26[0x000000000D5FC388] = 0x0000000030787580
<d51d700> I-Slot: t27[0x000000000D5FC380] = 0x00000000306D2EF0
<d51d700> I-Slot: t28[0x000000000D5FC378] = 0x00000000306D2F00
<d51d700> I-Slot: t29[0x000000000D5FC370] = 0x00000000FFFFFFFF
<d51d700> I-Slot: t30[0x000000000D5FC368] = 0x00000000307CCC08
<d51d700> I-Slot: t31[0x000000000D5FC360] = 0x00000000307CCC08
<d51d700> I-Slot: t32[0x000000000D5FC358] = 0x0000000030787580
<d51d700> I-Slot: t33[0x000000000D5FC350] = 0x0000000000000000
<d51d700> I-Slot: t34[0x000000000D5FC348] = 0x000000000D8864B8
<d51d700> Pending stack starting at 0x000000000D5FC328 for UDATA(0x0000000000000001) slots
<d51d700> O-Slot: p0[0x000000000D5FC328] = 0x0000000030787690
<d51d700> Bytecode frame: bp = 0x000000000D5FC470, sp = 0x000000000D5FC460, pc = 0x000000002624DB71, cp = 0x000000000D87C5A0, arg0EA = 0x000000000D5FC498, flags = 0x0000000000000000
<d51d700> Method: org/testng/internal/TestMethodWorker.invokeTestMethods(Lorg/testng/ITestNGMethod;Ljava/lang/Object;Lorg/testng/ITestContext;)V !j9method 0x000000000D87CFA8
<d51d700> Bytecode index = 21
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC498 for 0x0000000000000005 slots
<d51d700> O-Slot: a0[0x000000000D5FC498] = 0x00000000308083F8
<d51d700> I-Slot: a1[0x000000000D5FC490] = 0x00000000306CD228
<d51d700> I-Slot: a2[0x000000000D5FC488] = 0x0000000030685868
<d51d700> I-Slot: a3[0x000000000D5FC480] = 0x000000003016EED0
<d51d700> I-Slot: t4[0x000000000D5FC478] = 0x000000002624DAB8
<d51d700> Bytecode frame: bp = 0x000000000D5FC4B0, sp = 0x000000000D5FC4A0, pc = 0x000000002624DABF, cp = 0x000000000D87C5A0, arg0EA = 0x000000000D5FC4E0, flags = 0x0000000000000000
<d51d700> Method: org/testng/internal/TestMethodWorker.run()V !j9method 0x000000000D87CF88
<d51d700> Bytecode index = 63
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC4E0 for 0x0000000000000006 slots
<d51d700> O-Slot: a0[0x000000000D5FC4E0] = 0x00000000308083F8
<d51d700> O-Slot: t1[0x000000000D5FC4D8] = 0x00000000306D25D0
<d51d700> O-Slot: t2[0x000000000D5FC4D0] = 0x0000000030800BC8
<d51d700> I-Slot: t3[0x000000000D5FC4C8] = 0x00000000306CD228
<d51d700> O-Slot: t4[0x000000000D5FC4C0] = 0x00000000301833D8
<d51d700> I-Slot: t5[0x000000000D5FC4B8] = 0x000000000D845DD8
<d51d700> Bytecode frame: bp = 0x000000000D5FC4F8, sp = 0x000000000D5FC4E8, pc = 0x00000000262087F2, cp = 0x000000000D844320, arg0EA = 0x000000000D5FC540, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestRunner.privateRun(Lorg/testng/xml/XmlTest;)V !j9method 0x000000000D845DD8
<d51d700> Bytecode index = 302
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC540 for 0x0000000000000009 slots
<d51d700> O-Slot: a0[0x000000000D5FC540] = 0x000000003016EED0
<d51d700> I-Slot: a1[0x000000000D5FC538] = 0x00000000300B2098
<d51d700> I-Slot: t2[0x000000000D5FC530] = 0x0000000000000000
<d51d700> I-Slot: t3[0x000000000D5FC528] = 0x0000000000000001
<d51d700> O-Slot: t4[0x000000000D5FC520] = 0x00000000307DDD08
<d51d700> O-Slot: t5[0x000000000D5FC518] = 0x00000000307E7D60
<d51d700> I-Slot: t6[0x000000000D5FC510] = 0x00000000308008E0
<d51d700> O-Slot: t7[0x000000000D5FC508] = 0x0000000030808568
<d51d700> I-Slot: t8[0x000000000D5FC500] = 0x00000000308083F8
<d51d700> Bytecode frame: bp = 0x000000000D5FC558, sp = 0x000000000D5FC548, pc = 0x0000000026208561, cp = 0x000000000D844320, arg0EA = 0x000000000D5FC570, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestRunner.run()V !j9method 0x000000000D845D78
<d51d700> Bytecode index = 25
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC570 for 0x0000000000000003 slots
<d51d700> O-Slot: a0[0x000000000D5FC570] = 0x000000003016EED0
<d51d700> I-Slot: t1[0x000000000D5FC568] = 0x00000000300B2098
<d51d700> I-Slot: t2[0x000000000D5FC560] = 0x00000000261FBF81
<d51d700> Bytecode frame: bp = 0x000000000D5FC588, sp = 0x000000000D5FC578, pc = 0x00000000261FBF81, cp = 0x000000000D83F2B0, arg0EA = 0x000000000D5FC5A0, flags = 0x0000000000000000
<d51d700> Method: org/testng/SuiteRunner.runTest(Lorg/testng/TestRunner;)V !j9method 0x000000000D840410
<d51d700> Bytecode index = 1
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC5A0 for 0x0000000000000003 slots
<d51d700> O-Slot: a0[0x000000000D5FC5A0] = 0x000000003016C4E0
<d51d700> O-Slot: a1[0x000000000D5FC598] = 0x000000003016EED0
<d51d700> I-Slot: t2[0x000000000D5FC590] = 0x000000000D5FC5D2
<d51d700> Bytecode frame: bp = 0x000000000D5FC5B8, sp = 0x000000000D5FC5A8, pc = 0x00000000261FBF3F, cp = 0x000000000D83F2B0, arg0EA = 0x000000000D5FC5D0, flags = 0x0000000000000000
<d51d700> Method: org/testng/SuiteRunner.runSequentially()V !j9method 0x000000000D8403F0
<d51d700> Bytecode index = 31
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC5D0 for 0x0000000000000003 slots
<d51d700> O-Slot: a0[0x000000000D5FC5D0] = 0x000000003016C4E0
<d51d700> O-Slot: t1[0x000000000D5FC5C8] = 0x00000000307BF120
<d51d700> I-Slot: t2[0x000000000D5FC5C0] = 0x000000003016EED0
<d51d700> Bytecode frame: bp = 0x000000000D5FC5E8, sp = 0x000000000D5FC5D8, pc = 0x00000000261FBD98, cp = 0x000000000D83F2B0, arg0EA = 0x000000000D5FC638, flags = 0x0000000000000000
<d51d700> Method: org/testng/SuiteRunner.privateRun()V !j9method 0x000000000D840370
<d51d700> Bytecode index = 288
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC638 for 0x000000000000000A slots
<d51d700> O-Slot: a0[0x000000000D5FC638] = 0x000000003016C4E0
<d51d700> I-Slot: t1[0x000000000D5FC630] = 0x00000000307BEFB0
<d51d700> O-Slot: t2[0x000000000D5FC628] = 0x00000000307BEFE8
<d51d700> O-Slot: t3[0x000000000D5FC620] = 0x0000000030173178
<d51d700> I-Slot: t4[0x000000000D5FC618] = 0x0000000000000000
<d51d700> I-Slot: t5[0x000000000D5FC610] = 0x000000003016EED0
<d51d700> I-Slot: t6[0x000000000D5FC608] = 0x0000000030764698
<d51d700> I-Slot: t7[0x000000000D5FC600] = 0x0000000000000000
<d51d700> I-Slot: t8[0x000000000D5FC5F8] = 0x0000000000000000
<d51d700> I-Slot: t9[0x000000000D5FC5F0] = 0x00000000261FB9AA
<d51d700> Bytecode frame: bp = 0x000000000D5FC650, sp = 0x000000000D5FC640, pc = 0x00000000261FBC16, cp = 0x000000000D83F2B0, arg0EA = 0x000000000D5FC660, flags = 0x0000000000000000
<d51d700> Method: org/testng/SuiteRunner.run()V !j9method 0x000000000D840350
<d51d700> Bytecode index = 6
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC660 for 0x0000000000000002 slots
<d51d700> O-Slot: a0[0x000000000D5FC660] = 0x000000003016C4E0
<d51d700> I-Slot: t1[0x000000000D5FC658] = 0x000000000D5FC6AA
<d51d700> Bytecode frame: bp = 0x000000000D5FC678, sp = 0x000000000D5FC668, pc = 0x00000000262426EF, cp = 0x000000000D876E90, arg0EA = 0x000000000D5FC6A8, flags = 0x0000000000000000
<d51d700> Method: org/testng/SuiteRunnerWorker.runSuite(Lorg/testng/internal/SuiteRunnerMap;Lorg/testng/xml/XmlSuite;)V !j9method 0x000000000D877610
<d51d700> Bytecode index = 87
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC6A8 for 0x0000000000000006 slots
<d51d700> I-Slot: a0[0x000000000D5FC6A8] = 0x00000000307BE528
<d51d700> O-Slot: a1[0x000000000D5FC6A0] = 0x000000003016B708
<d51d700> O-Slot: a2[0x000000000D5FC698] = 0x000000003013E098
<d51d700> I-Slot: t3[0x000000000D5FC690] = 0x000000003016C4E0
<d51d700> I-Slot: t4[0x000000000D5FC688] = 0x0000000026242804
<d51d700> I-Slot: t5[0x000000000D5FC680] = 0x000000000D877630
<d51d700> Bytecode frame: bp = 0x000000000D5FC6C0, sp = 0x000000000D5FC6B0, pc = 0x0000000026242804, cp = 0x000000000D876E90, arg0EA = 0x000000000D5FC6C8, flags = 0x0000000000000000
<d51d700> Method: org/testng/SuiteRunnerWorker.run()V !j9method 0x000000000D877630
<d51d700> Bytecode index = 12
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC6C8 for 0x0000000000000001 slots
<d51d700> I-Slot: a0[0x000000000D5FC6C8] = 0x00000000307BE528
<d51d700> Bytecode frame: bp = 0x000000000D5FC6E0, sp = 0x000000000D5FC6D0, pc = 0x0000000025248316, cp = 0x000000000D6A4220, arg0EA = 0x000000000D5FC718, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestNG.runSuitesSequentially(Lorg/testng/xml/XmlSuite;Lorg/testng/internal/SuiteRunnerMap;ILjava/lang/String;)V !j9method 0x000000000D6A6FD8
<d51d700> Bytecode index = 66
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC718 for 0x0000000000000007 slots
<d51d700> I-Slot: a0[0x000000000D5FC718] = 0x00000000300A7160
<d51d700> I-Slot: a1[0x000000000D5FC710] = 0x000000003013E098
<d51d700> I-Slot: a2[0x000000000D5FC708] = 0x000000003016B708
<d51d700> I-Slot: a3[0x000000000D5FC700] = 0x0000000000000002
<d51d700> I-Slot: a4[0x000000000D5FC6F8] = 0x00000000300A5738
<d51d700> I-Slot: t5[0x000000000D5FC6F0] = 0x00000000307BE528
<d51d700> I-Slot: t6[0x000000000D5FC6E8] = 0x00000000252480D4
<d51d700> Bytecode frame: bp = 0x000000000D5FC730, sp = 0x000000000D5FC720, pc = 0x00000000252480D4, cp = 0x000000000D6A4220, arg0EA = 0x000000000D5FC760, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestNG.runSuitesLocally()Ljava/util/List; !j9method 0x000000000D6A6F78
<d51d700> Bytecode index = 156
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC760 for 0x0000000000000006 slots
<d51d700> O-Slot: a0[0x000000000D5FC760] = 0x00000000300A7160
<d51d700> O-Slot: t1[0x000000000D5FC758] = 0x000000003016B708
<d51d700> O-Slot: t2[0x000000000D5FC750] = 0x00000000307B5EF0
<d51d700> I-Slot: t3[0x000000000D5FC748] = 0x000000003013E098
<d51d700> I-Slot: t4[0x000000000D5FC740] = 0x0000000000000000
<d51d700> I-Slot: t5[0x000000000D5FC738] = 0x000000000D6A6EB8
<d51d700> Bytecode frame: bp = 0x000000000D5FC778, sp = 0x000000000D5FC768, pc = 0x0000000025247D51, cp = 0x000000000D6A4220, arg0EA = 0x000000000D5FC780, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestNG.runSuites()Ljava/util/List; !j9method 0x000000000D6A6E98
<d51d700> Bytecode index = 1
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC780 for 0x0000000000000001 slots
<d51d700> I-Slot: a0[0x000000000D5FC780] = 0x000000003014A918
<d51d700> Bytecode frame: bp = 0x000000000D5FC798, sp = 0x000000000D5FC788, pc = 0x0000000025247CBD, cp = 0x000000000D6A4220, arg0EA = 0x000000000D5FC7A8, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestNG.run()V !j9method 0x000000000D6A6E78
<d51d700> Bytecode index = 25
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC7A8 for 0x0000000000000002 slots
<d51d700> O-Slot: a0[0x000000000D5FC7A8] = 0x00000000300A7160
<d51d700> I-Slot: t1[0x000000000D5FC7A0] = 0x000000000D5FC7E2
<d51d700> Bytecode frame: bp = 0x000000000D5FC7C0, sp = 0x000000000D5FC7B0, pc = 0x0000000025248767, cp = 0x000000000D6A4220, arg0EA = 0x000000000D5FC7E0, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestNG.privateMain([Ljava/lang/String;Lorg/testng/ITestListener;)Lorg/testng/TestNG; !j9method 0x000000000D6A7098
<d51d700> Bytecode index = 59
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC7E0 for 0x0000000000000004 slots
<d51d700> I-Slot: a0[0x000000000D5FC7E0] = 0x000000003012AB80
<d51d700> I-Slot: a1[0x000000000D5FC7D8] = 0x0000000000000000
<d51d700> O-Slot: t2[0x000000000D5FC7D0] = 0x00000000300A7160
<d51d700> I-Slot: t3[0x000000000D5FC7C8] = 0x000000003020F7E8
<d51d700> Bytecode frame: bp = 0x000000000D5FC7F8, sp = 0x000000000D5FC7E8, pc = 0x00000000252486F6, cp = 0x000000000D6A4220, arg0EA = 0x000000000D5FC808, flags = 0x0000000000000000
<d51d700> Method: org/testng/TestNG.main([Ljava/lang/String;)V !j9method 0x000000000D6A7078
<d51d700> Bytecode index = 2
<d51d700> Using local mapper
<d51d700> Locals starting at 0x000000000D5FC808 for 0x0000000000000002 slots
<d51d700> I-Slot: a0[0x000000000D5FC808] = 0x000000003012AB80
<d51d700> I-Slot: t1[0x000000000D5FC800] = 0x000000000D5FC828
<d51d700> JNI call-in frame: bp = 0x000000000D5FC830, sp = 0x000000000D5FC810, pc = 0x000000000B7E0480, cp = 0x0000000000000000, arg0EA = 0x000000000D5FC830, flags = 0x0000000000020000
<d51d700> New ELS = 0x0000000000000000
<d51d700> JNI native method frame: bp = 0x000000000D5FC8D8, sp = 0x000000000D5FC838, pc = 0x0000000000000007, cp = 0x0000000000000000, arg0EA = 0x000000000D5FC8D8, flags = 0x0000000000000010
<d51d700> Object pushes starting at 0x000000000D5FC838 for 16 slots
<d51d700> Push[0x000000000D5FC838] = 0x0000000030000000
<d51d700> Push[0x000000000D5FC840] = 0x000000003009E9C0
<d51d700> Push[0x000000000D5FC848] = 0x0000000000000000
<d51d700> Push[0x000000000D5FC850] = 0x0000000000000000
<d51d700> Push[0x000000000D5FC858] = 0x00000000300A3368
<d51d700> Push[0x000000000D5FC860] = 0x0000000030001818
<d51d700> Push[0x000000000D5FC868] = 0x000000003009E968
<d51d700> Push[0x000000000D5FC870] = 0x000000003009E968
<d51d700> Push[0x000000000D5FC878] = 0x00000000300771E0
<d51d700> Push[0x000000000D5FC880] = 0x0000000000000000
<d51d700> Push[0x000000000D5FC888] = 0x000000003007BB00
<d51d700> Push[0x000000000D5FC890] = 0x0000000030008DE8
<d51d700> Push[0x000000000D5FC898] = 0x0000000030008D38
<d51d700> Push[0x000000000D5FC8A0] = 0x00000000300010E0
<d51d700> Push[0x000000000D5FC8A8] = 0x0000000030003F20
<d51d700> Push[0x000000000D5FC8B0] = 0x0000000030000000
<d51d700> <end of stack>
<d51d700> *** END STACK WALK (rc = NONE) ***
> !j9object 0x0000000030685868 <--- this
!J9Object 0x0000000030685868 {
struct J9Class* clazz = !j9class 0xD818100 // j9vm/test/arrayCopy/ArrayCopyTest
Object flags = 0x00000008;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
}
> !j9object 0x00000000300B9050 <--- this.getName()
J9VMJavaLangString at 0x00000000300B9050 {
struct J9Class* clazz = !j9class 0xD541400 // java/lang/String
Object flags = 0x00000000;
[B value = !fj9object 0x300b2198 (offset = 0) (java/lang/String)
B coder = 0x00000001 (offset = 4) (java/lang/String)
I hashCode = 0x1E58820C (offset = 8) (java/lang/String)
"j9vm.test.arrayCopy.ArrayCopyTest"
}
> !j9object 0x0000000030787720 <--- StringBuilder
Unable to read object clazz at 0x0000000030787720 (clazz = (null))
So the String argument is fine, but the StringBuilder appears to be trash.
I suspect jitted AbstractStringBuilder.init(I)V
returned the bad object:
> !j9method 0x000000000D5AC0F8
J9Method at 0xd5ac0f8 {
Fields for J9Method:
0x0: U8* bytecodes = !j9x 0x00000000C0198AF4
0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x000000000D5AD7A0 (flags = 0x0)
0x10: void* methodRunAddress = !j9x 0x0000000000000006
0x18: void* extra = !j9x 0x0000000000000783
}
Signature: java/lang/StringBuilder.<init>()V !bytecodes 0x000000000D5AC0F8
ROM Method: !j9rommethod 0x00000000C0198AE0
Next Method: !j9method 0x000000000D5AC118
> !bytecodes 0x000000000D5AC0F8
Name: <init>
Signature: ()V
Access Flags (21040001): public
Max Stack: 2
Argument Count: 1
Temp Count: 0
0 aload0
1 bipush 16
3 invokespecial 1 java/lang/AbstractStringBuilder.<init>(I)V
6 returnFromConstructor
> !j9method 0x000000000D5AA760
J9Method at 0xd5aa760 {
Fields for J9Method:
0x0: U8* bytecodes = !j9x 0x00000000C019AA88
0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x000000000D5AB140 (flags = 0x0)
0x10: void* methodRunAddress = !j9x 0x0000000000000018
0x18: void* extra = !j9x 0x00000000B00ADC80
}
Signature: java/lang/AbstractStringBuilder.<init>(I)V !bytecodes 0x000000000D5AA760
ROM Method: !j9rommethod 0x00000000C019AA74
Next Method: !j9method 0x000000000D5AA780
@andrewcraik
@gacholio / @DanHeidinga do you have any failure rate information for this or any history of when it might have started. I agree we should look at it, but I'm not sure if 0.21 is an appropriate target.
@cathyzhyi could you help find someone to look at this? Note the 0.21 tag.
any history of when it might have started.
It appears to be a new failure we haven't seen before. The first instance I can find is the one in the opening description.
but I'm not sure if 0.21 is an appropriate target.
I added the 0.21 tag to ensure there would be a conversation about whether this is stop-ship or not given it appears to be a newly introduced (exposed?) issue
@liqunl could you have a sniff?
I looked at the disassembly of java/lang/AbstractStringBuilder.<init>(I)V
, haven't found anything suspicious.
I have also tried to reproduce the failure on my laptop, but haven't seen one in 50 runs. I'll try if I can reproduce it with grinders.
I've tried another 50 runs on my laptop, no reproduce. I'm trying to run grinders on internal jenkins, but I can't find a Mac uma build. @pshipton What is a Mac uma build? How is it different from Mac build?
A Mac uma build is using the old (uma) makefile system, any build before May 28th used this. Since https://github.com/ibmruntimes/openj9-openjdk-jdk11/pull/308 was merged, the jdk11 Mac builds use the cmake build system, which is supposed to generate the same binaries. We continue to run a nightly build using the old uma makefile system in case any problems occur, we can compare between the two builds. Likely the problem isn't particular to the uma build, it's just intermittent and happened in that build. We'll be switching the xlinux cmake jdk11 build to cmake next, ideally before the end of next week.
I tried 100 jobs on internal grinder, still can't reproduce the issue. So far the crash is not reproducible in 200 jobs.
The StringBuilder object seems to be corrupted by array zero initialization instruction in the jitted code. The zero initialization is accomplished by rep stosb
and it seems to be doing backward store in this case.
The bad StringBuilder object has everything right except the header. The header is zero. However, the header can’t be zero when passed to the jitted code, as the jitted code read a flag on the object's j9class to check if the object is finalizable.
I double checked the jitted code to see if it’s possible that we can write to the StringBuilder object. Notice that the jitted code will create a byte array and assign it to StringBuilder.value. Both the StringBuilder and the byte array are in TLH and they’re next to each other with a few padding bytes in between. The byte array is supposed to be zero initialized and remain so before we crash. But not all elements are zero.
> !j9x 0x0000000030787720,32,4,1
0x30787720 : 00000000 [ .... ]. <— StringBuilder object
0x30787724 : 00000000 [ .... ]
0x30787728 : 30787738 [ 8wx0 ] <— value field, byte array of 32 elements
0x3078772C : 00000001 [ .... ] <— coder field
0x30787730 : 00000000 [ .... ] <- count field
0x30787734 : 00000000 [ .... ] <— padding bytes
0x30787738 : 0d53d900 [ ..S. ] <-- byte array header
0x3078773C : 00000020 [ ... ] <-- byte array size
0x30787740 : 00000000 [ .... ]
0x30787744 : 30184100 [ .A.0 ]. <—- non zero elements
0x30787748 : 00000000 [ .... ]
0x3078774C : 30184128 [ (A.0 ] <—- non zero elements
> !j9object 0x30787738
!J9IndexableObject 0x0000000030787738 {
struct J9Class* clazz = !j9arrayclass 0xD53D900 // [B
Object flags = 0x00000000;
U_32 size = 0x00000020;
So it seems to me that we didn’t initialize the byte array. We didn’t skip zero init the byte array either. The following instruction is used to zero the array. rdi initially points to the first element of the array, the counter rcx is 0x20.
rep stosb byte ptr [rdi], al
I suspect that we did a backward memset. We didn’t clear the direction flag DF before rep stosb
, so if the flag is set by previously executed instructions, we may end up doing a backward memset.
The memory content seems to be in line with this theory. rdi is the address of the first element, which is 0x30787740, if we do backward memset for 0x20 bytes, we’ll zero the memory all the way up to 0x30787720, which is the header of StringBuilder object. The byte array header and fields of StringBuilder object are good because that is set after the zeroing.
@andrewcraik @JamesKingdon Do you think this is the right cause of the crash. If so, I think the solution is to clear the DF flag before zeroing.
I was under the impression that DF is supposed to remain clear at all times (in C at least). It would be a simple matter to clear it on transition to JIT code (and maybe you would need to do it on return from JNI calls).
@liqunl Great find! It certainly sounds possible. Given GAC's comments I need to check how DF can be set, but I'm leaning towards a conservative fix of making sure it has the correct value before executing the rep sto. @andrewcraik ?
Looks like it can only be changed directly using CLD/STD. I feel a little uneasy about relying on it being in the correct state before the rep sto though.
Looking at the Agner Fog instruction tables https://www.agner.org/optimize/instruction_tables.pdf it seems that CLD/STD are not cheap - latency 4 on skylake/skylakeX and latency 3 on ryzen. I think we need to be a bit careful about spraying CLD etc everywhere.
Now there is some curious looking code in https://github.com/eclipse/openj9/blob/71bf16037a88e1116afdcc7516db040aa5365ab1/runtime/compiler/x/codegen/J9TreeEvaluator.cpp#L1622
Note if it is !node->isForwardArrayCopy()
we handle the direction flag and yet seemingly underneath we don't?
We can consider the CLD/STD always solution as a temporary fix but we need to perf evaluate that and also try to figure out how the flag got messed up since protecting around natives and stuff as suggested by @gacholio is probably the better long term solution.
oh, you mean potentially we set STD for a backward array copy and then never clear it? I'm still very nervous about the idea of relying on it having the right value over long periods of time.
The rationale is that backwards copy is vastly less likely than forwards, so keeping the DF clear when not doing backwards copy will be a noticeable performance win.
From what I can see, backwards copy always emits STD/REP/CLD, so it's still a mystery why it would ever be set (assuming I'm right about the C linkage keeping it clear).
DF can also be set via popf which makes it a little harder to narrow down possible changes. Potentially DF might have been set as an accidental side effect of some other flag manipulation. A very cursory grep of the codebase shows some hits for popf* but I haven't explored further to see what's being done.
Assuming pushf/pop are matched, that shouldn't be an issue. If there are places using popf on other stacked values, that could certainly cause this (this seems exceedingly unlikely).
util/sse2enabled.s is doing a flag manipulation, but it looks ok. Other popf instances seem to be simple restore of saved flags, we would have to have mismatched the stacks to get those wrong.
@JamesKingdon suggested offline, that this problem should be detectable if we check DF flag frequently enough. I'll try adding the check to the jit method entry to see if we can detect it.
I did personal builds on x86 linux, windows and Mac, with a check on the DF flag in method entry. If the flag is set, the jitted code will crash. All the jobs passed. Also did a 40x grinder on the original failed job, no crash either. The issue seems to be extremely intermittent.
As for the temporary fix, clearing the DF flag before every rep instructions has 2% overhead on daytrader3. If we only clear DF flag after the return of helper calls and JNI calls (assuming VM will clear the flag before calling to jitted code), the overhead is around 0.5% (with variation of 0.6%).
@andrewcraik @JamesKingdon Could you advice on how we should proceed with this issue?
On the remote possibility that the DF is being set by an interrupt handler, clearing it before the rep tightens the window but doesn't close it completely. If this were the case, I'd expect every app in the world to be crashing randomly.
I don't think we can take the 2% hit on daytrader3 - that is a hit that will show up a lot more for some people and it is an indication that it could be a real performance issue. If we can get away with handling after the return of helper calls and JNI calls with the VM clearing it before calls to JITCODE then I think we should be ok and can get away without the major perf hit.
That seems like a reasonable compromise given the cost and the difficulty of reproducing the problem. Does the VM already clear DF before calling the JIT or is that a new requirement?
On the remote possibility that the DF is being set by an interrupt handler, clearing it before the rep tightens the window but doesn't close it completely. If this were the case, I'd expect every app in the world to be crashing randomly.
You sure know how to terrify the service guy :)
Given that this has never been reproduced, I'd be tempted to ignore it completely for now.
The symptoms sound like a fit for several cases that went unsolved over the last year or two. I've been looking for over-run from objects before the corrupted one and not finding anything - I hadn't considered the possibility of a backwards rep sto, not least because the fields immediately after the corruption are valid. If this is the problem I'm very keen to get it fixed.
It's a simple enough matter for the VM to clear the DF on return from any standard JIT helper and on transition into compiled code. The JIT will need to be responsible for clearing it after any C call from the compiled code, including the direct-call JIT helpers and direct JNI (and whatever else you may call).
Extremely complicated VM fix here:
I find the fix is more complicated than I thought. Regarding clearing the flag after the helper calls, my initial fix only cleared the flag for helper calls that associated with a call node, as there is a designated place to evaluating such node. But we generate a lot of helper calls in other evaluators for nodes that are not call to a helper, such as resolve check, interface calls, field access (helper calls can be generated for field watch). Some helpers are written in assembly and may call to the VM.
I wonder if we should do the flag clearing for all of the helpers, or if there are some of them we can trust. If we need to do the clearing, where is the best place to do so.
For minimal impact, I would clear it only on transition to compiled code and on return from direct JNI - if we can't trust the C compiler used by the VM to follow the ABI, then we're never going to get this fixed. I suppose we would also need to clear it on return from any helper that might call a JVMTI event.
Having said that, the test that's failing here isn't using any foreign native libraries, so I can't explain how this could occur.
@andrewcraik @JamesKingdon What are your thoughts? Should we only clear the flag on return from foreign native code and trust VM code? It sounds reasonable to me and we have to trust the C compiler. Although the fix may not fix this issue as the test doesn't use any foreign native libraries as GAC said.
I'll take whatever we can get for a reasonable cost. I'll keep a close eye on future cases and over time we can see if the problem goes away. It's frustrating that we can't reproduce the problem and confirm the cause, but thank you for having tried.
So I'd say let's not trust the foreign code and let's trust the VM for now. It is possible the bit just flipped because that kind of thing can happen once in a blue moon, but let's monitor the situation and if we see more examples or suspected examples we can get more paranoid. At least not trusting foreign code eliminates one class of issues.
You can't trust the VM (and hence many JIT helpers) - if we call a bad native, the DF may end up being set (again, we should be crashing all over the place if this were actually occurring).
libffi does not appear to manually clear the flag on return - I suppose we could do it with some inline assembly in the interpreter.
@JamesKingdon thoughts?
Rip out rep sto and burn it. If we can't guarantee that it's going to write to the correct memory then I don't think we can use it.
Not an option right now - that is a huge change - there is perf work to look at the sequences, but in the short to medium term we need some kind of halfway point. I think maybe protect ffi and protect the calls we need to protect in the JIT.
@andrewcraik @JamesKingdon Just to clarify before I make the change. We will trust VM given that VM will clear the flag on return of foreign native call. This means that the JIT only need to clear flag after the return of JNI call and helper that may call JVMTI. Am I right?
@gacholio What helpers can call to JVMTI?
I think before we make a decision we should discuss as a group - I know @DanHeidinga is interested in discussing - let me try to set something up amongst the 5 of us tomorrow so we can get this nailed down and decide what we think the right thing to do is for 0.21 and we can summarize the result here for anyone interested.
Summary of an offline discussion between James, GAC, Andrew, Dan and me:
Since we're not able to reproduce the crash nor detecting incorrect DF flag, we started to think if it's caused by other things. We have considered other possibilities that may corrupt the object header.
The DF flag bit is flipped by radioactive, or there is a C compiler bug generating bad VM/JIT code that sets the flag, or the header is stomped by other thread.
The first two are unlikely as James has seen several defects with corrupted memory that fits into this pattern (header is zero but all fields look right), and those defects are on different OS.
The third one is ruled out after we took a look at the RFlag in javacore, it's 0x0000000000010646, the DF flag mask is 0x0400, so the flag was indeed set when the crash occurred. Given the short window between the rep stosb instruction and the crash, the flag was set when rep stosb executed. The original theory is confirmed.
However, we still don't know when the flag was set and who set it. Clearing the flag before each rep instruction will likely solve this problem but the performance cost is too large. And we can't clear it on the return of foreign native because we don't know if it's set by foreign native.
The proposal is that we defer this issue to next release, and we'll add diagnostic instructions in the jitted method to detect the situation where DF is incorrectly set and crash the program in an attempt to catch the case in our builds. The diagnostic feature can be turned on by setting an evironment variable and added after this release. @andrewcraik @DanHeidinga
The proposal is that we defer this issue to next release,
+1
and we'll add diagnostic instructions in the jitted method to detect the situation where DF is incorrectly set and crash the program in an attempt to catch the case in our builds. The diagnostic feature can be turned on by setting an evironment variable and added after this release.
I think this makes a lot of sense. Let's grind the testing with the option enabled and see if we can find another occurrence of this issue.
Moved to 0.22
https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.functional_x86-64_mac_uma_Nightly_testList_0/5 TestArrayCopy_3 osx1011-x86-2