Closed andrew-m-leonard closed 5 years ago
@andrew-m-leonard I see a couple of different crashes in the Jenkins output. Are you able to capture the javacores / system cores / etc from the crashes to aid the investigations?
13:26:32 [2019-06-03 08:26:32,170] Agent[1]: stderr: 12:26:32.170 0x8913300 omrport.359 * ASSERTION FAILED at ../../omr/port/common/omrmemtag.c:145: ((memoryCorruptionDetected))
@DanHeidinga I'm not aware of a way of getting hyc-runtimes dumps? @AdamBrousseau does the worspace get cleared?
This might be the same issue reported at https://github.com/eclipse/openj9/issues/5399, particularly https://github.com/eclipse/openj9/issues/5399#issuecomment-481877959.
Dumps are in the archived openjdk_test_output.tar.gz file uploaded to artifactory.
Similar issue on 0.14.0
release:
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder_Advanced/13/console
Assertion failure:
13:26:32 [2019-06-03 08:26:32,170] Agent[1]: stderr: 12:26:32.170 0x8913300 omrport.359 * ** ASSERTION FAILED ** at ../../omr/port/common/omrmemtag.c:145: ((memoryCorruptionDetected))
$ lldb jdk-11.0.3+7/Contents/Home/bin/java -c core.20190603.082632.59714.0001.dmp
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
* frame #0: 0x00007fff97448286 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff952699f9 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x0000000006134ee3 libj9prt29.dylib`omrdump_create + 275
frame #3: 0x0000000006180a79 libj9dmp29.dylib`doSystemDump + 297
frame #4: 0x00000000061855d5 libj9dmp29.dylib`protectedDumpFunction + 21
frame #5: 0x0000000006136267 libj9prt29.dylib`omrsig_protect + 983
frame #6: 0x00000000061851d6 libj9dmp29.dylib`runDumpAgent + 790
frame #7: 0x000000000619b0d1 libj9dmp29.dylib`triggerDumpAgents + 1393
frame #8: 0x0000000006d412d9 libj9trc29.dylib`triggerHit + 761
frame #9: 0x0000000006d356fa libj9trc29.dylib`doTracePoint + 3626
frame #10: 0x0000000006d314eb libj9trc29.dylib`javaTrace + 203
frame #11: 0x0000000006133d84 libj9prt29.dylib`unwrapBlockAndCheckTags + 292
frame #12: 0x0000000006133c15 libj9prt29.dylib`omrmem_free_memory + 85
frame #13: 0x0000000006dd8d54 libj9jit29.dylib`std::__1::deque<TR::reference_wrapper<J9MemorySegment>, TR::typed_allocator<TR::reference_wrapper<J9MemorySegment>, J9::RawAllocator> >::erase(std::__1::__deque_iterator<TR::reference_wrapper<J9MemorySegment>, TR::reference_wrapper<J9MemorySegment> const*, TR::reference_wrapper<J9MemorySegment> const&, TR::reference_wrapper<J9MemorySegment> const* const*, long, 512l>) + 1076
frame #14: 0x0000000006dd802e libj9jit29.dylib`J9::SystemSegmentProvider::~SystemSegmentProvider() + 190
frame #15: 0x0000000006db0869 libj9jit29.dylib`TR::CompilationInfoPerThread::shouldPerformCompilation(TR_MethodToBeCompiled&) + 1593
This might be the same issue reported at #5399, particularly #5399 (comment).
In the cores from the 0.14.0
grinder, there are a couple of occurrences of #5399.
One of the issue in the comment above.
And another with a crash in segmentInsertionComparator
:
frame #13: 0x00007fff9b7adf1a libsystem_platform.dylib`_sigtramp + 26
frame #14: 0x000000000ced0a45 libj9vm29.dylib`segmentInsertionComparator + 5
frame #15: 0x000000000cfe65f2 libj9vm29.dylib`insertNode + 194
frame #16: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
frame #17: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
frame #18: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
frame #19: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
frame #20: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
frame #21: 0x000000000cfe651d libj9vm29.dylib`avl_insert + 29
frame #22: 0x000000000ced088f libj9vm29.dylib`allocateVirtualMemorySegmentInListInternal + 559
frame #23: 0x000000000cecffe3 libj9vm29.dylib`allocateMemorySegmentInList + 147
frame #24: 0x000000000ce91f7d libj9vm29.dylib`allocateClassMemorySegment + 205
frame #25: 0x000000000ce9b752 libj9vm29.dylib`internalCreateRAMClassFromROMClassImpl(J9VMThread*, J9ClassLoader*, J9ROMClass*, unsigned long, J9Class*, J9ROMMethod**, long, int, J9Class*, unsigned long, J9Class*, J9CreateRAMClassState*, J9ClassLoader*, J9Class*, J9Module*) + 9298
frame #26: 0x000000000ce985e9 libj9vm29.dylib`internalCreateRAMClassFromROMClass + 2825
frame #27: 0x000000000cfd2733 libj9vm29.dylib`internalDefineClass + 2883
frame #28: 0x000000000ec8f879 libjclse29.dylib`defineClassCommon + 1897
frame #29: 0x000000000ec823bd libjclse29.dylib`Java_java_lang_ClassLoader_defineClassImpl + 173
frame #30: 0x00000000236d4653
frame #31: 0x000000000ce8c48d libj9vm29.dylib`sendLoadClass + 429
frame #32: 0x000000000ce936b6 libj9vm29.dylib`internalFindClassInModule + 5126
frame #33: 0x000000000cecb0ca libj9vm29.dylib`resolveClassRef + 506
frame #34: 0x000000000ce305a7 libj9vm29.dylib`VM_BytecodeInterpreter::run(J9VMThread*) + 131815
frame #35: 0x000000000ce102b7 libj9vm29.dylib`bytecodeLoop + 151
frame #36: 0x000000000cf70052 libj9vm29.dylib`cInterpreter + 22
frame #37: 0x000000000ce8fe92 libj9vm29.dylib`sidecarInvokeReflectMethodImpl + 1474
frame #38: 0x000000000ce90711 libj9vm29.dylib`sidecarInvokeReflectMethod + 129
frame #39: 0x000000000ecbfebd libjclse29.dylib`JVM_InvokeMethod_Impl + 109
frame #40: 0x00000000236c9c45
frame #41: 0x000000000ce8fe92 libj9vm29.dylib`sidecarInvokeReflectMethodImpl + 1474
frame #42: 0x000000000ce90711 libj9vm29.dylib`sidecarInvokeReflectMethod + 129
frame #43: 0x000000000ecbfebd libjclse29.dylib`JVM_InvokeMethod_Impl + 109
frame #44: 0x00000000236c9c45
frame #45: 0x000000000ce8d47f libj9vm29.dylib`runJavaThread + 415
frame #46: 0x000000000cf66367 libj9vm29.dylib`javaProtectedThreadProc + 183
frame #47: 0x000000000c538267 libj9prt29.dylib`omrsig_protect + 983
frame #48: 0x000000000cf66283 libj9vm29.dylib`javaThreadProc + 83
frame #49: 0x000000000c4f80b9 libj9thr29.dylib`thread_wrapper + 361
frame #50: 0x00007fff9526805a libsystem_pthread.dylib`_pthread_body + 131
frame #51: 0x00007fff95267fd7 libsystem_pthread.dylib`_pthread_start + 176
frame #52: 0x00007fff952653ed libsystem_pthread.dylib`thread_start + 13
From the javacore: 1XHEXCPCODE InaccessibleAddress: 00007F18590ADB4C
(lldb) disassemble -n segmentInsertionComparator
libj9vm29.dylib`segmentInsertionComparator:
0xced0a40 <+0>: pushq %rbp
0xced0a41 <+1>: movq %rsp, %rbp
0xced0a44 <+4>: movq 0x20(%rdx), %rcx
0xced0a48 <+8>: xorl %eax, %eax
0xced0a4a <+10>: cmpq %rcx, 0x20(%rsi)
0xced0a4e <+14>: je 0xced0a60 ; <+32>
0xced0a50 <+16>: movl $0x1, %ecx
0xced0a55 <+21>: movq $-0x1, %rax
0xced0a5c <+28>: cmovaq %rcx, %rax
0xced0a60 <+32>: popq %rbp
0xced0a61 <+33>: retq
0xced0a62 <+34>: nopw %cs:(%rax,%rax)
Registers
(lldb) register read rdx rcx
rdx = 0x00007f18590adb2c
rcx = 0x00007fa05981a268
so rdx is the bad value.
Likely the walkNode
.
Need to get this into ddr and see if the avl tree is corrupt
I've opened a PR at OMR to make it easier to reason about the AVL code (https://github.com/eclipse/omr/pull/3944) and while it makes it easier to read the code, it won't fix this issue.
It doesn't look like we'll have a solution ready for this in the 0.15 release.
@theresa-m Can you update this with how the investigation is going?
The investigation is ongoing. Still don't have a fix in mind.
Hey @dmitripivkine ran into this going through one of the possible traces for this issue. This slot is being treated as J9Object in the code but clearly contains UTF8 string data. Is that normal behavior?
(lldb) up
frame #16: 0x0000000005a89d24 libj9gc29.dylib`MM_MarkingSchemeRootClearer::doStringTableSlot(this=0x000070000cc58c68, slotPtr=0x6176616a4c280028, stringTableIterator=0x000070000cc58b28) at MarkingSchemeRootClearer.cpp:353
350 MM_MarkingSchemeRootClearer::doStringTableSlot(omrobjectptr_t *slotPtr, GC_StringTableIterator *stringTableIterator)
351 {
352 _env->getGCEnvironment()->_markJavaStats._stringConstantsCandidates += 1;
-> 353 if(!_markingScheme->isMarked(*slotPtr)) {
354 _env->getGCEnvironment()->_markJavaStats._stringConstantsCleared += 1;
355 stringTableIterator->removeSlot();
356 }
(lldb) p *(GC_StringTableIterator*)0x000070000cc58b28
(GC_StringTableIterator) $0 = {
_hashTable = 0x00007fc5a34194f0
_handle = {
table = 0x00007fc5a34194f0
bucketIndex = 432
didDeleteCurrentNode = 0
pointerToCurrentNode = 0x00007fc5a5072388
iterateState = 0
poolState = {
leftToDo = 0
thePool = 0x0000000000000000
currentPuddle = 0x0000000000000000
lastSlot = 0
flags = 0
}
}
_firstIteration = false
}
(lldb) x/10s 0x00007fc5a5072388
0x7fc5a5072388: "("
0x7fc5a507238a: "(Ljava/util/concurrent/CountDownLatch;)V\b"
0x7fc5a50723b4: "finalize\x03"
0x7fc5a50723be: "()V"
0x7fc5a50723c2: "\r"
0x7fc5a50723c4: "WhiteBox.java"
0x7fc5a50723d2: "\n"
0x7fc5a50723d4: "WhiteBox$1\x10"
0x7fc5a50723e0: "java/lang/Object\b"
0x7fc5a50723f2: "WhiteBox\v"
(lldb) up
libj9gc29.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #17: 0x000000000594b873 libj9gc29.dylib`MM_RootScanner::scanStringTable(this=0x000070000cc58c68, env=0x00007fc5a4048e68) at RootScanner.cpp:621 [opt]
618 GC_StringTableIterator stringTableIterator(stringTable->getTable(tableIndex));
619 J9Object **slot = NULL;
620 while (NULL != (slot = (J9Object **)stringTableIterator.nextSlot())) {
-> 621 doStringTableSlot(slot, &stringTableIterator);
622 }
623 }
624 }
(lldb) p **(J9Object **)0x00007fc5a5072388
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb)
slotPtr
is pointer to slot where pointer to J9Object is located (J9Object** slotPtr
). An address of J9Object
should be in the object heap range. Obviously it should not point to UTF8 data.
In general there are two possible locations of the objects - object heap (heap allocated) or java stack of thread (stack allocated). String Table should not contain stack allocated objects.
If you check caller of MM_MarkingSchemeRootClearer::doStringTableSlot()
in MM_RootScanner::scanStringTable
:
GC_StringTableIterator stringTableIterator(stringTable->getTable(tableIndex));
J9Object **slot = NULL;
while (NULL != (slot = (J9Object **)stringTableIterator.nextSlot())) {
doStringTableSlot(slot, &stringTableIterator);
}
GC_StringTableIterator
is alias name of GC_HashTableIterator
, it is standard Hash Table Iteration:
void **
GC_HashTableIterator::nextSlot()
{
void **value;
if (_firstIteration) {
_firstIteration = false;
value = (void **)hashTableStartDo(_hashTable, &_handle);
} else {
value = (void **)hashTableNextDo(&_handle);
}
return value;
}
I guess there is a corruption in the hash table.
Another possibility that String Table contain stall pointer to collected object. In this case an address of the object must be in object heap range (can be checked in DDR !dumpallregions
or in javacore)
Please let me know if you need help, I can take a look to system core.
@theresa-m kindly generated a few system cores using latest JVM, so DDR for Mac now works.
I looked to two cores (crashes in GC and JIT). Long story short every time crash occur because one of j9pool's paddles contain garbage. So obviously there are two possibilities: bad paddle address was added to paddle list or memory itself where paddle header use to be located has been corrupted. Looks like it is a case of memory corruption. The memory located right before expected paddle has footer !J9MemTag corrupted. It is in both cases memory allocated from segment.c:238
size 0x4784 (18308). So I believe there is a memory segment was overwritten by some reason. I will continue investigation and try to figure out who own such memory segment.
@theresa-m had mentioned that backing out this PR prevents the failures: https://github.com/eclipse/openj9/pull/5301 but never of us have found the smoking gun on why this triggers the problem
The problematic segment written out-of-border is a memory allocated for ROM class for WhiteBox
class
There is an example of debugging for one of cores (ask me for details if it is not there):
I am going to investigate post-crash system core file using jdmpview
- powerful tool for OpenJ9 system core dump analysis.
The core I selected for example is core.20190826.111238.38364.0001.dmp
run it on Linux x86 machine:
../jdk-12.0.2+10/bin/jdmpview -core core.20190826.111238.38364.0001.dmp
DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...
For a list of commands, type "help"; for how to use "help", type "help help"
Available contexts (* = currently selected context) :
Source : file:///team/Dmitri/temp/temp0/core.20190826.111238.38364.0001.dmp
*0 : PID: 0 : JRE 11 Mac OS X amd64-64 (build 11.0.4+11-201908171833)
>
As far as it is crash we should see crash record:
> !gpinfo
Failing Thread: !j9vmthread 0x11517100
Failing Thread ID: 0x37ab85 (3648389)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000000
Handler1=000000000E2229B0 Handler2=000000000E533D90
RDI=000070000E4CCC48 RSI=29280003657A696C RAX=00007F93478053D0 RBX=000070000E4CCB18
RCX=0000000010ED1098 RDX=000070000E4CCB18 R8=0000F8FA2D4B6109 R9=00007F9347803A80
R10=00007F93460054A8 R11=0000000000000000 R12=00007F9348012C20 R13=0000000000000006
R14=0000000010ED0FE8 R15=000070000E4CCC48
RIP=0000000010E244F3 GS=0000 FS=0000 RSP=000070000E4CCAF0
RFlags=0000000000010206 CS=002B RBP=000070000E4CCAF0 ERR=1E41D00000000000
TRAPNO=000000000000000D CPU=D000000000000000 FAULTVADDR=000000001E41D000
XMM0 000000000000000d (f: 13.000000, d: 6.422853e-323)
XMM1 0000000000000012 (f: 18.000000, d: 8.893182e-323)
XMM2 000000000000000e (f: 14.000000, d: 6.916919e-323)
XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM5 bfb4d30c00000000 (f: 0.000000, d: -8.134532e-02)
XMM6 000fffffffffffff (f: 4294967296.000000, d: 2.225074e-308)
XMM7 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/Users/theresa.t.mammarella@ibm.com/local-test/adopt-local-tests/jdk-11.0.4+11/Contents/Home/lib/compressedrefs/libj9gc29.dylib
Module_base_address=0000000010CD3000 Symbol=_ZN27MM_MarkingSchemeRootClearer17doStringTableSlotEPP8J9ObjectP20GC_HashTableIterator
Symbol_address=0000000010E244E0
From here we can see not only registers but also crashing point in the code MM_MarkingSchemeRootClearer17doStringTableSlotEPP8J9ObjectP20GC_HashTableIterator
So it crashed in GC - Global Stop-The-World GC. The stack from native debugger confirms this:
frame #13: 0x00007fff691d0b5d libsystem_platform.dylib`_sigtramp + 29
frame #14: 0x0000000010e244f4 libj9gc29.dylib`MM_MarkingSchemeRootClearer::doStringTableSlot(J9Object**, GC_HashTableIterator*) + 20
frame #15: 0x0000000010ce7a12 libj9gc29.dylib`MM_RootScanner::scanStringTable(MM_EnvironmentBase*) + 610
frame #16: 0x0000000010ce8dfb libj9gc29.dylib`MM_RootScanner::scanClearable(MM_EnvironmentBase*) + 267
frame #17: 0x0000000010e2074b libj9gc29.dylib`MM_MarkingDelegate::workerCompleteGC(MM_EnvironmentBase*) + 315
frame #18: 0x0000000010dfb6bb libj9gc29.dylib`MM_ParallelMarkTask::run(MM_EnvironmentBase*) + 91
frame #19: 0x0000000010dfa3ac libj9gc29.dylib`MM_ParallelDispatcher::slaveEntryPoint(MM_EnvironmentBase*) + 124
frame #20: 0x0000000010dfa299 libj9gc29.dylib`dispatcher_thread_proc2(OMRPortLibrary*, void*) + 249
(The one of reasons I selected this core for debugging is all java threads stacks are walkable at safe point. Another reason I am a GC Team member so I more familiar with GC structures).
GC executes Clearable phase (remove references to dead objects from JVM structures. The crash occur an attempt to iterate through String Table, so looks like something is wrong here.
There are a few ways to verify String Table in jdmpview
. For example:
> !dumpstringtable
Problem running command:
U64 contains value larger than Integer.MAX_VALUE
> !gccheck stringtable
Starting GC Check
Checking STRING TABLE...Problem running command:
U64 contains value larger than Integer.MAX_VALUE
As you see both of attempts failed and we can reasonably guess the reason is the same as for crash - something in String Table infrastructure is broken.
There are two ways to investigate from this point: dig through String Table internals or use shortcut just run java code used in jdmpview
in java debugger (we run it in Eclipse) and see where command failed. However I would show you String Table internals. The link to String Table can be found in GCExtensions:
> !findvm
!j9javavm 0x00007F934600E620
> !j9javavm 0x00007F934600E620 | grep -i gcext
0x22e8: void* gcExtensions = !j9x 0x00007F9348001220
> !mm_gcextensions 0x00007F9348001220 | grep -i stringtable
0x5c50: class MM_StringTable* stringTable = !mm_stringtable 0x00007F9348012C20
0x5f58: U32 _stringTableListToTreeThreshold = 0x00000400 (1024)
> !mm_stringtable 0x00007F9348012C20
MM_StringTable at 0x7f9348012c20 {
Fields for MM_Base:
Fields for MM_BaseVirtual:
0x0: void** _vptr$MM_BaseVirtual = !j9x 0x0000000010EC9D98
0x8: const U8* _typeId = !j9x 0x0000000010E8ECD7
Fields for MM_StringTable:
0x10: UDATA _tableCount = 0x0000000000000008 (8)
0x18: void** _table = !j9x 0x00007F9347806490
0x20: struct J9ThreadMonitor** _mutex = !j9x 0x00007F9347806510
0x28: struct J9Object*[] _cache = !j9x 0x00007F9348012C48
}
We are on the right way, you can find an address of String Table in R12 in crash info.
String Table is combined from number of hash tables. _table
is an address of hash tables array and size of it is _tableCount
so it is 8:
> !j9x 0x00007F9347806490,40
0x7F9347806490 : 00007f9347806590 00007f93478069b0 [ .e.G.....i.G.... ]
0x7F93478064A0 : 00007f9347806dd0 00007f93478071f0 [ .m.G.....q.G.... ]
0x7F93478064B0 : 00007f9347807610 00007f9347807a30 [ .v.G....0z.G.... ]
0x7F93478064C0 : 00007f9347807e50 00007f9347808270 [ P~.G....p..G.... ]
We need to inspect correctness of each of this hash table but from our shortcut we know that problematic hash table is 0x00007f9347807e50
:
> !J9HashTable 0x00007f9347807e50
J9HashTable at 0x7f9347807e50 {
Fields for J9HashTable:
0x0: const U8* tableName = !j9x 0x0000000010E8ECB0
0x8: U32 tableSize = 0x0000032B (811)
0xc: U32 numberOfNodes = 0x0000025C (604)
0x10: U32 numberOfTreeNodes = 0x00000000 (0)
0x14: U32 entrySize = 0x00000008 (8)
0x18: U32 listNodeSize = 0x00000010 (16)
0x1c: U32 treeNodeSize = 0x00000018 (24)
0x20: U32 nodeAlignment = 0x00000008 (8)
0x24: U32 flags = 0x00000002 (2)
0x28: U32 memoryCategory = 0x80000003 (2147483651)
0x2c: U32 listToTreeThreshold = 0x00000400 (1024)
0x30: void** nodes = !j9x 0x00007F9346069820
0x38: struct J9Pool* listNodePool = !j9pool 0x00007F9347807F20
0x40: struct J9Pool* treeNodePool = !j9pool 0x00007F9347808180
0x48: struct J9Pool* treePool = !j9pool 0x00007F9347808010
0x50: struct J9AVLTree* avlTreeTemplate = !j9avltree 0x00007F9347808100
0x58: void* hashFn = !j9x 0x0000000010CE9C60
0x60: void* hashEqualFn = !j9x 0x0000000010DC68D0
0x68: void* printFn = !j9x 0x0000000000000000
0x70: struct OMRPortLibrary* portLibrary = !omrportlibrary 0x000000000E154BE8
0x78: void* equalFnUserData = !j9x 0x00007F9347808100
0x80: void* hashFnUserData = !j9x 0x00007F934600E620
0x88: struct J9HashTable* previous = !j9hashtable 0x0000000000000000
}
Hash table can be converted to AVL tree bot not in this case (numberOfTreeNodes = 0x00000000
)
So we should look to listNodePool
:
> !j9pool 0x00007F9347807F20
J9Pool at 0x7f9347807f20 {
Fields for J9Pool:
0x0: U64 elementSize = 0x0000000000000010 (16)
0x8: U64 elementsPerPuddle = 0x0000000000000090 (144)
0x10: U64 puddleAllocSize = 0x0000000000000A50 (2640)
0x18: J9WSRP(J9PoolPuddleList) puddleList = !j9poolpuddlelist 0x00007F9347807FB0
0x20: void* memAlloc = !j9x 0x0000000010CD47A0
0x28: void* memFree = !j9x 0x0000000010CD47B0
0x30: const U8* poolCreatorCallsite = !j9x 0x0000000010E8ECB0
0x38: void* userData = !j9x 0x000000000E154BE8
0x40: U16 alignment = 0x0008 (8)
0x42: U16 flags = 0x002A (42)
0x44: U32 memoryCategory = 0x80000003 (2147483651)
}
The easiest way to inspect J9Pool is just walk it:
> !walkj9pool 0x00007F9347807F20
J9Pool at 0x7f9347807f20
{
Problem running command:
U64 contains value larger than Integer.MAX_VALUE
Oops... looking to pool puddle list:
> !j9poolpuddlelist 0x00007F9347807FB0
J9PoolPuddleList at 0x7f9347807fb0 {
Fields for J9PoolPuddleList:
0x0: U64 numElements = 0x000000000000025C (604)
0x8: J9WSRP(J9PoolPuddle) nextPuddle = !j9poolpuddle 0x00007F9345846420
0x10: J9WSRP(J9PoolPuddle) nextAvailablePuddle = !j9poolpuddle 0x00007F9345846420
}
Trying to get first puddle:
> !j9poolpuddle 0x00007F9345846420
J9PoolPuddle at 0x7f9345846420 {
Fields for J9PoolPuddle:
0x0: U64 usedElements = 0x0000000100000110 (4294967568)
0x8: J9SRP firstElementAddress = !j9x 0x0000000000000000
0xc: J9SRP(UDATA) firstFreeSlot = <FAULT>
0x10: J9WSRP(J9PoolPuddle) prevPuddle = !j9poolpuddle 0x0000000000000000
0x18: J9WSRP(J9PoolPuddle) nextPuddle = !j9poolpuddle 0x000081B445846438
0x20: J9WSRP(J9PoolPuddle) prevAvailablePuddle = !j9poolpuddle 0x00007F934584645F
0x28: J9WSRP(J9PoolPuddle) nextAvailablePuddle = !j9poolpuddle 0x0000000000000000
0x30: U64 userData = 0x0000000000000000 (0)
0x38: U64 flags = 0x0000000000000000 (0)
}
This does not look correct. Looking to raw memory:
> !j9x 0x00007F9345846420,200
0x7F9345846420 : 0000000100000110 0000000000000000 [ ................ ]
0x7F9345846430 : 0000000000000000 0000022100000000 [ ............!... ]
0x7F9345846440 : 000000000000001f 0000000000000000 [ ................ ]
0x7F9345846450 : 0000000000000000 0000000000000000 [ ................ ]
0x7F9345846460 : 0000004800000004 0000004800000005 [ ....H.......H... ]
0x7F9345846470 : 0000004800000007 000000020000014a [ ....H...J....... ]
0x7F9345846480 : 000000020000014e 0000000200000158 [ N.......X....... ]
0x7F9345846490 : 0000000200000168 000000ac000000a0 [ h............... ]
0x7F93458464A0 : 0000014600021010 0000008c00000104 [ ....F........... ]
0x7F93458464B0 : 000000bc00000098 00000162000000f4 [ ............b... ]
0x7F93458464C0 : 000000ac000000ec 01040000000000b0 [ ................ ]
0x7F93458464D0 : 00000200000a0002 02b72a0001b52b2a [ ........*+...*.. ]
0x7F93458464E0 : 000000110000e400 0000000000020002 [ ................ ]
0x7F93458464F0 : 000000ae0000d280 00050004000000b4 [ ................ ]
0x7F9345846500 : 0000010000080001 ac0003b60001b4d7 [ ................ ]
0x7F9345846510 : 0002000200000011 0000d28000000000 [ ................ ]
0x7F9345846520 : 0101010109090700 ffffff7800000006 [ ............x... ]
0x7F9345846530 : fffffff400000082 6966246c6176000d [ ..........val$fi ]
0x7F9345846540 : 0064657a696c616e 2f6176616a4c0025 [ nalized.%.Ljava/ ]
0x7F9345846550 : 6e6f632f6c697475 2f746e6572727563 [ util/concurrent/ ]
0x7F9345846560 : 776f44746e756f43 003b686374614c6e [ CountDownLatch;. ]
0x7F9345846570 : 3e74696e693c0006 6176616a4c280028 [ ..<init>(.(Ljava ]
0x7F9345846580 : 6f632f6c6974752f 746e65727275636e [ /util/concurrent ]
0x7F9345846590 : 6f44746e756f432f 3b686374614c6e77 [ /CountDownLatch; ]
0x7F93458465A0 : 616e696600085629 29280003657a696c [ )V..finalize..() ] <-- it is in RSI register
0x7F93458465B0 : 74696857000d0056 76616a2e786f4265 [ V...WhiteBox.jav ]
0x7F93458465C0 : 74696857000a0061 00103124786f4265 [ a...WhiteBox$1.. ]
0x7F93458465D0 : 6e616c2f6176616a 7463656a624f2f67 [ java/lang/Object ]
0x7F93458465E0 : 4265746968570008 63726f66000b786f [ ..WhiteBox..forc ]
0x7F93458465F0 : 0063476c6c754665 752f6176616a0023 [ eFullGc.#.java/u ]
0x7F9345846600 : 636e6f632f6c6974 432f746e65727275 [ til/concurrent/C ]
0x7F9345846610 : 6e776f44746e756f 000900686374614c [ ountDownLatch... ]
This is not a valid pool puddle. And obviously original crash occur here: we can see value we crashed at RSI=29280003657A696C
in 0x7F93458465A8
. A garbage in puddle memory was used for calculation and this bogus value has been loaded.
There are two possibilities at this point: 1) Puddle address in puddle list has been written wrong or corrupted 2) Memory itself where puddle use to be is corrupted
There is a strong evidence that we have second case. OpenJ9 uses header and footer tags for allocated memory. Header has eye-catcher 0xb1234567
and footer has eye-catcher 0xb7654321
We can try to find them:
> !findheader 0x7F9345846420
Searching memory allocation header for 0x00007F9345846420
J9MemTag check failed at 7f93458463a8: Corrupt memory section found :Wrong eyecatcher. Expected 0xb7654321 but was 0x00007F93458463A8
This tells us that footer eye-catcher is expected to be at 0x7f93458463a8
but it is not there:
> !j9x 0x7f9345846380,100
0x7F9345846380 : 2f676e616c2f6176 432f656b6f766e69 [ va/lang/invoke/C ]
0x7F9345846390 : 3b657469536c6c61 0000000000000000 [ allSite;........ ]
0x7F93458463A0 : dddddddd00000000 0000000000000288 [ ................ ] <--- eye-catcher expected to be here
0x7F93458463B0 : 0000021a00000212 4a80400000000020 [ ........ ....@.J ]
0x7F93458463C0 : 000000e000000000 000000f800000002 [ ................ ]
0x7F93458463D0 : 000000c400000001 0000000000000000 [ ................ ]
0x7F93458463E0 : 0000000800000008 ffffffffffffffc0 [ ................ ]
0x7F93458463F0 : 0000000e00000288 0000000000000128 [ ........(....... ]
0x7F9345846400 : 0000000000000000 000001d40000009c [ ................ ]
0x7F9345846410 : 0000009000000000 0000004100000037 [ ........7...A... ]
0x7F9345846420 : 0000000100000110 0000000000000000 [ ................ ]<-- puddle use to be here
0x7F9345846430 : 0000000000000000 0000022100000000 [ ............!... ]
0x7F9345846440 : 000000000000001f 0000000000000000 [ ................ ]
0x7F9345846450 : 0000000000000000 0000000000000000 [ ................ ]
0x7F9345846460 : 0000004800000004 0000004800000005 [ ....H.......H... ]
0x7F9345846470 : 0000004800000007 000000020000014a [ ....H...J....... ]
The data written over looks "contiguous" so there is reasonable guess that chunk of memory allocated lower has been overwritten out-of-bound. Lets try to figure out what it is. We need to find header tag for this allocation. There it is:
0x7F9345841C00 : f485ae20b1234567 0000000000004784 [ gE#. ....G...... ]
0x7F9345841C10 : 000000000e37e7f0 000000000e1528a0 [ ..7......(...... ]
0x7F9345841C20 : 0000000000004788 00003a5200003b8c [ .G.......;..R:.. ]
> !J9MemTag 0x7F9345841C00 <--- must have footer at 0x00007F93458463A8
J9MemTag at 0x7f9345841c00 {
Fields for J9MemTag:
0x0: U32 eyeCatcher = 0xB1234567 (2971878759)
0x4: U32 sumCheck = 0xF485AE20 (4102401568)
0x8: U64 allocSize = 0x0000000000004784 (18308)
0x10: const U8* callSite = !j9x 0x000000000E37E7F0 <-- allocated at segment.c:238
0x18: struct OMRMemCategory* category = !omrmemcategory 0x000000000E1528A0 <-- (Classes)
}
Is there somebody around to keep reference to this memory?
> !findall u64 0x7F9345841C20
Scanning memory for 20 1c 84 45 93 7f 00 00 aligned to 8 starting from 0x0
Match found at 0x11623b08
Match found at 0x7f934882c8e0
Match found at 0x7f934882c8e8
No more matches
> !j9x 0x11623b00,20
0x11623B00 : 0000000099669966 00007f9345841c20 [ f.f..... ..E.... ]
0x11623B10 : 0000000011623de0 00000000020e0001 [ .=b............. ]
The 0x99669966
is an eye-catcher for j9class:
> !j9class 0x11623B00
J9Class at 0x11623b00 {
Fields for J9Class:
0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
0x8: struct J9ROMClass* romClass = !j9romclass 0x00007F9345841C20 <----- ROM class overwrite reserved space
0x10: void** superclasses = !j9x 0x0000000011623DE0
0x18: UDATA classDepthAndFlags = 0x00000000020E0001 (34471937)
0x20: U32 classDepthWithFlags = 0x00000000 (0)
0x24: U32 classFlags = 0x00000000 (0)
0x28: struct J9ClassLoader* classLoader = !j9classloader 0x00007F93480518F0
0x30: struct J9Object* classObject = !j9object 0x00000000E015ECE0 // java/lang/Class
0x38: volatile UDATA initializeStatus = 0x0000000000000001 (1)
0x40: struct J9Method* ramMethods = !j9method 0x0000000011623448 // WhiteBox.<init>()V
0x48: UDATA* ramStatics = !j9x 0x0000000000000000
0x50: struct J9Class* arrayClass = !j9class 0x0000000000000000
0x58: UDATA totalInstanceSize = 0x000000000000002C (44)
0x60: struct J9ITable* lastITable = !j9itable 0x000000000E36CD88
0x68: UDATA* instanceDescription = !j9x 0x0000000000000FFD
0x70: UDATA* instanceLeafDescription = !j9x 0x0000000000000001
0x78: UDATA instanceHotFieldDescription = 0x0000000000000000 (0)
0x80: UDATA selfReferencingField1 = 0x0000000000000000 (0)
0x88: UDATA selfReferencingField2 = 0x0000000000000000 (0)
0x90: struct J9Method* initializerCache = !j9method 0x0000000000000000
0x98: UDATA romableAotITable = 0x00000000103DA000 (272474112)
0xa0: UDATA packageID = 0x00007F93480518F0 (140270545213680)
0xa8: struct J9Module* module = !j9module 0x0000000000000000
0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000011611C00 // java/net/URLClassLoader$3
0xb8: struct J9Class* subclassTraversalReverseLink = !j9class 0x00000000115D1D00 // org/testng/ITestListener
0xc0: void** iTable = !j9x 0x0000000000000000
0xc8: UDATA castClassCache = 0x0000000011636401 (291726337)
0xd0: void** jniIDs = !j9x 0x00007F934789C130
0xd8: UDATA lockOffset = 0x0000000000000004 (4)
0xe0: UDATA newInstanceCount = 0x00000000000003E8 (1000)
0xe8: IDATA backfillOffset = 0x0000000000000030 (48)
0xf0: struct J9Class* replacedClass = !j9class 0x0000000000000000
0xf8: UDATA finalizeLinkOffset = 0x0000000000000000 (0)
0x100: struct J9Class* nextClassInSegment = !j9class 0x0000000000000000
0x108: UDATA* ramConstantPool = !j9x 0x0000000011622870
0x110: struct J9Object** callSites = !j9x 0x0000000011623DA0
0x118: struct J9Object** methodTypes = !j9x 0x0000000000000000
0x120: struct J9Object** varHandleMethodTypes = !j9x 0x0000000011623E10
0x128: struct J9VMCustomSpinOptions* customSpinOption = !j9vmcustomspinoptions 0x0000000000000000
0x130: void** staticSplitMethodTable = !j9x 0x0000000000000000
0x138: void** specialSplitMethodTable = !j9x 0x0000000000000000
0x140: struct J9JITExceptionTable* jitMetaDataList = !j9jitexceptiontable 0x0000000000000000
0x148: struct J9Class* gcLink = !j9class 0x0000000000000000
0x150: struct J9Class* hostClass = !j9class 0x0000000011623B00 // WhiteBox
0x158: struct J9Class* nestHost = !j9class 0x0000000000000000
0x160: struct J9FlattenedClassCache* flattenedClassCache = !j9flattenedclasscache 0x0000000000000000
}
Class name: WhiteBox
To view static fields, use !j9statics 0x0000000011623B00
To view instance shape, use !j9classshape 0x0000000011623B00
and problematic pointer is a pointer to ROM class.
So the question is what is special in WhiteBox
class that we failed to calculate expected ROM class size properly?
TODO: determine if this issue has previously shipped or not
Given this was discovered Jun 3, the problem is in the 0.15 release which occurred in July. Moving to the next milestone.
This should now be fixed, the test can be unexcluded.
I've opened this pr to reinclude the test https://github.com/AdoptOpenJDK/openjdk-tests/pull/1350
Added a note in PR 1350, is it also to be reincluded for jdk13? as its excluded in that version also, will merge PR once that is updated or clarified
Failure link
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder_Advanced/10/console
If the link is not public, instead include
Optional info
Failure output