eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

ASSERTION FAILED at \omr\port\common\omrmemtag.c:145: ((memoryCorruptionDetected)) #18025

Open Herr-Sepp opened 1 year ago

Herr-Sepp commented 1 year ago

When I start our Jenkins with OpenJ9, the JVM crashes on the first request to the web GUI.

Jenkins: 2.414.1 LTS
OS: Windows Server 2019
openj9: ibm-semeru-open-jdk_x64_windows_17.0.8_7_openj9-0.40.0

20:00:55.899 0x1b6e00 omrport.359 * ASSERTION FAILED at c:\workspace\openjdk-build\workspace\build\src\omr\port\common\omrmemtag.c:145: ((memoryCorruptionDetected))

Javadump: javacore.20230827.183621.9736.0002.txt

I could also provide a snap.trc or a full heap dump if it helps, but I need a way to hand it over that is not public.

Our Jenkins runs fine with OpenJDK17U-jdk_x64_windows_hotspot_17.0.8_7.

Herr-Sepp commented 1 year ago

Maybe a duplicate of https://github.com/eclipse-openj9/openj9/issues/17247 but the workaround -XX:-ShowCodeDetailsInExceptionMessages did not work for me.

pshipton commented 1 year ago

What we need to further diagnose this is the system core file. You can use any file sharing service, and password protect it if necessary.

gacholio commented 1 year ago

1XMCURTHDINFO  Current thread
3XMTHREADINFO      "Finalizer thread" J9VMThread:0x00000000001B6E00, omrthread_t:0x000001BEC1504848, java/lang/Thread:0x00000000C0136138, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x24, isDaemon:true)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000C0077A28)
3XMTHREADINFO1            (native thread ID:0xAC0, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000020)
3XMCPUTIME               CPU usage total: 0.640625000 secs, user: 0.421875000 secs, system: 0.218750000 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at jdk/internal/misc/Unsafe.freeDBBMemory(Native Method)
4XESTACKTRACE                at java/nio/DirectByteBuffer$Deallocator.run(DirectByteBuffer.java:97)
4XESTACKTRACE                at jdk/internal/ref/Cleaner.clean(Cleaner.java:144)
4XESTACKTRACE                at java/lang/ref/ReferenceQueue.enqueue(ReferenceQueue.java:167(Compiled Code))
4XESTACKTRACE                at java/lang/ref/Reference.enqueueImpl(Reference.java:160(Compiled Code))
5XESTACKTRACE                   (entered lock: jdk/internal/ref/Cleaner@0x00000000FDFDC320, entry count: 1)

This means a core will likely not provide any useful information, as DBB memory does not track who caused the allocation. It might be possible to determine what the buffer was used for by examining its contents, but that's pretty unlikely.

Herr-Sepp commented 1 year ago

@pshipton I have sent the download link for the files to your email address on your github profile. Hope this is ok.

pshipton commented 1 year ago

Looking at the first core, the corruption is to a DBB, the header looks fine, the footer has the checksum and eyecatcher overwritten with zero, which caused the assertion. The DBB size is 0x20, the contents are some pointers to other DBB. The first DBB pointer is an allocation of size 0x18, second DBB size 0x20, third is null, fourth is a pointer to string data "Active Directory Provider". There are other strings in the vicinity "C:\Windows\system32\activeds.dll", "domainControllerFunctionality".

core.20230827.183621.9736.0001.dmp
0x0: void* corruptedMemoryBlock = !j9x 0x000001BEC6628530

0x1BEC6628510 :  7d6f3691 b1234567 0000000000000020 [ gE#..6o} ....... ]
0x1BEC6628520 :  00007ffc77371960 00007ffc7d19ee18 [ `.7w.......}.... ]

0x1BEC6628530 :  000001bea4c2b200 000001bec63ed820 [ ........ .>..... ]
0x1BEC6628540 :  0000000000000000 000001bec66aaf48 [ ........H.j..... ]

0x1BEC6628550 :  0000000000000000 0000000000000020 [ ........ ....... ]
0x1BEC6628560 :  00007ffc77371960 00007ffc7d19ee18 [ `.7w.......}.... ]

0x7FFC77371960 :  736b726f775c3a63 65706f5c65636170 [ c:\workspace\ope ]
0x7FFC77371970 :  6975622d6b646a6e 736b726f775c646c [ njdk-build\works ]
0x7FFC77371980 :  6975625c65636170 6f5c6372735c646c [ pace\build\src\o ]
0x7FFC77371990 :  75725c396a6e6570 636a5c656d69746e [ penj9\runtime\jc ]
0x7FFC773719A0 :  6e6f6d6d6f635c6c 5f656661736e755c [ l\common\unsafe_ ]
0x7FFC773719B0 :  34323a632e6d656d 0000000600000031 [ mem.c:241....... ]

!omrmemcategory 0x7ffc7d19ee18
OMRMemCategory at 0x7ffc7d19ee18 {
  Fields for OMRMemCategory:
        0x0: const U8* name = !j9x 0x00007FFC7D19EE48 // "Direct Byte Buffers"

#define J9MEMTAG_EYECATCHER_ALLOC_HEADER            0xB1234567
#define J9MEMTAG_EYECATCHER_ALLOC_FOOTER            0xB7654321
#define J9MEMTAG_EYECATCHER_FREED_HEADER            0xBADBAD67
#define J9MEMTAG_EYECATCHER_FREED_FOOTER            0xBADBAD21

typedef struct J9MemTag {
    uint32_t eyeCatcher;
    uint32_t sumCheck;
    uintptr_t allocSize;
    const char *callSite;
    OMRMemCategory *category;
#if !defined(OMR_ENV_DATA64)
    /* omrmem_allocate_memory should return addresses aligned to 8 bytes for
     * performance reasons. On 32 bit platforms we have to pad to achieve this.
     */
    uint8_t padding[4];
#endif
} J9MemTag;
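
To illustrate how a header/footer tag pair like the one above can catch an overwrite, here is a minimal sketch. It is not OMR's implementation: `DemoTag`, `demo_alloc`, `demo_check`, `demo_free`, and the XOR checksum are hypothetical stand-ins, and the real allocator's size rounding and padding are left out. Only the two `ALLOC` eyecatcher values are taken from the defines quoted above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EYECATCHER_ALLOC_HEADER 0xB1234567u
#define EYECATCHER_ALLOC_FOOTER 0xB7654321u

/* Simplified stand-in for J9MemTag (hypothetical, 32 bytes on 64-bit). */
typedef struct DemoTag {
    uint32_t eyeCatcher;
    uint32_t sumCheck;
    uintptr_t allocSize;
    const char *callSite;
    void *category;
} DemoTag;

/* Toy checksum, NOT the algorithm OMR uses. */
static uint32_t demo_sum(const DemoTag *t) {
    return t->eyeCatcher ^ (uint32_t)t->allocSize;
}

/* Build a tag and copy it to 'where' (memcpy avoids unaligned stores). */
static void demo_write_tag(void *where, uint32_t eye, size_t size,
                           const char *callSite) {
    DemoTag t;
    memset(&t, 0, sizeof t);
    t.eyeCatcher = eye;
    t.allocSize = size;
    t.callSite = callSite;
    t.sumCheck = demo_sum(&t);
    memcpy(where, &t, sizeof t);
}

/* A tag is valid if both its eyecatcher and its checksum match. */
static int demo_tag_ok(const void *where, uint32_t eye) {
    DemoTag t;
    memcpy(&t, where, sizeof t);
    return t.eyeCatcher == eye && t.sumCheck == demo_sum(&t);
}

/* Layout: [header tag][user bytes][footer tag]; returns the user pointer. */
void *demo_alloc(size_t size, const char *callSite) {
    unsigned char *base = malloc(sizeof(DemoTag) + size + sizeof(DemoTag));
    if (base == NULL) {
        return NULL;
    }
    demo_write_tag(base, EYECATCHER_ALLOC_HEADER, size, callSite);
    demo_write_tag(base + sizeof(DemoTag) + size, EYECATCHER_ALLOC_FOOTER,
                   size, callSite);
    return base + sizeof(DemoTag);
}

/* Returns 0 if both tags are intact, 1 for a bad header, 2 for a bad footer. */
int demo_check(const void *block) {
    const unsigned char *user = block;
    DemoTag header;
    memcpy(&header, user - sizeof(DemoTag), sizeof header);
    if (!demo_tag_ok(user - sizeof(DemoTag), EYECATCHER_ALLOC_HEADER)) {
        return 1;
    }
    if (!demo_tag_ok(user + header.allocSize, EYECATCHER_ALLOC_FOOTER)) {
        return 2;
    }
    return 0;
}

/* Release the whole region, starting at the header tag. */
void demo_free(void *block) {
    if (block != NULL) {
        free((unsigned char *)block - sizeof(DemoTag));
    }
}
```

In this sketch, zeroing the first 8 bytes that follow a 0x20-byte block (the footer's eyecatcher and checksum) makes `demo_check` return 2 while the header still validates, which matches the pattern seen in the first core: header at `block - 0x20` intact, footer at `block + 0x20` starting with zeros.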

pshipton commented 1 year ago

core.20230828.204216.6336.0001.dmp
0x0: void* corruptedMemoryBlock = !j9x 0x0000015F9A00E900

Similar to the first core. The fourth pointer to the string has a different string, it starts with DC=. I'm not including the rest of the string in case it's sensitive info. You can see it in the core via !j9x 0x15f9862abf8,200. It is followed by " Provider" as if the memory used to contain "Active Directory Provider" but was overwritten.
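
As a rough illustration of why a trailing " Provider" suggests an overwrite, the sketch below copies a shorter NUL-terminated string over a buffer that held "Active Directory Provider" and then looks at what is left after the new terminator. This is only one possible mechanism and makes several assumptions: the strings are treated as single-byte C strings, the replacement text `DC=example,DC=x` is a made-up placeholder (the real value is deliberately not shown in this thread), and `overwrite_prefix` and `tail_after_terminator` are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite the start of buf with newText, terminator included.
 * The caller must guarantee that newText fits in buf. */
void overwrite_prefix(char *buf, const char *newText) {
    memcpy(buf, newText, strlen(newText) + 1);
}

/* Return a pointer to the bytes that follow the first NUL in buf,
 * or NULL if there is no NUL or nothing after it within bufSize. */
const char *tail_after_terminator(const char *buf, size_t bufSize) {
    const char *nul = memchr(buf, '\0', bufSize);
    if (nul == NULL || (size_t)(nul - buf) + 1 >= bufSize) {
        return NULL;
    }
    return nul + 1;
}
```

With a 32-byte buffer initialised to "Active Directory Provider" and overwritten by the 15-character placeholder, the new string plus its terminator covers exactly the 16 bytes of "Active Directory", so the residue after the terminator is " Provider". The length of the real string in the core may differ; the point is only that old content survives past the end of a shorter write.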

pshipton commented 1 year ago

There isn't much more I can do. Something is overwriting a Direct Byte Buffer. I don't know what or why. This isn't recommended, but -Xtrace:none=omrport.359 should disable the assertion, with side effects. The memory categorization counters won't be properly updated to reflect the freed memory, so memory usage reports will be incorrect. If the only corruption is the first pointer of the footer, everything will run fine, but if there is anything worse, you may have random data corruption or crashes.
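
The counter side effect can be pictured with a toy model. This is not OMR's code: `DemoCategory`, `demo_counted_alloc`, `demo_counted_free`, and the single-word footer guard are hypothetical, and the model simply assumes that a free with a damaged guard releases the memory but skips the per-category bookkeeping.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FOOTER_EYECATCHER 0xB7654321u

/* Hypothetical per-category accounting, loosely modelled on the idea of
 * a "Direct Byte Buffers" memory category. */
typedef struct DemoCategory {
    const char *name;
    size_t liveBytes;
    size_t liveBlocks;
} DemoCategory;

/* Layout: [size_t size][user bytes][uint32_t footer guard]. */
void *demo_counted_alloc(DemoCategory *cat, size_t size) {
    uint32_t eye = DEMO_FOOTER_EYECATCHER;
    unsigned char *base = malloc(sizeof(size_t) + size + sizeof eye);
    if (base == NULL) {
        return NULL;
    }
    memcpy(base, &size, sizeof size);
    memcpy(base + sizeof size + size, &eye, sizeof eye);
    cat->liveBytes += size;
    cat->liveBlocks += 1;
    return base + sizeof size;
}

/* Returns 1 if the guard was intact and the counters were updated,
 * 0 if corruption was detected and the counters were left alone.
 * With assertionEnabled set, corruption aborts instead of continuing. */
int demo_counted_free(DemoCategory *cat, void *block, int assertionEnabled) {
    unsigned char *user = block;
    size_t size;
    uint32_t eye;
    int intact;

    memcpy(&size, user - sizeof size, sizeof size);
    memcpy(&eye, user + size, sizeof eye);
    intact = (eye == DEMO_FOOTER_EYECATCHER);

    if (intact) {
        cat->liveBytes -= size;
        cat->liveBlocks -= 1;
    } else if (assertionEnabled) {
        abort(); /* stands in for the assertion firing */
    }
    free(user - sizeof size); /* memory is released either way (assumption) */
    return intact;
}
```

In this model, freeing a block whose guard was zeroed, with the assertion switched off, returns the memory but leaves `liveBytes` and `liveBlocks` unchanged, so the category keeps reporting memory that is no longer in use. That is the kind of drift in usage reports described above.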