eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Memory corruption shortly after starting Jenkins app #13570

Open amesser opened 3 years ago

amesser commented 3 years ago

Java -version output

openjdk version "11.0.12" 2021-07-20 IBM Semeru Runtime Open Edition 11.0.12.0 (build 11.0.12+7) Eclipse OpenJ9 VM 11.0.12.0 (build openj9-0.27.0, JRE 11 Windows 7 amd64-64-Bit Compressed References 20210730_175 (JIT enabled, AOT enabled) OpenJ9 - 1851b0074 OMR - 9db1c870d JCL - 21849e2ca0 based on jdk-11.0.12+7)

Summary of problem

We're trying to switch our existing Jenkins instance to OpenJ9. The app starts, but as soon as someone signs in to Jenkins, the JVM crashes.

Diagnostic files

Here is the stderr log:

06:30:03.765 0x24ec400 omrport.359 * ASSERTION FAILED at C:\workspace\openjdk-build\workspace\build\src\omr\port\common\omrmemtag.c:145: ((memoryCorruptionDetected))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/09/27 08:30:03 - please wait.
JVMDUMP032I JVM requested System dump using "c:\jenkins\core.20210927.083003.49288.0001.dmp" in response to an event
JVMDUMP010I System dump written to c:\jenkins\core.20210927.083003.49288.0001.dmp
JVMDUMP032I JVM requested Java dump using "c:\jenkins\javacore.20210927.083003.49288.0002.txt" in response to an event

Please find the javacore attached. I have removed some environment variables related to our internals. javacore.20210927.083003.49288.0002 - Cleaned.txt

pshipton commented 3 years ago

We need the system core to investigate further; please compress it and share it using a service of your choice. If necessary, you can share it privately with me or any other active project committer or contributor.

pshipton commented 3 years ago

It may also be worth trying 11.0.13 milestone 1 to see if the problem still occurs. https://github.com/ibmruntimes/semeru11-binaries/releases/tag/jdk-11.0.13%2B05_openj9-0.29.0-m1

gacholio commented 3 years ago
1XMCURTHDINFO  Current thread
3XMTHREADINFO      "Finalizer thread" J9VMThread:0x00000000024EC400, omrthread_t:0x000000002BF83860, java/lang/Thread:0x000000070373C148, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x43, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0xA3F0, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000020)
3XMCPUTIME               CPU usage total: 0.514803300 secs, user: 0.234001500 secs, system: 0.280801800 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at jdk/internal/misc/Unsafe.freeDBBMemory(Native Method)
4XESTACKTRACE                at java/nio/DirectByteBuffer$Deallocator.run(Bytecode PC:17)
4XESTACKTRACE                at jdk/internal/ref/Cleaner.clean(Bytecode PC:14)
4XESTACKTRACE                at java/lang/ref/ReferenceQueue.enqueue(Bytecode PC:15(Compiled Code))
4XESTACKTRACE                at java/lang/ref/Reference.enqueueImpl(Bytecode PC:39(Compiled Code))
5XESTACKTRACE                   (entered lock: jdk/internal/ref/Cleaner@0x0000000709C79318, entry count: 1)
pshipton commented 3 years ago

If the corruption is occurring in a direct byte buffer, it may be an application problem, writing outside the bounds of the buffer.

amesser commented 3 years ago

You mean the Windows memory dump file? It's about 5 GB in size. Since it's our company's Jenkins, I'll have to share it privately. I'll look at it tomorrow and e-mail you the download URL.

amesser commented 3 years ago

Btw, the problem only occurs with our production Jenkins. I did some pre-testing before with a test installation and didn't see the issue there.

pshipton commented 3 years ago

Yes, the Windows memory dump file. Although it's 5 GB, it should compress quite a bit.

amesser commented 3 years ago

@pshipton: I decided to use our SharePoint (I'm a first-time user); you should have received an e-mail with a link to the file.

pshipton commented 3 years ago

I got it. Below is the corrupted memory block, which I think is fine to share in the issue. The start of the block's footer has been overwritten with zeros. It should contain a checksum and the value 0xb7654321, followed by the block size (0x20), similar to the header.

If you want to find this yourself in other core files, you can open the core with the bin/jdmpview utility and then run the commands below.

For example (the addresses are from the core file provided):

> !findvm
!j9javavm 0x00000000003CD530

> !j9javavm 0x00000000003CD530 | grep j9portlibrary
    0x20: class J9PortLibrary* portLibrary = !j9portlibrary 0x000007FEEE087CF0

> !j9portlibrary 0x000007FEEE087CF0 | grep omrportlibrary
    0x0: class OMRPortLibrary omrPortLibrary = !omrportlibrary 0x000007FEEE087CF0

> !omrportlibrary 0x000007FEEE087CF0 | grep omrportlibraryglobaldata
    0x0: class OMRPortLibraryGlobalData* portGlobals = !omrportlibraryglobaldata 0x00000000003C33C0

> !omrportlibraryglobaldata 0x00000000003C33C0 | grep corruptedMemoryBlock
    0x0: void* corruptedMemoryBlock = !j9x 0x0000000040B699E0

> !j9x 0x0000000040B699C0,0x60 # I already know the length is 0x20 for the header, the memory block is 0x20 in this case, and 0x20 for the footer
header:
0x40B699C0 :  f6ac8c3fb1234567 0000000000000020 [ gE#.?... ....... ] 
0x40B699D0 :  000007fee93e9340 000007feee07c3f8 [ @.>............. ] 
data:
0x40B699E0 :  0000000035669b30 0000000035669a50 [ 0.f5....P.f5.... ] 
0x40B699F0 :  0000000000000000 0000000040b6a4b8 [ ...........@.... ] 
footer:
0x40B69A00 :  0000000000000000 0000000000000020 [ ........ ....... ] 
0x40B69A10 :  000007fee93e9340 000007feee07c3f8 [ @.>............. ] 
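To make "the footer should contain a checksum and 0xb7654321" concrete, here is a minimal sketch that checks the header and footer words from the dump. It assumes a tag layout inferred from this dump, not taken from the OMR source: four 64-bit words (eyecatcher in the low half and checksum in the high half of the first word, then the allocation size, the call-site pointer, and what appears to be a memory-category pointer), with the checksum chosen so that the XOR of every 32-bit half of the tag, plus both halves of the tag's own address, is zero. The header words above do XOR to zero under this assumption, which suggests it is right. The class and method names are hypothetical.

```java
public class MemTagCheck {
    // Eyecatchers as described in this thread: header value seen in the dump, footer value from the comment above.
    static final int ALLOC_HEADER_EYECATCHER = 0xB1234567;
    static final int ALLOC_FOOTER_EYECATCHER = 0xB7654321;

    /** XOR of both 32-bit halves of every tag word and of the tag's own address (assumed checksum scheme). */
    static int xorAll(long[] tagWords, long tagAddress) {
        int sum = (int) tagAddress ^ (int) (tagAddress >>> 32);
        for (long w : tagWords) {
            sum ^= (int) w ^ (int) (w >>> 32);
        }
        return sum;
    }

    /** A tag is treated as intact if its eyecatcher matches and everything XORs to zero. */
    static boolean isValidTag(long[] tagWords, long tagAddress, int expectedEyeCatcher) {
        // On little-endian x86-64 the first 4 bytes (the eyecatcher) are the low half of word 0.
        if ((int) tagWords[0] != expectedEyeCatcher) {
            return false;
        }
        return xorAll(tagWords, tagAddress) == 0;
    }

    /** The first word a tag should hold: checksum in the high half, eyecatcher in the low half. */
    static long expectedWord0(long[] tagWords, long tagAddress, int eyeCatcher) {
        long[] copy = tagWords.clone();
        copy[0] = eyeCatcher & 0xFFFFFFFFL; // checksum slot treated as zero while summing
        int sumCheck = xorAll(copy, tagAddress);
        return ((long) sumCheck << 32) | (eyeCatcher & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // Header at 0x40B699C0 and footer at 0x40B69A00, copied from the !j9x output.
        long[] header = {0xf6ac8c3fb1234567L, 0x20L, 0x000007fee93e9340L, 0x000007feee07c3f8L};
        long[] footer = {0x0L, 0x20L, 0x000007fee93e9340L, 0x000007feee07c3f8L};
        System.out.println("header intact: " + isValidTag(header, 0x40B699C0L, ALLOC_HEADER_EYECATCHER));
        System.out.println("footer intact: " + isValidTag(footer, 0x40B69A00L, ALLOC_FOOTER_EYECATCHER));
        System.out.printf("expected footer word 0: %016x%n",
                expectedWord0(footer, 0x40B69A00L, ALLOC_FOOTER_EYECATCHER));
    }
}
```

Under these assumptions the header passes, the footer fails on its zeroed first word, and `expectedWord0` reconstructs the value that word would have held; the size and pointer words after it are still intact.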

The memory was allocated at the following call site, which just means it is a direct byte buffer. Since it's being freed by the Cleaner, I can't tell what allocated it or how it was overwritten.

0x7FEE93E9340 :  736b726f775c3a43 65706f5c65636170 [ C:\workspace\ope ] 
0x7FEE93E9350 :  6975622d6b646a6e 736b726f775c646c [ njdk-build\works ] 
0x7FEE93E9360 :  6975625c65636170 6f5c6372735c646c [ pace\build\src\o ] 
0x7FEE93E9370 :  75725c396a6e6570 636a5c656d69746e [ penj9\runtime\jc ] 
0x7FEE93E9380 :  6e6f6d6d6f635c6c 5f656661736e755c [ l\common\unsafe_ ] 
0x7FEE93E9390 :  34323a632e6d656d 0000000600000031 [ mem.c:241....... ] 
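The call-site text is simply the memory at the call-site pointer stored in the header (0x7FEE93E9340), read as a NUL-terminated ASCII string; jdmpview prints it as little-endian 64-bit words. A small sketch of that decoding, with the words copied from the dump (the class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class CallSiteDecode {
    /** Lays the 64-bit words out in memory order (little-endian) and reads a NUL-terminated ASCII string. */
    static String decode(long... words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (long w : words) {
            buf.putLong(w);
        }
        byte[] bytes = buf.array();
        int len = 0;
        while (len < bytes.length && bytes[len] != 0) {
            len++; // stop at the C string terminator
        }
        return new String(bytes, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Words starting at 0x7FEE93E9340, copied from the !j9x output above.
        String callSite = decode(
                0x736b726f775c3a43L, 0x65706f5c65636170L, 0x6975622d6b646a6eL,
                0x736b726f775c646cL, 0x6975625c65636170L, 0x6f5c6372735c646cL,
                0x75725c396a6e6570L, 0x636a5c656d69746eL, 0x6e6f6d6d6f635c6cL,
                0x5f656661736e755cL, 0x34323a632e6d656dL, 0x0000000600000031L);
        System.out.println(callSite);
    }
}
```

The bytes after the terminating NUL in the last word belong to whatever follows the string and are ignored.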

You could check if the version of jenkins being used has any known issues that would cause memory corruption.

amesser commented 3 years ago

Oh dear, I didn't know it was possible for a Java application to corrupt memory. I was always told Java doesn't let you access memory out of bounds :-) I'm not sure the Jenkins folks can solve this. We use a lot of Groovy scripting to generalize things. Or maybe, after restarting on OpenJ9, Jenkins tried to restore some state it had saved while running under HotSpot? We're back on HotSpot for now. Hopefully we find the issue; we've been looking forward to switching over to OpenJ9 for a while now.

pshipton commented 3 years ago

It depends on what you are using in Java. Normally Java code can't corrupt memory, but some facilities (such as Unsafe) allow writing anywhere. JNI native libraries can also write anywhere.
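A minimal sketch of the "normally Java can't corrupt memory" point: ordinary `ByteBuffer` access is bounds-checked, so on a 32-byte direct buffer (the same size as the corrupted block) an 8-byte write at offset 24 fits, while one at offset 32 is rejected with `IndexOutOfBoundsException` instead of touching the memory after the buffer. A write past the end, like the one that zeroed the footer, would therefore have to bypass these checks, for example through Unsafe or JNI. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class BoundsCheckDemo {
    /** Returns true if writing a long at the given index of a 32-byte direct buffer is rejected. */
    static boolean putLongRejected(int index) {
        ByteBuffer buf = ByteBuffer.allocateDirect(32); // native memory, like the 0x20-byte block in the dump
        try {
            buf.putLong(index, 0L);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true; // the bounds check stopped the write
        }
    }

    public static void main(String[] args) {
        System.out.println("putLong(24) rejected: " + putLongRejected(24)); // bytes 24..31 are in bounds
        System.out.println("putLong(32) rejected: " + putLongRejected(32)); // would write past the end
    }
}
```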

I can tell you that the OpenJ9 project runs Jenkins: https://openj9-jenkins.osuosl.org

pshipton commented 3 years ago

Didn't mean to save that yet.

The OpenJ9 project runs jenkins without issue. It shows Jenkins 2.289.1 and we are using OpenJ9 (0.26) 11.0.11 to run it.

pshipton commented 3 years ago

It runs on pLinux, though, not Windows.

amesser commented 3 years ago

I have been successfully running a test server with Jenkins/OpenJ9 on Windows for some months (it is still using OpenJ9 at the moment). However, when we switched over the production system, the crash occurred. The only difference is that the test server doesn't have as many projects configured.

pshipton commented 3 years ago

All I can suggest at the moment is trying different versions of Jenkins or OpenJ9 to see if you can find a combination that works.