Closed mwangcaibmcom closed 1 year ago
@0xdaryl please take a look.
I reached out via Slack to the originator to ask for the javacores.
Javacore is provided
There is nothing unusual about the number of JIT-QueueSlotMonitors (1883) given that there are 7 compilation threads. These are used to manage the queuing of methods for JIT compilation, and from the javacore, JIT compilation doesn't appear to be a problem here.
Each javacore contains a backtrace of all thread stacks. @mwangcaibmcom , can you review the application thread backtraces to see if they make sense to you? Some threads are waiting on monitors (and the javacore will tell you which ones) so it might be helpful for you to review those stacks too in the context of your application to ensure those aren't problematic.
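If it helps with triage, here is a rough sketch (my own helper, not part of the JDK or javacore tooling) of scanning javacore text to pair each thread with the monitor it is waiting on or blocked by. The 3XMTHREADINFO/3XMTHREADBLOCK line tags are standard javacore tags, but the sample text, addresses, and the class name `JavacoreScan` are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavacoreScan {
    // Invented sample lines; real javacores use the same 3XMTHREADINFO/3XMTHREADBLOCK tags.
    static final String SAMPLE = String.join("\n",
        "3XMTHREADINFO \"Default Executor-thread-4\" J9VMThread:0x000000000071D800, state:CW, prio=5",
        "3XMTHREADBLOCK Waiting on: java/lang/Object@0x00000000804912A0 Owned by: <unowned>",
        "3XMTHREADINFO \"Default Executor-thread-7\" J9VMThread:0x000000000071E100, state:B, prio=5",
        "3XMTHREADBLOCK Blocked on: com/example/Cache@0x0000000080491300 Owned by: \"Default Executor-thread-9\"");

    // Pair each thread name with the monitor line that follows it.
    static List<String> blockedThreads(String javacoreText) {
        Pattern name = Pattern.compile("3XMTHREADINFO\\s+\"([^\"]+)\"");
        List<String> out = new ArrayList<>();
        String current = null;
        for (String line : javacoreText.split("\n")) {
            Matcher m = name.matcher(line);
            if (m.find()) {
                current = m.group(1);                      // remember the thread name
            } else if (line.startsWith("3XMTHREADBLOCK") && current != null) {
                // attach the wait/block detail to the most recent thread
                out.add(current + " -> " + line.substring("3XMTHREADBLOCK".length()).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        blockedThreads(SAMPLE).forEach(System.out::println);
    }
}
```

With many "Default Executor" threads this gives a quick list of who is waiting on what, which you can then cross-check against the LOCKS section of the same javacore.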
Did this problem appear because you recently moved up to a new JVM?
@0xdaryl thanks for your response.
Did this problem appear because you recently moved up to a new JVM?
We upgraded from OpenJDK 8 to IBM Semeru 11 half a year ago. I did not notice this issue before upgrading.
I reviewed all application threads in the thread pool (thread names starting with Default Executor-thread). I only saw one thread waiting on a lock, shown below; the others were waiting on monitors in the executor pool or in other features.
[2023-03-07T04:13:10.353Z] 2LKREGMON Thread public flags mutex lock (0x00007F28F001F798): <unowned>
[2023-03-07T04:13:10.353Z] 3LKNOTIFYQ Waiting to be notified:
[2023-03-07T04:13:10.354Z] 3LKWAITNOTIFY "Default Executor-thread-4" (J9VMThread:0x000000000071D800)
I reviewed all the javacore dumps; only this one has a lot of JIT-QueueSlotMonitor locks. All javacores have 8 compilation threads like the one below, with only JIT Compilation Thread-000 not marked Suspended. However, only this javacore has thousands of locks. It could be normal, as you stated.
[2023-03-07T04:13:10.436Z] 3XMTHREADINFO "JIT Compilation Thread-006 Suspended" J9VMThread:0x000000000002F900, omrthread_t:0x00007F2998147D20, java/lang/Thread:0x0000000080491220, state:R, prio=10
[2023-03-07T04:13:10.436Z] 3XMJAVALTHREAD (java/lang/Thread getId:0x9, isDaemon:true)
[2023-03-07T04:13:10.436Z] 3XMJAVALTHRCCL jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000804663D8)
[2023-03-07T04:13:10.436Z] 3XMTHREADINFO1 (native thread ID:0xA8, native priority:0xB, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
[2023-03-07T04:13:10.436Z] 3XMTHREADINFO2 (native stack address range from:0x00007F297C82E000, to:0x00007F297C92E000, size:0x100000)
[2023-03-07T04:13:10.436Z] 3XMCPUTIME CPU usage total: 0.001356501 secs, current category="JIT"
[2023-03-07T04:13:10.436Z] 3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
[2023-03-07T04:13:10.436Z] 3XMTHREADINFO3 No Java callstack associated with this thread
[2023-03-07T04:13:10.436Z] 3XMTHREADINFO3 Native callstack:
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299C393002 [libj9prt29.so+0x27002])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299C397863 [libj9prt29.so+0x2b863])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299C3934FA [libj9prt29.so+0x274fa])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299C397863 [libj9prt29.so+0x2b863])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299C392E94 [libj9prt29.so+0x26e94])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299C394573 [libj9prt29.so+0x28573])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299E38FCF0 [libpthread.so.0+0x12cf0])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK pthread_cond_wait+0x1fa (0x00007F299E38B45A [libpthread.so.0+0xe45a])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK omrthread_monitor_wait+0x287 (0x00007F299C1599C7 [libj9thr29.so+0x89c7])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299682B05E [libj9jit29.so+0x13c05e])
[2023-03-07T04:13:10.436Z] 4XENATIVESTACK (0x00007F299683C258 [libj9jit29.so+0x14d258])
[2023-03-07T04:13:10.437Z] 4XENATIVESTACK (0x00007F299683C2F2 [libj9jit29.so+0x14d2f2])
[2023-03-07T04:13:10.437Z] 4XENATIVESTACK (0x00007F299C397863 [libj9prt29.so+0x2b863])
[2023-03-07T04:13:10.437Z] 4XENATIVESTACK (0x00007F299683C722 [libj9jit29.so+0x14d722])
[2023-03-07T04:13:10.437Z] 4XENATIVESTACK (0x00007F299C15F4F6 [libj9thr29.so+0xe4f6])
[2023-03-07T04:13:10.437Z] 4XENATIVESTACK (0x00007F299E3851CA [libpthread.so.0+0x81ca])
[2023-03-07T04:13:10.437Z] 4XENATIVESTACK clone+0x43 (0x00007F299DBDBE73 [libc.so.6+0x39e73])
FYI, I also opened an issue with the Liberty server team for a deadlock (https://github.com/OpenLiberty/open-liberty/issues/24651). I found two technotes, which might not be related to this issue: https://www.ibm.com/support/pages/apar/IJ07063 and https://www.ibm.com/support/pages/apar/IV82849. I tried the workaround from the first technote 5 days ago, and we have not seen the timeout issue since, meaning the server has not hung after applying the workaround.
@0xdaryl any update?
A few questions:
The four javacore files were taken within 6 seconds of each other. Is it possible to spread this duration out a bit (e.g., 10 seconds apart) to get a longer perspective on the hang?
What is the frequency of the hang? Does it hang every time? If not, how often?
Does this problem appear with the workaround for the Liberty deadlock in (24651)?
When you say you applied the workaround from IJ07063 and did not see a problem, can you confirm that the workaround you applied was -Xjit:disableLockReservation?
Also, a core file (system dump) along with a matching javacore would be useful. Can you generate a system dump + javacore? You may need to use com.ibm.jvm.Dump.systemDumpToFile() if you are generating these dumps programmatically.
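To illustrate the programmatic route, here is a minimal sketch of a helper class (the class name TriggerDumps is mine). Since com.ibm.jvm.Dump is only present on OpenJ9/Semeru JVMs, it is loaded reflectively so the class still compiles and runs elsewhere; the no-argument javaDumpToFile()/systemDumpToFile() methods write to the default dump location and return the path of the file written:

```java
// Sketch: trigger a javacore + system dump from inside the application.
// com.ibm.jvm.Dump exists only on IBM/OpenJ9 JVMs, so it is loaded via
// reflection; on other JVMs this prints a message and returns cleanly.
public class TriggerDumps {
    public static void main(String[] args) {
        try {
            Class<?> dump = Class.forName("com.ibm.jvm.Dump");
            // Both methods return the path of the dump file they wrote.
            String javacore = (String) dump.getMethod("javaDumpToFile").invoke(null);
            String core = (String) dump.getMethod("systemDumpToFile").invoke(null);
            System.out.println("javacore: " + javacore);
            System.out.println("core:     " + core);
        } catch (ClassNotFoundException e) {
            System.out.println("com.ibm.jvm.Dump not available (not an OpenJ9/Semeru JVM)");
        } catch (ReflectiveOperationException e) {
            System.out.println("dump request failed: " + e);
        }
    }
}
```

Triggering both dumps back to back like this keeps the javacore and the core file matched in time, which is what makes the pair useful for analysis.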
I have not seen the issue happen after applying the -Xjit:disableLockReservation workaround. The Liberty deadlock issue has been fixed in version 23.0.0.3, and we plan to upgrade to the new version. After that, I will remove the workaround and monitor whether the server hang recurs.
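For reference (my own note, not from the thread): in Liberty, a JIT option like this is typically applied by adding it to the server's jvm.options file; the location shown below is the usual per-server spot and may differ in your deployment.

```
# ${server.config.dir}/jvm.options — JIT lock reservation workaround (IJ07063)
-Xjit:disableLockReservation
```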
As we plan to upgrade, I will create a new issue if the server hang happens with the new version of the Liberty server. I will close this issue now.
Java -version output
IBM Semeru Runtime Certified Edition 11.0.17.0 (build 11.0.17+8)
Summary of problem
The Liberty server hangs randomly in a container running on a k8s/OpenShift cluster in an integration pipeline. REST API calls reach the pod and the pod can process the requests, but there is no response on the client side. After 2 minutes the API calls time out with a 504 error, and the pod is restarted once liveness failures (timeouts) reach the threshold (10 times).
Diagnostic files
Javacore dump files were generated (user-triggered) to diagnose the issue. We found thousands of JIT locks in the javacore; please reach out to me via the Slack channel if you need the javacore dump files.