Open gjdeval opened 1 year ago
While attempting to use AdminCenter this morning, I inadvertently recreated this problem, except this time with 6 DE threads ... see attached javacore. So the problem is not limited to the minimal config with 4 DE threads.
In this case, one of the hung threads is in a different callstack:
3XMTHREADBLOCK Blocked on: com/ibm/ws/http/channel/h2internal/H2StreamProcessor@0x00000000E668C238 Owned by: "Default Executor-thread-4" (J9VMThread:0x00000000006F3B00, java/lang/Thread:0x00000000C0151698)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.flushDataWaitingForWindowUpdate(H2StreamProcessor.java:877)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.connectionWindowSizeUpdated(H2StreamProcessor.java:883)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2InboundLink.incrementConnectionWindowUpdateLimit(H2InboundLink.java:872)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.processWindowUpdateFrame(H2StreamProcessor.java:801)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.processNextFrame(H2StreamProcessor.java:480)
5XESTACKTRACE (entered lock: com/ibm/ws/http/channel/h2internal/H2StreamProcessor@0x00000000E15E46B0, entry count: 1)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/FrameReadProcessor.processCompleteFrame(FrameReadProcessor.java:125)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2InboundLink.processRead(H2InboundLink.java:678)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2InboundLink.processRead(H2InboundLink.java:607)
4XESTACKTRACE at com/ibm/ws/http/channel/h2internal/H2MuxTCPReadCallback.complete(H2MuxTCPReadCallback.java:42)
4XESTACKTRACE at com/ibm/ws/channel/ssl/internal/SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1826)
4XESTACKTRACE at com/ibm/ws/tcpchannel/internal/WorkQueueManager.requestComplete(WorkQueueManager.java:516)
4XESTACKTRACE at com/ibm/ws/tcpchannel/internal/WorkQueueManager.attemptIO(WorkQueueManager.java:586)
4XESTACKTRACE at com/ibm/ws/tcpchannel/internal/WorkQueueManager.workerRun(WorkQueueManager.java:970)
4XESTACKTRACE at com/ibm/ws/tcpchannel/internal/WorkQueueManager$Worker.run(WorkQueueManager.java:1059)
4XESTACKTRACE at com/ibm/ws/threading/internal/ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:247)
4XESTACKTRACE at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
4XESTACKTRACE at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:825)
As before, I've taken a core dump, let me know if I should upload that to assist with diagnosis.
issue-24167-hung-6-threads-javacore.20230203.091606.44149.0001.txt
We are seeing the same issue when sending 50+ HTTP/2 requests to our Open Liberty server, version 23.0.0.12.
Is there any resolution for this issue?
If the Default Executor (DE) pool is small, all of the DE threads can hang in an HTTP/2 write callstack, each waiting for a CountDownLatch to release it. The server is then hung and must be restarted before any work can resume.
The problem occurs in OL-23001-GA. I do not know if it occurs in prior releases.
The problem was observed after I had used the Admin Center to connect to the test server on the https port, and then stopped using the Admin Center but left the browser tab open. I then removed the adminCenter-1.0 feature from the Liberty server and restarted the server. Apparently the browser tab kept trying to connect to the Admin Center even though I was not doing anything with the browser.
A tcpdump taken when the system is hung in this state shows ~170 messages per second from the browser to the https port on Liberty. The browser apparently defaults to using HTTP/2, since the callstacks of the hung DE threads are in H2StreamProcessor methods. Near the top of the callstack we find HttpDispatcherLink.send404Message, probably because Liberty is replying with a 404 since the Admin Center is not available.
It seems that the high inbound message rate, down the execution path for HTTP/2 and SSL (https port), to a non-existent endpoint (send404Message), has exposed a timing window in the implementation. So far, I have only seen this happen with the DE pool pinned to 4 threads. Presumably, with 5 or more threads, there is always a free thread to do the work that would release the other threads to continue execution.
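To make the suspected mechanism concrete, here is a minimal Java sketch of thread-pool starvation with a CountDownLatch. It is not Liberty code: the class PoolStarvationSketch, the method releasedWriters, and the "writer" and "follow-up" tasks are hypothetical stand-ins, and it assumes the latch can only be released by another task that runs on the same pool. The timed await is there only so the sketch terminates. A wait with no timeout would hang indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolStarvationSketch {

    /**
     * Runs `writers` tasks on a fixed pool of `poolSize` threads. Each writer
     * queues a follow-up task on the same pool and then waits on a latch that
     * only that follow-up can release. Returns how many writers were released
     * before `waitMillis` elapsed.
     */
    static int releasedWriters(int poolSize, int writers, long waitMillis)
            throws InterruptedException {
        if (writers > poolSize) {
            throw new IllegalArgumentException("sketch assumes writers <= poolSize");
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allStarted = new CountDownLatch(writers);
        CountDownLatch finished = new CountDownLatch(writers);
        AtomicInteger released = new AtomicInteger();

        for (int i = 0; i < writers; i++) {
            pool.execute(() -> {
                CountDownLatch writeDone = new CountDownLatch(1);
                try {
                    // Barrier: wait until every writer occupies a pool thread,
                    // so the outcome does not depend on scheduling luck.
                    allStarted.countDown();
                    allStarted.await();
                    // The work that would release this writer needs a pool thread too.
                    pool.execute(writeDone::countDown);
                    // Timed wait so the sketch ends; an untimed wait would hang here.
                    if (writeDone.await(waitMillis, TimeUnit.MILLISECONDS)) {
                        released.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finished.countDown();
                }
            });
        }
        finished.await();
        pool.shutdown();
        return released.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Every thread is taken by a waiting writer: nothing can release them.
        System.out.println("pool=4 writers=4 released=" + releasedWriters(4, 4, 500));
        // One spare thread runs the follow-ups and releases the writers.
        System.out.println("pool=5 writers=4 released=" + releasedWriters(5, 4, 500));
    }
}
```

In this sketch the hang depends on the number of concurrent waiters reaching the pool size, not on the pool size being exactly 4. Under the same assumption, a pool of 6 with 6 waiting writers starves in the same way.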
The attached javacore shows the thread state when the hang has occurred. I have a heap dump and core dump available, in case those would be of use to diagnose the issue.
Here is a sample callstack from the javacore showing one of the hung DE threads.
Additional context
javacore.20230201.101635.15155.0001.txt