OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0
1.14k stars 583 forks source link

Liberty threads may hang when https port is spammed with HTTP 2 #24167

Open gjdeval opened 1 year ago

gjdeval commented 1 year ago

If the Default Executor (DE) pool size is small, all the DE threads may hang in an HTTP 2 write callstack, waiting for a CountDownLatch to release them. The result is that the server is hung and must be restarted for any work to resume.

The problem occurs in OL-23001-GA. I do not know if it occurs in prior releases.

The problem was observed when I had used the Admin Center to connect to the test server on the https port, and then stopped using the Admin Center but left the browser tab open. I removed the adminCenter-1.0 from the Liberty server and restarted the server. Apparently the browser tab attempted to continue to connect to the Admin Center even though I was not doing anything with the browser.

A tcpdump taken when the system is hung in this state shows ~170 messages per second from the browser to the https port on Liberty. The browser apparently defaults to using HTTP 2, since the callstacks of the hung DE threads are in H2StreamProcessor methods. Near the top of the callstack we find HttpDispatcherLink.send404Message - probably Liberty is replying with a 404 because the Admin Center is not available.

It seems that the high inbound message rate, down the execution path for HTTP 2 and SSL (https port), to a non-existent endpoint (send404Message), has exposed a timing window in the implementation. So far, I have only seen this happen with the DE pool pinned to 4 threads - presumably, with 5 or more threads, there is always a free thread to do the work that would release the other threads to continue execution.

The attached javacore shows the thread state when the hang has occurred. I have a heap dump and core dump available, in case those would be of use to diagnose the issue.

Here is a sample callstack from the javacore showing one of the hung DE threads.

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at sun/misc/Unsafe.park(Native Method)
4XESTACKTRACE                at java/util/concurrent/locks/LockSupport.park(LockSupport.java:186)
4XESTACKTRACE                at java/util/concurrent/locks/AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:847)
4XESTACKTRACE                at java/util/concurrent/locks/AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1008)
4XESTACKTRACE                at java/util/concurrent/locks/AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1315)
4XESTACKTRACE                at java/util/concurrent/CountDownLatch.await(CountDownLatch.java:242)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2WriteQEntry.waitWriteCompleteLatch(H2WriteQEntry.java:78)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2InboundLink.writeSync(H2InboundLink.java:812)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.writeFrameSync(H2StreamProcessor.java:2008)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.processHalfClosedRemote(H2StreamProcessor.java:1237)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.readWriteTransitionState(H2StreamProcessor.java:663)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.readWriteTransitionState(H2StreamProcessor.java:611)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.processNextFrame(H2StreamProcessor.java:565)
5XESTACKTRACE                   (entered lock: com/ibm/ws/http/channel/h2internal/H2StreamProcessor@0x00000000E68829A8, entry count: 1)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2HttpInboundLinkWrap.writeFramesSync(H2HttpInboundLinkWrap.java:497)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/HttpServiceContextImpl.synchWrite(HttpServiceContextImpl.java:3174)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/HttpServiceContextImpl.sendFullOutgoing(HttpServiceContextImpl.java:2876)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/inbound/HttpInboundServiceContextImpl.finishResponseMessage(HttpInboundServiceContextImpl.java:939)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/outbound/HttpOutputStreamImpl.flushBuffers(HttpOutputStreamImpl.java:540)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/outbound/HttpOutputStreamImpl.close(HttpOutputStreamImpl.java:627)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.tryToCloseStream(HttpDispatcherLink.java:1285)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.closeStreams(HttpDispatcherLink.java:1199)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.finish(HttpDispatcherLink.java:1099)
5XESTACKTRACE                   (entered lock: java/lang/Object@0x00000000E69E3DB0, entry count: 1)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.sendResponse(HttpDispatcherLink.java:723)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.displayPage(HttpDispatcherLink.java:566)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.send404Message(HttpDispatcherLink.java:577)
4XESTACKTRACE                at com/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink.ready(HttpDispatcherLink.java:426)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.handleDiscrimination(HttpInboundLink.java:569)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.handleNewRequest(HttpInboundLink.java:503)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.processRequest(HttpInboundLink.java:363)
4XESTACKTRACE                at com/ibm/ws/http/channel/internal/inbound/HttpInboundLink.ready(HttpInboundLink.java:330)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor$Http2Ready.run(H2StreamProcessor.java:1791)
4XESTACKTRACE                at com/ibm/ws/threading/internal/ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:247)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
4XESTACKTRACE                at java/lang/Thread.run(Thread.java:825)

Additional context
javacore.20230201.101635.15155.0001.txt

Add any other context about the problem here.

gjdeval commented 1 year ago

While attempting to use AdminCenter this morning, I inadvertently recreated this problem, except this time with 6 DE threads ... see attached javacore. So the problem is not limited to the minimal config with 4 DE threads.

In this case, one of the hung threads is in a different callstack:

3XMTHREADBLOCK     Blocked on: com/ibm/ws/http/channel/h2internal/H2StreamProcessor@0x00000000E668C238 Owned by: "Default Executor-thread-4" (J9VMThread:0x00000000006F3B00, java/lang/Thread:0x00000000C0151698)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.flushDataWaitingForWindowUpdate(H2StreamProcessor.java:877)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.connectionWindowSizeUpdated(H2StreamProcessor.java:883)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2InboundLink.incrementConnectionWindowUpdateLimit(H2InboundLink.java:872)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.processWindowUpdateFrame(H2StreamProcessor.java:801)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2StreamProcessor.processNextFrame(H2StreamProcessor.java:480)
5XESTACKTRACE                   (entered lock: com/ibm/ws/http/channel/h2internal/H2StreamProcessor@0x00000000E15E46B0, entry count: 1)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/FrameReadProcessor.processCompleteFrame(FrameReadProcessor.java:125)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2InboundLink.processRead(H2InboundLink.java:678)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2InboundLink.processRead(H2InboundLink.java:607)
4XESTACKTRACE                at com/ibm/ws/http/channel/h2internal/H2MuxTCPReadCallback.complete(H2MuxTCPReadCallback.java:42)
4XESTACKTRACE                at com/ibm/ws/channel/ssl/internal/SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1826)
4XESTACKTRACE                at com/ibm/ws/tcpchannel/internal/WorkQueueManager.requestComplete(WorkQueueManager.java:516)
4XESTACKTRACE                at com/ibm/ws/tcpchannel/internal/WorkQueueManager.attemptIO(WorkQueueManager.java:586)
4XESTACKTRACE                at com/ibm/ws/tcpchannel/internal/WorkQueueManager.workerRun(WorkQueueManager.java:970)
4XESTACKTRACE                at com/ibm/ws/tcpchannel/internal/WorkQueueManager$Worker.run(WorkQueueManager.java:1059)
4XESTACKTRACE                at com/ibm/ws/threading/internal/ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:247)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
4XESTACKTRACE                at java/lang/Thread.run(Thread.java:825)

As before, I've taken a core dump, let me know if I should upload that to assist with diagnosis.

issue-24167-hung-6-threads-javacore.20230203.091606.44149.0001.txt

wesley-philip commented 3 months ago

We are noticing the same issue when sending 50+ HTTP/2 requests to our Open Liberty server version 23.0.0.12

Is there any resolution for this issue?