Open ocseal opened 2 years ago
The following thread in the javadump shows which pausable component is preventing all other pausable components from being called:
"Default Executor-thread-33" #144 daemon prio=5 os_prio=31 cpu=11.64ms elapsed=58.19s tid=0x00007fcfb0bb5800 nid=0x17813 in Object.wait() [0x00007000043fd000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.15/Native Method)
- waiting on <no object reference available>
at com.ibm.ws.http.internal.HttpChain$StopWait.waitForStop(HttpChain.java:953)
- waiting to re-lock in wait() <0x000000060175cc58> (a com.ibm.ws.http.internal.HttpChain$StopWait)
at com.ibm.ws.http.internal.HttpChain.stop(HttpChain.java:223)
- locked <0x000000060175cc18> (a com.ibm.ws.http.internal.HttpChain)
at com.ibm.ws.http.internal.HttpEndpointImpl$3.run(HttpEndpointImpl.java:187)
- locked <0x000000060175cf18> (a com.ibm.ws.http.internal.HttpEndpointImpl$1)
at com.ibm.ws.http.internal.HttpEndpointImpl.processHttpChainWork(HttpEndpointImpl.java:391)
at com.ibm.ws.http.internal.HttpEndpointImpl.pause(HttpEndpointImpl.java:1070)
at com.ibm.ws.runtime.update.internal.PauseableComponentQuiesceListener.serverStopping(PauseableComponentQuiesceListener.java:52)
at com.ibm.ws.runtime.update.internal.RuntimeUpdateManagerImpl$3.run(RuntimeUpdateManagerImpl.java:380)
at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:245)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.15/Executors.java:515)
at java.util.concurrent.FutureTask.run(java.base@11.0.15/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.15/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.15/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.15/Thread.java:829)
Locked ownable synchronizers:
- <0x0000000611dcd428> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Either PauseableComponentQuiesceListener.serverStopping
needs to submit all of the calls to pause
on asynchronous threads, or com.ibm.ws.http.internal.HttpEndpointImpl.pause
needs to do that.
This thread shows that com.ibm.ws.http.internal.HttpEndpointImpl.pause
is calling com.ibm.ws.http.internal.HttpChain$StopWait.waitForStop(HttpChain.java:953)
and waiting for 30 seconds before aborting.... which is also the server quiesce timeout.... thus server quiesce is already over by the time the wait aborts.
Seems that if HttpEndpointImpl.pause()
is going to wait (and thus coudl be blocked), then it should spawn a thread to do that to allow other pausable components a chance to be called.
@brenthdaniel Can you take a look at this and determine if it should be PausableComponent and determine the best action. Basically the call to HttpEndpointImpl.pause()
hangs for 30 seconds, thus preventing other pause calls from occurring until after quiesce has completed, resulting in an error.... and of course the delay of work that could be done otherwise.
Seems either Pausable component shoudl spawn the work asynchronously or the http code should.
Hi,
Pauseable components are expected to return in a reasonable amount of time. The idea is to cut off new requests coming into the system so that we can wait for existing work to complete. Someone running the 'server pause
It's not clear to me why in this scenario the chain would have problems shutting down, so maybe that's the first thing to fix here. Is the issue that some other PauseableComponent needs to run to unblock the chain? Or maybe some other component pausing is what is causing the chain to be blocked? In the latter case, running asynchronously would just move the race condition around.
We could perhaps have a design issue to see if we want to change the behavior of the quiesce to run PauseableComponents asynchronously to avoid any poorly behaving components, but I think we should understand what's going on here. In most cases, failing to close off an HTTP port is a legitimate quiesce failure.
Thanks @brenthdaniel . Since the current design is that pauseable components should return in a reasonable amount of time, then I'll assign this issue to @mrsaldana to address the fact that HttpEndpointImpl.pause()
is blocking and waiting for 30 seconds, which is the entire duration of server quiesce. Seems that HttpEndpointImpl.pause()
should either be running HttpChain.stop() asynchronously or with a much smaller timeout, at least during server quiesce. Otherwise, the 30 second timeout defeats the purpose of the PauseableComponentQuiesceListener
.
Not sure why HttpChain.stop
is blocking for 30 seconds, but the side effect is that all other pauseable components that happen to be in the list after this http endpoint are not given the chance to pause until after server quiesce has completed.
Needs prioritization, suggested fix, and suggested assignment. There has been no meaningful activity since 25-Aug-2022.
Per Tracy:
When a request is made to shutdown the server, then during server quiesce, the kernel will synchronously notify the pausable components to pause. The dump showed that com.ibm.ws.http.internal.HttpEndpointImpl.pause was taking 30 seconds, thus not completing prior to server quiesce completing, preventing other pausable components from being notified.
This might be resolved by either:
Brent Danial suggested that the intent of he kernel was that calling pause was not intended to take long, so the kernel should not need to update, but instead any pausable component that was taking too long should be fixed.
Will review with the Transport squad for prioritization.
Describe the bug
When running an open-source app with Open Liberty 22.0.0.7 without cleaning the target, error code CWWKE1102W (The quiesce operation did not complete) occurs when the server is stopped.
Steps to Reproduce
Expected behavior
The server should shut down without any quiesce operation failures.
Diagnostic information:
Additional context
The relevant portion of the trace.log can be found by searching for "com.ibm.ws.runtime.update.internal.PauseableComponentQuiesceListener.serverStopping(PauseableComponentQuiesceListener.java:52)".