ari-ban / issue-test

0 stars 0 forks source link

Deadlock between NIOConnection#notifyCloseListeners and SingleEndpointPool#detach #1988

Closed arinban closed 6 years ago

arinban commented 6 years ago

When I used grizzly-framework v1.3.33~ and v2.4.2~, I found the following deadlock.

Found one Java-level deadlock:
=============================
"capri-api-web-server-271":
  waiting to lock monitor 0x00007f3d240c3158 (object 0x0000000442a79d78, a java.lang.Object),
  which is held by "connection-pool-delays-thread-pool(1)"
"connection-pool-delays-thread-pool(1)":
  waiting to lock monitor 0x00007f3e0006b7d8 (object 0x0000000544007d40, a java.util.Collections$SynchronizedList),
  which is held by "CapriAsyncHttpClient-24"
"CapriAsyncHttpClient-24":
  waiting to lock monitor 0x00007f3d240c3158 (object 0x0000000442a79d78, a java.lang.Object),
  which is held by "connection-pool-delays-thread-pool(1)"

Java stack information for the threads listed above:
===================================================
"capri-api-web-server-271":
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool.take(SingleEndpointPool.java:759)
        - waiting to lock <0x0000000442a79d78> (a java.lang.Object)
        at org.glassfish.grizzly.connectionpool.MultiEndpointPool.take(MultiEndpointPool.java:592)
        at com.ning.http.client.providers.grizzly.ConnectionManager.openAsync(ConnectionManager.java:143)
        at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider.execute(GrizzlyAsyncHttpProvider.java:174)
        at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:505)
        at com.ning.http.client.AsyncHttpClient$BoundRequestBuilder.execute(AsyncHttpClient.java:228)
        at com.kakao.capri.commons.ahc.helper.AsyncHttpRequestHelper._executeWithTimeoutException(AsyncHttpRequestHelper.java:268)
        at com.kakao.capri.commons.ahc.helper.AsyncHttpRequestHelper.executeWithTimeoutException(AsyncHttpRequestHelper.java:257)
        at com.kakao.capri.commons.papi.story.StoryPapiV1ServiceParent.getResponse(StoryPapiV1ServiceParent.java:101)
        at com.kakao.capri.commons.papi.story.StoryPapiV1ServiceParent.request(StoryPapiV1ServiceParent.java:68)
        at com.kakao.capri.commons.papi.story.StoryPapiV1ServiceParent.request(StoryPapiV1ServiceParent.java:87)
        at com.kakao.capri.commons.papi.story.StoryPapiV1ServiceImpl.isStoryUser(StoryPapiV1ServiceImpl.java:161)
        at com.kakao.capri.api.service.impl.StoryAPIV1ServiceImpl.isStoryUser(StoryAPIV1ServiceImpl.java:94)
        at com.kakao.capri.api.server.handler.story.v1.IsStoryUserHandler.invoke(IsStoryUserHandler.java:30)
        at java.lang.invoke.LambdaForm$DMH/1705929636.invokeSpecial_L3_L(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$BMH/519072832.reinvoke(LambdaForm$BMH)
        at java.lang.invoke.LambdaForm$MH/709545106.invoker(LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/1846386767.invokeExact_MT(LambdaForm$MH)
        at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
        at com.kakao.capri.api.server.handler.APIHandler.invoker(APIHandler.java:715)
        at com.kakao.capri.api.server.handler.APIHandler.doService(APIHandler.java:497)
        at com.kakao.server.web.handler.GrizzlyHandler.service(GrizzlyHandler.java:129)
        at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:206)
        at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:180)
        at org.glassfish.grizzly.http.server.HttpHandlerChain.doHandle(HttpHandlerChain.java:224)
        at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:242)
        at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
        at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
        at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:539)
        at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
        at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
        at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
        at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
        at com.kakao.common.util.StatisticKakaoThreadPoolExecutor$StatisticRunnable.run(StatisticKakaoThreadPoolExecutor.java:279)
        at com.kakao.common.util.KakaoThreadPoolExecutor$Worker.runTask(KakaoThreadPoolExecutor.java:950)
        at com.kakao.common.util.KakaoThreadPoolExecutor$Worker.run(KakaoThreadPoolExecutor.java:987)
        at java.lang.Thread.run(Thread.java:748)
"connection-pool-delays-thread-pool(1)":
        at java.util.Collections$SynchronizedCollection.remove(Collections.java:2038)
        - waiting to lock <0x0000000544007d40> (a java.util.Collections$SynchronizedList)
        at org.glassfish.grizzly.nio.NIOConnection.removeCloseListener(NIOConnection.java:695)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool.detach(SingleEndpointPool.java:946)
        - locked <0x0000000442a79d78> (a java.lang.Object)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool.cleanupIdleConnections(SingleEndpointPool.java:1107)
        - locked <0x0000000442a79d78> (a java.lang.Object)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool$KeepAliveCleaner.doWork(SingleEndpointPool.java:1404)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool$KeepAliveCleaner.doWork(SingleEndpointPool.java:1399)
        at org.glassfish.grizzly.utils.DelayedExecutor$DelayedRunnable.run(DelayedExecutor.java:159)
        at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:593)
        at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:573)
        at java.lang.Thread.run(Thread.java:748)
"CapriAsyncHttpClient-24":
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool$PoolConnectionCloseListener.onClosed(SingleEndpointPool.java:1339)
        - waiting to lock <0x0000000442a79d78> (a java.lang.Object)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool$PoolConnectionCloseListener.onClosed(SingleEndpointPool.java:1332)
        at org.glassfish.grizzly.nio.NIOConnection.invokeCloseListener(NIOConnection.java:1023)
        at org.glassfish.grizzly.nio.NIOConnection.notifyCloseListeners(NIOConnection.java:890)
        - locked <0x0000000544007d40> (a java.util.Collections$SynchronizedList)
        at org.glassfish.grizzly.nio.NIOConnection.terminate0(NIOConnection.java:622)
        at org.glassfish.grizzly.nio.transport.TCPNIOConnection.terminate0(TCPNIOConnection.java:291)
        at org.glassfish.grizzly.nio.transport.TCPNIOTransport.read(TCPNIOTransport.java:612)
        at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleRead(TCPNIOTransportFilter.java:75)
        at org.glassfish.grizzly.filterchain.TransportFilter.handleRead(TransportFilter.java:173)
        at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
        at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
        at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
        at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:539)
        at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
        at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
        at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
        at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
        at com.kakao.common.util.StatisticKakaoThreadPoolExecutor$StatisticRunnable.run(StatisticKakaoThreadPoolExecutor.java:279)
        at com.kakao.common.util.KakaoThreadPoolExecutor$Worker.runTask(KakaoThreadPoolExecutor.java:950)
        at com.kakao.common.util.KakaoThreadPoolExecutor$Worker.run(KakaoThreadPoolExecutor.java:987)
        at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

0x00007f3d240c3158   0x0000000442a79d78   pool lock

0x00007f3e0006b7d8    0x0000000544007d40 a java.util.Collections$SynchronizedList

At http://bit.ly/2AgZxEG , I reported a strange phenomenon that Grizzly HTTP server didn't accept requests from clients. I doubted 'force-selector-spin-detection' option, but I found the problem was a deadlock.

The deadlock seems to be the side-effect of issue #1968 (https://github.com/javaee/grizzly/issues/1968) and https://github.com/afelisatti/grizzly/commit/b9eaa9afd3043381726211715cb7f73e354646d5 . This change was applied at v2.3.33.

I think that the simple fix is that we should reintroduce the concurrent collection for NIOConnection's closeListeners like CopyOnWriteArrayList. (Or I think we should change the poolSync lock's scope in SingleEndpointPool. ex) calling Connection#removeCloseListener and Connection#addCloseListener without poolSync lock.)

arinban commented 6 years ago
arinban commented 6 years ago

@carryel Commented Similar to https://github.com/javaee/grizzly/issues/1983 .

arinban commented 6 years ago

@carryel Commented Fixed at next version.