eclipse-ee4j / grizzly

Grizzly
https://eclipse-ee4j.github.io/grizzly

SingleEndpointPool.cleanupIdleConnections() infinite loop #2163

Closed DBlckwd closed 2 years ago

DBlckwd commented 2 years ago

Hello, I am facing the following problem.

One of the threads has fallen into an infinite loop in SingleEndpointPool.detach() and is blocking the other threads.

It looks like there is a connection leak somewhere, resulting in an infinite loop. The exact location of the connection leak cannot be determined.

Inside the detach() method we try to remove the connection from connectionsMap, but if it is not there, nothing happens. At the same time, readyConnections.getFirstLink() does not actually remove the element from the queue; that happens later, inside downstream methods. So if the connection cannot be found in connectionsMap, it is never removed from the readyConnections queue.

Our heap dump shows exactly this situation. The link was present in the readyConnections queue (it was the first link), but the same connection was not in connectionsMap. detach() tried to remove the connection from connectionsMap, did not find it, and the thread got caught in an infinite loop.
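The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Grizzly's code: the class DetachLoopSketch, its string-based connectionsMap and readyConnections, and the maxIterations guard are all hypothetical stand-ins, chosen only to show how a loop that peeks at the head of a queue can spin when the head entry is missing from the map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of the reported situation; NOT the Grizzly implementation.
public class DetachLoopSketch {
    // Hypothetical stand-ins for the two structures named in the report.
    final Map<String, String> connectionsMap = new HashMap<>();
    final Deque<String> readyConnections = new ArrayDeque<>();

    // The queue entry is removed only if the map entry existed.
    boolean detach(String connection) {
        if (connectionsMap.remove(connection) != null) {
            readyConnections.remove(connection);
            return true;
        }
        return false; // nothing happens, so the queue keeps its head
    }

    // Cleanup loop; maxIterations is a guard so that this demo terminates.
    int cleanup(int maxIterations) {
        int iterations = 0;
        while (!readyConnections.isEmpty() && iterations < maxIterations) {
            String first = readyConnections.peekFirst(); // look, do not remove
            detach(first);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        DetachLoopSketch pool = new DetachLoopSketch();
        pool.connectionsMap.put("c1", "info");
        pool.readyConnections.add("c1");
        pool.readyConnections.add("leaked"); // queued, but absent from the map
        int n = pool.cleanup(1000);
        // The loop only stops because of the guard; "leaked" never leaves the queue.
        System.out.println("iterations=" + n + ", queue=" + pool.readyConnections);
    }
}
```

In this model the first entry is detached normally, after which every further iteration sees the same head element and makes no progress. Without the guard the loop would never exit, which matches the RUNNABLE thread holding the lock in the thread dump.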

Grizzly connection pool version: 2.3.28
Jersey version: 2.25.1

Thread dump summary:
1 thread in RUNNABLE state, method: SingleEndpointPool.detach() - (locked <0x00000005be1ce700>)
1 thread in BLOCKED state, method: SingleEndpointPool$PoolConnectionCloseListener.onClosed() - (waiting to lock <0x00000005be1ce700>)
201 threads in BLOCKED state, method: SingleEndpointPool.take() - (waiting to lock <0x00000005be1ce700>)

"connection-pool-delays-thread-pool(1)" daemon prio=10 tid=0x00007ef5e4001800 nid=0x4092 runnable [0x00007ef51a4e4000]
   java.lang.Thread.State: RUNNABLE
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool.detach(SingleEndpointPool.java:943)

"pool-43-thread-136" prio=10 tid=0x00007efc58006800 nid=0x670b waiting for monitor entry [0x00007ef501d5d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool$PoolConnectionCloseListener.onClosed(SingleEndpointPool.java:1332)

"jersey-client-async-executor-0" prio=10 tid=0x00007ef5e4be3000 nid=0x4106 waiting for monitor entry [0x00007ef513373000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.grizzly.connectionpool.SingleEndpointPool.take(SingleEndpointPool.java:759)

Thanks in advance.

carryel commented 2 years ago

Firstly, the Grizzly version seems too old. In the current Eclipse project, it seems that changes can only be made starting from v2.4.4.

However, judging from the code revision history, the logic appears to be much the same in later versions. If we can't reproduce the problem with a test case, it will be quite difficult to find the cause.

As you probably know, the workaround to avoid this problem is to disable the keep-alive mechanism of SingleEndpointPool by setting keepAliveTimeoutMillis to a value equal to or less than 0, or to set corePoolSize and maxPoolSize to the same value.
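To make the reasoning behind the two workarounds concrete, here is a small sketch. It assumes (this is not confirmed anywhere in this thread) that idle cleanup only evicts connections while keep-alive is enabled and the pool currently holds more than corePoolSize connections, and that the pool never grows beyond maxPoolSize. The class CleanupPreconditionSketch and the method idleCleanupMayEvict are hypothetical names, not part of Grizzly.

```java
// Toy predicate modelling when idle cleanup could run; NOT the Grizzly implementation.
public class CleanupPreconditionSketch {

    // Assumption: eviction needs keep-alive enabled AND poolSize above corePoolSize.
    static boolean idleCleanupMayEvict(long keepAliveTimeoutMillis,
                                       int corePoolSize,
                                       int maxPoolSize,
                                       int poolSize) {
        if (corePoolSize > maxPoolSize || poolSize > maxPoolSize) {
            throw new IllegalArgumentException("sizes must satisfy core <= max and pool <= max");
        }
        // Per the workaround above, a timeout <= 0 disables keep-alive.
        boolean keepAliveEnabled = keepAliveTimeoutMillis > 0;
        boolean aboveCore = poolSize > corePoolSize;
        return keepAliveEnabled && aboveCore;
    }

    public static void main(String[] args) {
        // Default-like setup: keep-alive on, pool larger than core.
        System.out.println(idleCleanupMayEvict(30_000, 2, 4, 3));
        // Workaround 1: keep-alive disabled.
        System.out.println(idleCleanupMayEvict(0, 2, 4, 3));
        // Workaround 2: corePoolSize == maxPoolSize, so poolSize can never exceed core.
        System.out.println(idleCleanupMayEvict(30_000, 4, 4, 4));
    }
}
```

Under these assumptions, either workaround keeps the cleanup loop from ever being entered, which would explain why the symptom disappears even though the underlying leak is not fixed.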

Is it possible to adjust the keep-alive or pool size of SingleEndpointPool with Jersey's settings?

DBlckwd commented 2 years ago

Hi @carryel, thanks for the answer. We are checking the workarounds you suggested.

Do you have any updates regarding the root cause of this bug?

carryel commented 2 years ago

@DBlckwd I checked up to the latest version and found no related updates. (In addition, I could not spot the issue from reading the code alone.)

DBlckwd commented 2 years ago

Hi @carryel, to solve this problem we used the workaround you suggested. This helped, and we no longer see the problem. However, the root cause of the error could not be determined.

Thank you!