brettwooldridge / HikariCP

光 HikariCP・A solid, high-performance, JDBC connection pool at last.
Apache License 2.0
19.96k stars 2.93k forks source link

it is necessary to check the connection status in the post-processing logic of "PoolBase#isConnective Alive". #1212

Open marcusmoon opened 6 years ago

marcusmoon commented 6 years ago

Environment

HikariCP version: 3.2.0
JDK version     : 1.8.0_141
Database        : MySQL
Driver version  : mysql connector 5.1.46

HikariCP is calling "Connection # isValid (timeout)" to check the connection. For mysql, if the response from the request sent for the connection validation does not come within the specified time, the connection is terminated and the "false" is returned.

https://gisub.com/mysql/mysql-connector-j/blob/release/5.1/src/com/mysql/jdbc/JDBC4Connection.java #L102

So, it seems that the actual cause is hidden by the below code .

https://gisub.com/brettwooldridge/HikariCP/blob/HikariCP-3.2.0/src/main/java/com/zaxxer/hikari/poolBase.ja #L165

Because it is already terminated, it appears that the following secondary error occurs by the above code.

xxx - Failed to validation connection com.mysql.jdbc.JDBC4Connection @yyy (No opens followed after connection closed). Possibly concider using a storter maxLifetime value.

If possible, I think we should first check whether the connection is closed.

 }
 finally {
    if(!connection.isClosed()) {
        setNetworkTimeout(connection, networkTimeout);

        if (isIsolateInternalQueries && !isAutoCommit) {
            connection.rollback();
        }
    } else {
        LOGGER.warn("{} - Failed to validate connection {}. Possibly consider using a shorter maxLifetime value.", poolName, connection);
    }
 }
mdeinum commented 6 years ago

We ran into something similar with Postgres and a mistakenly configured connectionTestQuery. The query is always executed even if the Connection is already closed. Guarding this method with a connection.isClosed would bypass this and prevent our logs from filling up.

Preferably log an INFO message instead of WARN.

brettwooldridge commented 6 years ago

@mdeinum I have intentionally steered clear of using the isClosed() call in the validation path. The reason is a little difficult to explain, but I'll give it a shot...

The JDBC specification says nothing about how the isClosed() call should be implemented by a driver. Some driver implementations might choose to fire TCP traffic over the wire to determine whether the Connection is open or closed -- because the Connection/socket can be closed unilaterally by the server. Hopefully, driver developers choose more efficient state tracking than that, but again, it is only a hope.

If we implemented the equivalent of (pseudo-code):

boolean validate() {
  if (connection.isClosed()) return false;

  if (!connection.isValid()) return false;

  return true;
}

We run the risk that every validation check may incur two roundtrips to the server, rather than one. One for isClosed() (if poorly implemented), and one for isValid().

If the issue is that the ultimate cause of the failure (i.e. the originating SQLException) is masked by an exception thrown in the subsequent finally, that is a separate issue and can be dealt with.

mdeinum commented 6 years ago

Well after further inspection the actual exception occurs in the setNetworkTimeout, apparently Postgres needs an open connection for that. The driver checks if the connection isn't closed before setting the timeout, if closed it throws an exception.

We solved it by setting the minIdle, apparently that was forgotten and this disables the check on idle connections (if the minIdle and maxConnections are the same no active idle checking is done, the idle checking thread isn 't started/registered).

knechtandreas commented 6 years ago

I'm pretty sure I'm seeing the same problem rarely in our stack. I checked for leaked connections nothing.

We run on AWS Lambda connecting to AWS RDS Postgres. Details:

Here's a stacktrace of this happening:

java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:676)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:190)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:155)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
at com.codebarrel.lambda.rds.DbConnectionManagerImpl.execute(DbConnectionManagerImpl.java:47)
at com.codebarrel.automation.store.RdsAutomationQueueStore.resolveItem(RdsAutomationQueueStore.java:216)
at com.codebarrel.automation.api.queue.AutomationQueueImpl.resolveItem(AutomationQueueImpl.java:78)
at com.codebarrel.automation.api.queue.AutomationQueueImpl$$EnhancerByGuice$$62d23cac.CGLIB$resolveItem$3(<generated>)
at com.codebarrel.automation.api.queue.AutomationQueueImpl$$EnhancerByGuice$$62d23cac$$FastClassByGuice$$40f20a5e.invoke(<generated>)
at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:75)
at com.codebarrel.lambda.util.TimingInterceptor.invoke(TimingInterceptor.java:38)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:75)
at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:55)
at com.codebarrel.automation.api.queue.AutomationQueueImpl$$EnhancerByGuice$$62d23cac.resolveItem(<generated>)
at com.codebarrel.aws.lambda.automation.QueueConsumer.lambda$handleRequest$2(QueueConsumer.java:80)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at com.codebarrel.aws.lambda.automation.QueueConsumer.handleRequest(QueueConsumer.java:80)
at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at lambdainternal.EventHandlerLoader$PojoMethodRequestHandler.handleRequest(EventHandlerLoader.java:259)
at lambdainternal.EventHandlerLoader$PojoHandlerAsStreamHandler.handleRequest(EventHandlerLoader.java:178)
at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:888)
at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:292)
at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:64)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:94)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:783)
at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1556)
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:548)
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:164)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:179)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:155)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
at com.codebarrel.lambda.rds.DbConnectionManagerImpl.execute(DbConnectionManagerImpl.java:47)
at com.codebarrel.automation.store.RdsAutomationQueueStore.getQueuedCountForAuditItem(RdsAutomationQueueStore.java:267)
at com.codebarrel.automation.api.queue.AutomationQueueImpl.hasMoreItemsQueued(AutomationQueueImpl.java:110)
at com.codebarrel.lambda.util.TimingInterceptor.invoke(TimingInterceptor.java:38)
at com.codebarrel.automation.api.service.SingleRuleExecutorServiceImpl.execute(SingleRuleExecutorServiceImpl.java:200)
at com.codebarrel.automation.service.AutomationExecutionServiceImpl.lambda$processEvent$4(AutomationExecutionServiceImpl.java:78)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at com.codebarrel.automation.service.AutomationExecutionServiceImpl.processEvent(AutomationExecutionServiceImpl.java:78)
at com.codebarrel.lambda.util.TimingInterceptor.invoke(TimingInterceptor.java:38)
at com.codebarrel.aws.lambda.automation.QueueConsumer.executeItem(QueueConsumer.java:102)
at com.codebarrel.aws.lambda.automation.QueueConsumer.lambda$handleRequest$1(QueueConsumer.java:73)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at com.codebarrel.aws.lambda.automation.QueueConsumer.handleRequest(QueueConsumer.java:70)
at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

From looking at the source it looks like the Postgres driver throws an exception when trying to set the network timeout due to the connection being closed already, when called from the finally block in isConnectionAlive.

tomdcc commented 5 years ago

We're seeing something similar:

Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 3027ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:676)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:190)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:155)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
    at ratpack.jdbctx.internal.DelegatingDataSource.getConnection(DelegatingDataSource.java:35)
    at ratpack.jdbctx.internal.TransactionalDataSource.getConnection(TransactionalDataSource.java:35)
    at org.jooq.impl.DataSourceConnectionProvider.acquire(DataSourceConnectionProvider.java:83)
    ... 24 more
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:783)
    at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1556)
    at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:550)
    at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:165)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:179)
    ... 29 more

Calling setNetworkTimeout on a closed Postgres connection throws, and Hikari doesn't deal with it and throws rather than just returning false from PoolBase.isConnectionAlive.

tomdcc commented 5 years ago

Also note that the setNetworkTimeout in the finally block in isConnectionAlive is failing when trying to deal with the earlier failure of the same method. It would probably be enough to catch the failure in the first setNetworkTimeout and just return false at that point.

ewirch commented 5 years ago

I was about to create a similar issue. As a general rule:

Failing to do so, will hide the original exception and make root cause analysis impossible.

yskumar34 commented 4 years ago

Same issue with DB2 also.

szegad commented 4 years ago

Same issue with MS SQL also. It's seems that major DBs check for closed connection before setting network timeout.

vamsi-616 commented 3 years ago

Hi All,

Did any one find a solution for this issue ?

serioga commented 3 years ago

@vamsi-616 I set maxLifetime and idleTimeout to avoid such issue. I.e. maxLifetime=300000, idleTimeout=60000. Not sure that second options is relevant though, I've experimenting long time ago...

vamsi-616 commented 3 years ago

We are seeing this issue intermittently. I tried to reduce max life time and it did not helped. Any other inputs ?

Failed to validate connection ConnectionID:57721 ClientConnectionId: 3ece1d62-4cd8-4ca4-aad8-646bcdbe2dad (The connection is closed.). Possibly consider using a shorter maxLifetime value.

Hikaricp is 3.4.1 [compile] - Comes with spring-boot-starter-jdbc: 2.2.1.RELEASE Microsoft SQL Server 2014 Springboot 2.2.1

minimum-idle: 5 maximum-pool-size: 25 connection-timeout: 30000 idle-timeout: 25000 max-lifetime: 300000 allow-pool-suspension: true leak-detection-threshold: 25000 validationtimeout : 2000

brettwooldridge commented 3 years ago

Generally speaking this exception should never make it through to your application. If you call getConnection(), the pool will select a Connection from the pool, and attempt to validate it. Validation may throw an exception if the Connection has already been closed by the database. In this case, HikariCP will then obtain another Connection from the pool, and attempt to validate again.

The only way the validation gets through is when HikariCP is unable to obtain a valid connection within the "connectionTimeout" period specified. This can occur for one of two reasons:

vamsi-616 commented 3 years ago

The pool is not exhausted as we printed Hikaripool connections for each service call and at any point of time it is not more than 3 connections. We have min pool and max pool as 25.

We are trying to figure out why Hikari is unable to validate connection with in first occurrence intermittently ? - As per my analysis it's not able to get any response with in valdiationTimeout (Default is 5 second) , so it's marking as closed and trying to pull next available connection, this process continues until connectionTimeout.

vamsi-616 commented 3 years ago

We tried all combinations, however still we are seeing intermittently hikari connection issues. Did any one find the actual cause ?

We tried with Max-life-time to 100000 but still the error indicates "Failed to validate connection (This connection has been closed.). Possibly consider using a shorter maxLifetime value"

brettwooldridge commented 3 years ago

@vamsi-616 If the pool is configured for 25 connections, but you are not able to get above 3, that sounds like a limitation on the database side, and nothing to do with HikariCP. Is this a Google or AWS database instance?

lapidus79 commented 3 years ago

I encounter this issue while being connected to an AWS Aurora postgres, from an AWS EKS over a peering connection between two VPC networks.

lapidus79 commented 3 years ago

Not sure if the network infra has any relevance here though