brettwooldridge / HikariCP

光 HikariCP・A solid, high-performance, JDBC connection pool at last.
Apache License 2.0
20.04k stars 2.94k forks source link

maxLifetime is not respected for connections that are in use during HouseKeeper run #2014

Open RB14 opened 1 year ago

RB14 commented 1 year ago

Hello,

We have a use-case for using the maxLifetime configuration in order to refresh connections every 1 minute. However, it turns out (purely from experimenting, see below) that maxLifetime is not respected if it happens that the connection is in-use whenever the House Keeper runs (I believe that's every 30 seconds). So it means, if we have a connection that is being used extensively without rest, handling query after query after query, non stop, it has the potential to never be evicted and refreshed due to maxLifetime. I am not talking about long queries, I am talking mostly about connections that handle lots of queries, and it just happens that every time the House Keeper runs, they are busy. This can cause starvation to some, or all connections.

Of course the scenario I described above might indicate another problem in our configuration (e.g. too few connections, etc...) but the problem I'm describing with maxLifetime can happen sporadically in other cases as well. What bothers me is that the behavior defined by maxLifetime is non deterministic, it relies on the coincidence that a certain connection is not being used exactly at the time the House Keeper runs, and there is no attempt to check maxLifetime again, the next time the connection is used for another query.

I'm writing all of the above from experiments, not by actually looking at the code, so forgive me if I'm mistaken in my analysis. The experiment I did was using a single connection CP, with a maxLifetime of 30 seconds. I executed a query that "sleeps" for 12 seconds (sleeps in the database, I was using pg_sleep(12)) in an infinite loop. Connection was never closed due to maxLifetime. However, when I executed the same query, but put a sleep of 30 seconds in the process between iterations (i.e. after every query to the db it waits 30 seconds before next query), I observed that it respected maxLifetime in all iterations where HouseKeeper was running during the sleep in the process, but not while it ran during the sleep in the db. I gathered the information from the logs (enabled TRACE logs for both Hikari and for the postgres driver).

Here's a code snippet that demonstrate my experiment:

    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("<jdbc_url>");
    config.setUsername("<username>");
    config.setPassword("<password>");
    config.setMinimumIdle(1);
    config.setMaximumPoolSize(1);
    config.setMaxLifetime(30000);

    HikariDataSource dataSource = new HikariDataSource(config);
    while (true) {
      try (Connection connection = dataSource.getConnection()) {
        try (Statement statement = connection.createStatement()) {
          statement.execute("SELECT pg_sleep(12)");
        }
      }
      Thread.sleep(30000); // Remove this to observe the bug
    }

It is very important for us that our connections will be refreshed every 1 minute, or, if a connection is busy at that 1 minute mark, the moment the connection is done with current query (or whenever it is used for the next query).

RB14 commented 1 year ago

So a little update... so I was looking at the logs more carefully, and it seems that the connections are actually evicted correctly, roughly at the time of maxLifetime (+ variance). What led me to believe that it behaves incorrectly is that the reason for eviction is not stated to be "due to maxLifetime".

It seems that if connections are evicted due to maxLifetime when they are idle, the log prints:

DEBUG PoolBase:132 - HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection@7d42c224: (connection has passed maxLifetime)

And if they are evicted after usage (meaning they were in use when maLifetime has passed), the log prints:

DEBUG PoolBase:132 - HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection@ce9503f: (connection was evicted)

Not sure how I missed that when I first opened the bug.

So I think there is no actual bug here, other than, maybe, incorrect log message, or insufficient logging information (could be a problem when debugging and trying to understand why connections are evicted).

I am keeping this ticket open just to address the logging "problem". If you think this isn't a problem, feel free to close. Thanks, and sorry for the false report.