eclipse-vertx / vertx-sql-client

High performance reactive SQL Client written in Java
Apache License 2.0
894 stars 200 forks source link

inconsistent behaviour for postgres pool getConnection #1471

Open gillbates opened 1 month ago

gillbates commented 1 month ago

Questions

recently we have meet very bad performance for vertx postgres connection. It sometimes took average 3-5 seconds to acquire the connection in our production environment when all connections are used and switched very frequently.

As its hard to extract the code, we've implement following reproducer which will show inconsistent behaviour.

Version

postgres: postgres:17.0-alpine3.20 vertx: 4.5.10 jdk:

java version "21.0.1" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 21.0.1+12-LTS-29)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.1+12-LTS-29, mixed mode, sharing)

Do you have a reproducer?

In order to simulate concurrently getConnection when full, we have configured max size 1 with 2 concurrent access In the following example ...

package com.xxx.play;

import io.vertx.core.Future;
import io.vertx.core.Promise;
import io.vertx.core.Vertx;
import io.vertx.pgclient.PgBuilder;
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.sqlclient.Pool;
import io.vertx.sqlclient.PoolOptions;
import lombok.extern.log4j.Log4j2;

import java.util.ArrayList;
import java.util.List;

@Log4j2
public class TestPGPool {
    public static void main(String[] args) {

        String host = "localhost";
        int port = 5432;
        String database = "xxx";
        String user = "postgres";
        String password = "123456";
        PgConnectOptions connectOptions = new PgConnectOptions();
        connectOptions.setPort(port);
        connectOptions.setHost(host);
        connectOptions.setUser(user);
        connectOptions.setDatabase(database);
        connectOptions.setPassword(password);
        connectOptions.setCachePreparedStatements(true);
        connectOptions.setPreparedStatementCacheMaxSize(1024);
        connectOptions.setPipeliningLimit(100_000); // Large p

        Vertx vertx = Vertx.vertx();

        PoolOptions poolOptions = new PoolOptions().setMaxSize(1).setShared(true).setEventLoopSize(1).setName("local-pool");
        Pool pool = PgBuilder.pool().with(poolOptions).connectingTo(connectOptions).using(vertx).build();

        List<Future<Void>> futures = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            int finalI = i;
            log.debug(finalI + " getting connection ");
            Future<Void> future = pool.getConnection().compose(conn -> {
                log.debug(finalI + " got connection");
                return delay(vertx, 1000).compose(v -> {
                    log.debug(finalI + " closing connection");
                    return conn.close();
                }).map(v -> {
                    log.debug(finalI + " Connection closed");
                    return null;
                });
            });
            futures.add(future);
        }

        Future.join(futures).onComplete(res -> {
            if (res.succeeded()) {
                log.debug("Succeeded");
            } else {
                log.debug("error:  " + res.cause().getMessage());
            }
            System.exit(0);
        });
    }

    public static Future<Void> delay(Vertx vertx, long delay) {
        Promise<Void> promise = Promise.promise();
        vertx.setTimer(delay, l -> promise.complete());
        return promise.future();
    }
}

the code above will have following two kinds of output:

  1. Maybe anyone could fix this? This also happened sometimes online, where we got pool closed exception.
[10-10 19:26:29.810][main][DEBUG][c.t.p.TestPGPool][46] - [0 getting connection ]
[10-10 19:26:29.816][main][DEBUG][c.t.p.TestPGPool][46] - [1 getting connection ]
[10-10 19:26:29.918][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][48] - [0 got connection]
[10-10 19:26:30.920][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][50] - [0 closing connection]
[10-10 19:26:30.922][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][53] - [0 Connection closed]
[10-10 19:26:30.922][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][64] - [error:  Pool closed]
[10-10 19:58:37.573][main][DEBUG][c.t.p.TestPGPool][46] - [0 getting connection ]
[10-10 19:58:37.579][main][DEBUG][c.t.p.TestPGPool][46] - [1 getting connection ]
[10-10 19:58:37.684][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][48] - [0 got connection]
[10-10 19:58:38.688][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][50] - [0 closing connection]
[10-10 19:58:38.702][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][53] - [0 Connection closed]
[10-10 19:58:38.703][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][48] - [1 got connection]
[10-10 19:58:39.705][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][50] - [1 closing connection]
[10-10 19:58:39.706][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][53] - [1 Connection closed]
[10-10 19:58:39.707][vert.x-eventloop-thread-0][DEBUG][c.t.p.TestPGPool][62] - [Succeeded]

And the 0 closed the connection in 4ms. So that the other connection could be acquired. I think this might be the case that it took sometimes 3-5s in our production after all connections are occupied. Can we speed up this?

Extra

macos, m3 max

tsegismont commented 1 month ago

I'll take a look

asad-awadia commented 1 month ago

something wrong with the set shared

removing it makes it work as expected

gillbates commented 1 month ago

something wrong with the set shared

removing it makes it work as expected

confirmed. the bug is gone after removing setShared(true) ... wierd ...