aerospike / aerospike-client-java

Aerospike Java Client Library

On startup aerospike client get operations take much more time than usual #135

Closed. Aloren closed this issue 1 year ago.

Aloren commented 5 years ago

We observe increased timing on aerospike-client read operations after service startup. Timing is usually around 2 ms, but right after startup it increases to as much as 100 ms. As I understand it, this might be an issue with the connection pool, since each request needs a new connection to Aerospike; until the pool reaches its average size, timing is higher than expected. We are thinking about warming up the pool -- maybe it makes sense to have such an option in aerospike-client (if that is applicable)? What are your thoughts?

Thanks.

[Screenshot attachment: Screen Shot 2019-05-07 at 3.43.17 PM]

Aloren commented 5 years ago

Looks like it is related to #123

BrianNichols commented 5 years ago

One way to warm up the pool is to issue sync or async read commands. In addition to creating connections, the applicable Java code is also loaded and initialized. This is helpful with large libraries like netty that create a large number of classes. I think issuing these read commands is better done outside the client, because the user has a better understanding of what to read and whether to use async and/or sync reads.
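
As a rough illustration of that approach (the host, key names, and thread count below are placeholders, and error handling is deliberately minimal):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.AerospikeException;
    import com.aerospike.client.Key;

    AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

    // Issue concurrent throwaway reads right after client construction.
    // Each parallel sync read needs its own pooled connection, so this both
    // creates connections and triggers class loading/initialization.
    // Note: reads of a single key only exercise the node that owns that key;
    // spread reads across many keys to warm connections on every node.
    Key warmupKey = new Key("test", "demo", "warmup");
    ExecutorService pool = Executors.newFixedThreadPool(20);

    for (int i = 0; i < 20; i++) {
        pool.submit(() -> {
            try {
                client.get(null, warmupKey);  // result ignored; null if the key is absent
            }
            catch (AerospikeException e) {
                // Ignore failures; these reads exist only to open connections.
            }
        });
    }
    pool.shutdown();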

We are considering adding new ClientPolicy fields, minConnsPerNodeSync and minConnsPerNodeAsync, that would pre-allocate connections and never let the pool drop below the minimum, but this does not address warmup of the Java code path.

BrianNichols commented 5 years ago

We have decided on an alternate approach already used in our go client. A warmup method will be added to AerospikeClient. warmup(ConnectionType type, int count) will initialize "count" connections on each node and put those connections into each node's connection pool. ConnectionType indicates sync or async. These connections will still be subject to removal if they are idle for more than "ClientPolicy.maxSocketIdle". warmup can be called anytime after AerospikeClient instantiation.
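
If this lands, usage might look like the following (hypothetical: ConnectionType and the warmup signature are the proposal above, not a shipped API):

    // Pre-open 50 sync and 50 async connections in each node's pool,
    // any time after the client is instantiated.
    client.warmup(ConnectionType.SYNC, 50);
    client.warmup(ConnectionType.ASYNC, 50);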

We are currently busy with other projects and will implement when time becomes available.

mrozk commented 3 years ago

Hi! I believe there is a problem with minConnsPerNode in the LIFO connection pool. Assume we set minConnsPerNode=25, so we have 25 connections in a LIFO pool with the default idle TTL of 55 seconds. Load is light at startup, so we use only the top 5 connections from the pool, and the other 20 go unused for a long time. Those 20 connections therefore become invalid. When a traffic spike arrives, instead of a pool of warmed-up connections we have a pool of invalid connections, and the Aerospike client executes the following code:

    if (conn != null) {
        // Found socket.
        // Verify that socket is active.
        if (cluster.isConnCurrentTran(conn.getLastUsed())) {
            try {
                conn.setTimeout(timeoutMillis);
                return conn;
            }
            catch (Exception e) {
                // Set timeout failed. Something is probably wrong with timeout
                // value itself, so don't empty queue retrying. Just get out.
                closeConnection(conn);
                throw new AerospikeException.Connection(e);
            }
        }
        closeConnection(conn);
    }

The client closes the invalid connections, and after that it needs to create new connections to serve our requests. I think we need some kind of asynchronous health check for the connection pool when a minimum connection count is configured.

BrianNichols commented 3 years ago

If client minConnsPerNode > 0, it's highly recommended that client maxSocketIdle and server proto-fd-idle-ms be set to zero. This will prevent valid connections from being discarded due to expiration. The javadocs explicitly mention this:

https://www.aerospike.com/apidocs/java/com/aerospike/client/policy/ClientPolicy.html#minConnsPerNode

The server employs TCP keep-alive, so it can still detect and reap peer closed sockets without an expiration.
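
Concretely, the recommended combination looks like this (the connection count is illustrative):

    import com.aerospike.client.policy.ClientPolicy;

    // Client side: keep a floor of pooled connections and disable idle reaping.
    ClientPolicy clientPolicy = new ClientPolicy();
    clientPolicy.minConnsPerNode = 25;  // never drop below 25 sync connections per node
    clientPolicy.maxSocketIdle = 0;     // do not discard connections for being idle

    // Server side (aerospike.conf, service context): disable idle reaping too.
    // service {
    //     proto-fd-idle-ms 0
    // }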

yarosman commented 2 years ago

Hello @BrianNichols, so the current solution is

    clientPolicy.asyncMinConnsPerNode = clientPolicy.maxConnsPerNode
    clientPolicy.maxSocketIdle        = 0

am I right?

BrianNichols commented 2 years ago

Yes. Also, make sure proto-fd-idle-ms is 0 on the server nodes.

mrozk commented 2 years ago

@BrianNichols Hello. It looks like minConnsPerNode does not work properly in practice: sometimes it is not possible to set proto-fd-idle-ms to 0 on the cluster side, and in that case we cannot handle spikes. I have an idea for how to fix this.

The connection pool is currently LIFO. If I configure minConnsPerNode=25 but normally use only 5 connections, I will have 20 non-working connections in the pool, because they sit idle and expire. When a spike arrives, the application has to recreate connections and response time degrades. With a FIFO pool we could handle spikes, since every connection would be used in rotation and none would idle long enough to expire, but that creates another problem: how to close unused connections after the spike is over. For example, with minConnsPerNode=25 in a FIFO pool, everything works fine during the spike, but afterwards the pool may have grown to 50 connections and would never shrink back to 25.

What if we combined the two approaches? When minConnsPerNode is configured, the pool could use two data structures: the first 25 connections live in a FIFO structure, and any connections beyond 25 go into a LIFO structure that can be shrunk. That way the Aerospike Java client could handle spikes without touching the cluster configs. A rough sketch of this idea follows.
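
Here is a minimal sketch of that hybrid pool, assuming a simplified Connection stand-in (purely illustrative of the proposal, not the client's actual pool implementation):

    import java.util.ArrayDeque;

    // Stand-in for the client's socket wrapper.
    interface Connection {
        void close();
    }

    final class HybridConnectionPool {
        private final int minConns;
        // Core: FIFO, so every connection is used in rotation and none idles out.
        private final ArrayDeque<Connection> fifoCore = new ArrayDeque<>();
        // Overflow: LIFO, so rarely-used connections sink to the bottom and can be reaped.
        private final ArrayDeque<Connection> lifoOverflow = new ArrayDeque<>();

        HybridConnectionPool(int minConns) {
            this.minConns = minConns;
        }

        synchronized Connection poll() {
            // Take from the head of the core so connections rotate under light load.
            Connection conn = fifoCore.pollFirst();
            return (conn != null) ? conn : lifoOverflow.pollFirst();
        }

        synchronized void offer(Connection conn) {
            if (fifoCore.size() < minConns) {
                fifoCore.addLast(conn);       // FIFO: return to the tail
            }
            else {
                lifoOverflow.addFirst(conn);  // LIFO: return to the head
            }
        }

        // Called periodically (e.g. by a background tend task) to discard
        // overflow connections once a spike is over.
        synchronized void shrinkOverflow() {
            Connection conn;
            while ((conn = lifoOverflow.pollLast()) != null) {
                conn.close();
            }
        }
    }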

BrianNichols commented 2 years ago

Why is setting proto-fd-idle-ms to 0 not possible?

The proto-fd-idle-ms default has been 0 since at least server version 4.9.