DaveCTurner opened this issue 4 years ago
Pinging @elastic/es-core-features (Team:Core/Features)
Pinging @elastic/clients-team (Team:Clients)
Hello @DaveCTurner, we have encountered the same problem in our environments. We had multiple threads stuck waiting for a response, sometimes for a whole day. Do we know exactly what goes wrong in the HTTP client (httpasyncclient) that can explain this?
We suspect there may be a code path in the HTTP client (httpasyncclient) that ends without completing the Future instance created in org.elasticsearch.client.RestClient#performRequest, which would explain why it's WAITING.
We would love to have your insight on this.
I don't think it's finishing without completing the future, at least I'm not aware of such a problem. A forever-WAITING thread is entirely possible if you're not using TCP keepalives, even if everything else is working correctly.
Hi @DaveCTurner. I have been struggling with this issue for more than a week now; it happens after we recover from ioreactor exceptions. We reinstantiate the ES client, but some old server threads get stuck in https://github.com/elastic/elasticsearch/blob/71c0821ffc4c379d65c6b5b2f685cb7895a002f3/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L279 forever.
While your solution is great, in the kubernetes world tcp_keepalive_time is a restricted sysctl and considered unsafe to change as it will affect other pods in the same node. Would it be possible to introduce a new timeout variable and apply it to the future's get?
I assume something like the following might also help recover from most cases:

```java
httpResponse = client.execute(context.requestProducer, context.asyncResponseConsumer, context.context, null)
    .get(<some value>, TimeUnit.MILLISECONDS);
```
> some old server threads get stuck
That sounds like a separate issue, but not one related to TCP keepalives. I would guess that closing a client should abort any ongoing requests promptly (I might be wrong there, that's just a guess). Please open a separate issue to investigate that further, and provide a lot more supporting evidence for this possible bug.
> in the kubernetes world tcp_keepalive_time is a restricted sysctl and considered unsafe to change as it will affect other pods in the same node.
This reasoning seems flawed for a couple of reasons. In terms of cgroups (on which technologies like Kubernetes are based) the net.* sysctls, including net.ipv4.tcp_keepalive_time, are namespaced and definitely can be made to apply to individual pods, but also if your network environment is tearing down idle connections after some time then you surely want to apply similar keepalive settings to every pod anyway?
> I assume something like the following might also help recover from most cases:
No, the whole point of TCP keepalives is to preserve liveness whilst avoiding aborting requests after some (arbitrary) time limit. If you really want this kind of behaviour, you can implement your own abort logic using the Cancellable that is returned from RestClient#performRequestAsync.
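For example, a minimal sketch of such abort logic with the low-level client; the 30-second deadline and the scheduler are illustrative choices, not recommended values:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.client.Cancellable;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseListener;
import org.elasticsearch.client.RestClient;

public class AbortAfterDeadline {
    private static final ScheduledExecutorService SCHEDULER = Executors.newSingleThreadScheduledExecutor();

    static void searchWithDeadline(RestClient restClient) {
        Cancellable cancellable = restClient.performRequestAsync(new Request("GET", "/_search"), new ResponseListener() {
            @Override
            public void onSuccess(Response response) {
                // handle the response
            }

            @Override
            public void onFailure(Exception exception) {
                // a cancelled request completes here with an exception instead of hanging forever
            }
        });
        // Give up on the request if it has not completed within 30 seconds.
        SCHEDULER.schedule(cancellable::cancel, 30, TimeUnit.SECONDS);
    }
}
```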
I've experienced similar issues, as described in #59261. You can see there how I initialize the client (Spring Boot). The workaround for me was to increase the socket timeout to 45 seconds and the connect timeout to 3 seconds. After doing this, all my issues disappeared. Another person on that thread also suggested increasing the keep-alive to 1 hour.
> That sounds like a separate issue, but not one related to TCP keepalives. I would guess that closing a client should abort any ongoing requests promptly (I might be wrong there, that's just a guess). Please open a separate issue to investigate that further, and provide a lot more supporting evidence for this possible bug.
Sure. I will try to write up an issue for it.
> This reasoning seems flawed for a couple of reasons. In terms of cgroups (on which technologies like Kubernetes are based) the net.* sysctls, including net.ipv4.tcp_keepalive_time, are namespaced and definitely can be made to apply to individual pods, but also if your network environment is tearing down idle connections after some time then you surely want to apply similar keepalive settings to every pod anyway?
Sorry, I think I should have put quotes around "unsafe". I didn't mean that as an opinion, but more like "Kubernetes considers it unsafe and does not allow it by default". Please see https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#enabling-unsafe-sysctls. As a service owner in a large Kubernetes cluster in my organization, it will take some effort to gain access.
> No, the whole point of TCP keepalives is to preserve liveness whilst avoiding aborting requests after some (arbitrary) time limit. If you really want this kind of behaviour, you can implement your own abort logic using the Cancellable that is returned from RestClient#performRequestAsync.
Fair point. Thanks!
Anyway. Sorry for potentially hijacking this issue. I will write up a separate one.
@DaveCTurner Do you know if it is possible to use the code snippet you proposed, together with setting custom Connect and Socket timeouts?
When doing something like this:
```java
clientBuilder
    .setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
        .setDefaultIOReactorConfig(IOReactorConfig.custom()
            .setSoKeepAlive(true)
            .build()
        )
    )
    .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
        .setConnectTimeout(30000)
        .setSocketTimeout(30000)
    );
```
The first part, which enables the keepalives, seems to be ignored (and thus the issues caused by not having it come back). Setting the connect/socket timeouts on the IOReactorConfig.custom() part does not seem to affect the final Elasticsearch queries either.
The code you suggest looks ok to me. I tried it and did not observe it ignoring the keepalive config as you claim:
```
$ netstat -anto | grep 39013.*ESTABLISHED
tcp6  0  0  127.0.0.1:44598  127.0.0.1:39013  ESTABLISHED  keepalive (7196.75/0/0)
tcp6  0  0  127.0.0.1:39013  127.0.0.1:44598  ESTABLISHED  keepalive (7196.75/0/0)
```
(39013 is the port on which I happened to be running Elasticsearch)
@DaveCTurner Thank you for looking into it. You're right, tried the netstat command and it shows the connection with keepalive enabled.
What I experienced is that your original snippet fixed my original issue (which was something like this https://github.com/elastic/elasticsearch/issues/59261), but after adding the socket and connect timeouts, the problem came back.
Ok, there may be something else going on, but this isn't the place to discuss it further. Would you open a thread on https://discuss.elastic.co/ describing the problem you're seeing?
Hello @DaveCTurner. Instead of changing the tcp_keepalive_time using sysctl, can we change it in code using the setKeepAliveStrategy option in the builder?
```java
builder.setHttpClientConfigCallback(
    httpClientBuilder -> httpClientBuilder.setKeepAliveStrategy(
            (response, context) -> 300000 /* 5 minutes */)
        .setDefaultIOReactorConfig(
            IOReactorConfig.custom()
                .setSoKeepAlive(true)
                .build()));
```
> can we change it in code using the setKeepAliveStrategy option in the builder?
No, that affects connection re-use (i.e. HTTP keepalives) which has nothing to do with TCP keepalives.
We have socket and connection timeouts configured, but requests still might hang for hours on Future.get(). How is that possible if both of these timeouts are less than 5 minutes? If we're not receiving any data from the remote server, shouldn't we just throw a timed-out exception? I saw in several forum threads that you recommended enabling the keep-alive feature, otherwise requests might last indefinitely. Does that mean that sometimes the socket timeout might be ignored?
> Does that mean that sometimes the socket timeout might be ignored?
No, the socket and connection timeouts should be respected if they are set. You should only be able to get an indefinite hang if there are no client-side timeouts and no TCP keepalives.
Hi @DaveCTurner, I set SoKeepAlive=true and set net.ipv4.tcp_keepalive_time=60, but my threads are still blocking when several data nodes shut down normally.
SoKeepAlive setting: [screenshot]
cat /proc/sys/net/ipv4/tcp_keepalive_time: [screenshot]
Blocked thread: [screenshot]
This isn't the place to ask for help in specific cases like that @yuhaowin. Please ask on the forums instead. I'm marking questions like this as off-topic to keep this issue focussed on the action needed in ES itself.
Wouldn't a safer approach be not to allow such long-lived HTTP connections to exist in the first place? As far as I've researched the topic for servers and clients, most of the servers have some kind of connection idle timeout configured after which they close idle connections. For example, the REST client could have a 1 minute timeout and the server (Elasticsearch) a 3 minute timeout (so that the client would be sure to consider the connection closed earlier than the server, thus avoiding race conditions when closing). For reference, the dotnet team decided to lower the client timeout to 1 minute here: https://github.com/dotnet/runtime/pull/52687.
I think this would solve many of the possible problems with leaked/lost connections/timeouts (because of the myriad of firewalls/intermediaries that could be in between) without introducing performance problems (because a connection idle for more than a minute does not indicate a busy server/client).
> most of the servers have some kind of connection idle timeout configured after which they close idle connections.
Elasticsearch does not (by default at least). There's no need to close idle connections on a properly-configured network, and low-traffic setups might still be sensitive to the extra latency of needing to open a new connection before making a request.
If you're not running with the default config, or your network imposes some extra timeouts, then it does make sense to adjust the timeout in your client too. I don't think this should be the default behaviour tho.
> can we change it in code using the setKeepAliveStrategy option in the builder?
>
> No, that affects connection re-use (i.e. HTTP keepalives) which has nothing to do with TCP keepalives.
I've experienced similar issues, and the setKeepAliveStrategy option in httpClientBuilder works well for me. This config affects HTTP connection reuse while the Elasticsearch client's connection to the server is healthy, but in this situation I think a TCP keepalive probe is no different from an HTTP keepalive.
> Does that mean that sometimes the socket timeout might be ignored?
>
> No, the socket and connection timeouts should be respected if they are set. You should only be able to get an indefinite hang if there are no client-side timeouts and no TCP keepalives.
org.elasticsearch.client.RestClientBuilder has default connect and socket timeouts:

```java
public static final int DEFAULT_CONNECT_TIMEOUT_MILLIS = 1000;
public static final int DEFAULT_SOCKET_TIMEOUT_MILLIS = 30000;
```
```java
private CloseableHttpAsyncClient createHttpClient() {
    //default timeouts are all infinite
    RequestConfig.Builder requestConfigBuilder = RequestConfig.custom()
        .setConnectTimeout(DEFAULT_CONNECT_TIMEOUT_MILLIS)
        .setSocketTimeout(DEFAULT_SOCKET_TIMEOUT_MILLIS)
        .setConnectionRequestTimeout(DEFAULT_CONNECTION_REQUEST_TIMEOUT_MILLIS);
    if (requestConfigCallback != null) {
        requestConfigBuilder = requestConfigCallback.customizeRequestConfig(requestConfigBuilder);
    }
    // use the default or custom requestConfig
    HttpAsyncClientBuilder httpClientBuilder = HttpAsyncClientBuilder.create().setDefaultRequestConfig(requestConfigBuilder.build())
        //default settings for connection pooling may be too constraining
        .setMaxConnPerRoute(DEFAULT_MAX_CONN_PER_ROUTE).setMaxConnTotal(DEFAULT_MAX_CONN_TOTAL)
        .setSSLContext(SSLContext.getDefault());
    ...
}
```
> Worse, if the connection is dropped silently while a request is in flight then the client may block indefinitely since it will send no more data on this connection and will therefore never find out that it's been closed.
@DaveCTurner I use the default config. How should I understand "connection is dropped silently while a request is in flight then the client may block indefinitely"? Why does the client block indefinitely when we have configured the socket timeout? Why not just throw a timeout exception?
> can we change it in code using the setKeepAliveStrategy option in the builder?
>
> No, that affects connection re-use (i.e. HTTP keepalives) which has nothing to do with TCP keepalives.
@DaveCTurner I am wondering: if the objective is to make sure there won't be broken TCP connections that cause problems for future REST client requests, would setting a custom HTTP keep-alive strategy, specifically with the client closing the connection (say after 60 seconds) in advance of any network appliance dropping it (say after 300 seconds), be a good option? Perhaps this way the client can ensure the underlying TCP connections are never broken. Also, for any use cases where it is not possible to change the underlying OS settings for TCP keepalives, might an HTTP keep-alive strategy in the application code be a good workaround? What do you think?
Hello @DaveCTurner, the ES client has now been updated to ElasticsearchClient. Is the timeout still an issue in ES 8+ with ElasticsearchClient? I still face connection reset issues from my Java ElasticsearchClient. Is there any working solution for this issue?
Is setSoKeepAlive(true) in the description still recommended? It might fix some connection hangs? Is there any downside to using it?
This change has still not been made in the latest High Level Rest Client, maybe because it's deprecated and changes aren't being made? Is this change in the new Java API client, or does it not apply there?
> net.ipv4.tcp_keepalive_time

The Java Socket API exposes these in an obfuscated manner via Socket.setOption + ExtendedSocketOptions.TCP_KEEPIDLE, but I guess the IOReactorConfig.custom() API does not.
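For illustration, a minimal sketch of those JDK socket options, assuming JDK 11+ on a platform where they are supported:

```java
import java.net.Socket;
import java.net.StandardSocketOptions;

import jdk.net.ExtendedSocketOptions;

public class PerSocketKeepalive {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            // Equivalent of setSoKeepAlive(true): enable SO_KEEPALIVE on the socket.
            socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            // Per-socket equivalents of the sysctls: send the first probe after
            // 300s of idleness instead of the Linux default of 7200s, then probe every 60s.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 300);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 60);
            // socket.connect(...) would follow here.
        }
    }
}
```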
Please prioritize. Thanks team.
> net.ipv4.tcp_keepalive_time
>
> The Java Socket API exposes these in an obfuscated manner via Socket.setOption + ExtendedSocketOptions.TCP_KEEPIDLE, but I guess the IOReactorConfig.custom() API does not.
In my Docker/Kubernetes env it was not trivially possible to set the sysctl settings, but I found a way to configure the RestClientBuilder from code which seemed to do the trick for me:
```java
restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
    @Override
    public HttpAsyncClientBuilder customizeHttpClient(final HttpAsyncClientBuilder b) {
        int tcpKeepaliveMs = 300 * 1000;
        b.setKeepAliveStrategy((response, context) -> tcpKeepaliveMs);
        return b.setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build());
    }
});
```
@os-cas are you sure about seconds?

```java
b.setKeepAliveStrategy((response, context) -> tcpKeepaliveSeconds);
```

In 8.14.3 the javadoc states this value is in milliseconds:

> @return the duration in ms for which it is safe to keep the connection idle, or <=0 if no suggested duration.
> @os-cas are you sure about seconds?
>
> b.setKeepAliveStrategy((response, context) -> tcpKeepaliveSeconds);
>
> In 8.14.3 the javadoc states this value is in milliseconds:
>
> @return the duration in ms for which it is safe to keep the connection idle, or <=0 if no suggested duration.
You're right, thanks for double-checking. I edited it.
> net.ipv4.tcp_keepalive_time
>
> The Java Socket API exposes these in an obfuscated manner via Socket.setOption + ExtendedSocketOptions.TCP_KEEPIDLE, but I guess the IOReactorConfig.custom() API does not.

> In my Docker/Kubernetes env it was not trivially possible to set the sysctl settings, but I found a way to configure the RestClientBuilder from code which seemed to do the trick for me:
>
> ```java
> restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
>     @Override
>     public HttpAsyncClientBuilder customizeHttpClient(final HttpAsyncClientBuilder b) {
>         int tcpKeepaliveMs = 300 * 1000;
>         b.setKeepAliveStrategy((response, context) -> tcpKeepaliveMs);
>         return b.setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build());
>     }
> });
> ```
Note that setSoKeepAlive probably isn't doing much here without tweaking down the OS-level interval to be more frequent. What this is doing, and why it may work, is that by setting setKeepAliveStrategy to 300 seconds (300,000 ms), the connection pool will discard a persistent connection after it's been idle for 300 seconds. Note this is an HTTP keep-alive setting, not TCP keepalive (it functions differently). This avoids most idle-timeout issues where the remote side has dropped the connection; the drop can be silent, which then causes client-side issues when the connection is used again (the client will try the dead connection and TCP-retransmit until hitting some timeout). The appropriate timeout value depends on your downstream infrastructure (what is between your client and Elasticsearch), e.g. in AWS most network appliances idle-timeout by default at ~350s, and I have seen more aggressive defaults such as Azure load balancers at 4 minutes.
It's a different client library, but I've written about this in Reactor Netty for instance here.
So at a high level either:

- enable TCP keepalives (with a suitably low OS-level tcp_keepalive_time) so that dead connections are detected, or
- set the client's HTTP keep-alive strategy so that pooled connections stop being reused before any intermediary's idle timeout kicks in.
> So at a high level either:
I think you ideally want to do both of these things, but this issue is specifically about the first. The HTTP keepalive behaviour only affects idle connections, whereas the TCP keepalive behaviour will also ensure that active connections don't ever just silently die without ever returning an error response to the client.
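For illustration, a sketch that does both, assuming the Apache HttpAsyncClient 4.x builder APIs used by the low-level REST client; the host and the 60-second HTTP keep-alive value are arbitrary examples:

```java
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

public class KeepaliveClientFactory {
    static RestClient build() {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        builder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
            // TCP keepalives: detect connections that have been silently dropped,
            // even while a request is in flight.
            .setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build())
            // HTTP keep-alive: stop reusing a pooled connection more than 60s after its
            // last response, well before typical intermediary idle timeouts.
            .setKeepAliveStrategy((response, context) -> 60_000));
        return builder.build();
    }
}
```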
Today the REST clients keep HTTP connections open for reuse by future requests. If a connection remains open for a long time then there's a risk that something between the client and Elasticsearch will silently drop the connection, and the client won't know about this until it tries to send another request. Often a request on such a dropped connection is not actively rejected and instead simply times out, possibly after many seconds or even minutes.
Worse, if the connection is dropped silently while a request is in flight then the client may block indefinitely since it will send no more data on this connection and will therefore never find out that it's been closed.
I think we should enable TCP keepalives on these long-lived connections by default so that we can either (a) keep the connection open even when idle, or (b) at least be notified that it has been dropped more promptly.
For the benefit of folk who want to explicitly enable TCP keepalives on the client today, here's how you do it:
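A minimal sketch with the low-level RestClientBuilder (clientBuilder being your RestClientBuilder instance):

```java
clientBuilder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
    .setDefaultIOReactorConfig(IOReactorConfig.custom()
        .setSoKeepAlive(true) // enable TCP keepalives on the client's connections
        .build()));
```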
The TCP keepalives sent by the client are subject to the network configuration of your OS. In particular on Linux the relevant sysctl settings, and their defaults, are:

- net.ipv4.tcp_keepalive_time = 7200
- net.ipv4.tcp_keepalive_intvl = 75
- net.ipv4.tcp_keepalive_probes = 9

You should normally set net.ipv4.tcp_keepalive_time to 300. The default of 7200 seconds (i.e. 2 hours) is almost certainly too long to wait to send the first keepalive.