DaveCTurner opened this issue 4 years ago
Pinging @elastic/es-core-features (Team:Core/Features)
Pinging @elastic/clients-team (Team:Clients)
Hello @DaveCTurner, we have encountered the same problem in our environments. We had multiple threads stuck waiting for a response, sometimes for a whole day. Do we know exactly what goes wrong in the HTTP client (httpasyncclient) that can explain this?
We suspect there may be a code path in the HTTP client (httpasyncclient) that ends without completing the Future instance created in org.elasticsearch.client.RestClient#performRequest, which would explain why it's WAITING.
We would love to have your insight on this.
I don't think it's finishing without completing the future, at least I'm not aware of such a problem. A forever-WAITING thread is entirely possible if you're not using TCP keepalives, even if everything else is working correctly.
Hi @DaveCTurner. I have been struggling with this issue for more than a week now; it happens after we recover from ioreactor exceptions. We reinstantiate the ES client, but some old server threads get stuck in https://github.com/elastic/elasticsearch/blob/71c0821ffc4c379d65c6b5b2f685cb7895a002f3/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L279 forever.
While your solution is great, in the kubernetes world tcp_keepalive_time is a restricted sysctl and considered unsafe to change as it will affect other pods in the same node. Would it be possible to introduce a new timeout variable and apply it to the future's get?
I assume something like the following might also help recover from most cases:

```java
httpResponse = client.execute(context.requestProducer, context.asyncResponseConsumer, context.context, null)
    .get(<some value>, TimeUnit.MILLISECONDS);
```
> some old server threads get stuck
That sounds like a separate issue, but not one related to TCP keepalives. I would guess that closing a client should abort any ongoing requests promptly (I might be wrong there, that's just a guess). Please open a separate issue to investigate that further, and provide a lot more supporting evidence for this possible bug.
> in the kubernetes world tcp_keepalive_time is a restricted sysctl and considered unsafe to change as it will affect other pods in the same node.
This reasoning seems flawed for a couple of reasons. In terms of cgroups (on which technologies like Kubernetes are based) the net.* sysctls, including net.ipv4.tcp_keepalive_time, are namespaced and definitely can be made to apply to individual pods, but also if your network environment is tearing down idle connections after some time then you surely want to apply similar keepalive settings to every pod anyway?
> I assume something like the following might also help recover from most cases:
No, the whole point of TCP keepalives is to preserve liveness whilst avoiding aborting requests after some (arbitrary) time limit. If you really want this kind of behaviour, you can implement your own abort logic using the Cancellable that is returned from RestClient#performRequestAsync.
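For example, a minimal sketch of such abort logic with the low-level client; the 30-second deadline and the scheduler are illustrative choices, not recommended values:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.client.Cancellable;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseListener;
import org.elasticsearch.client.RestClient;

public class AbortAfterDeadline {
    private static final ScheduledExecutorService SCHEDULER = Executors.newSingleThreadScheduledExecutor();

    static void searchWithDeadline(RestClient restClient) {
        Cancellable cancellable = restClient.performRequestAsync(new Request("GET", "/_search"), new ResponseListener() {
            @Override
            public void onSuccess(Response response) {
                // handle the response
            }

            @Override
            public void onFailure(Exception exception) {
                // a cancelled request completes here with an exception instead of hanging forever
            }
        });
        // Give up on the request if it has not completed within 30 seconds.
        SCHEDULER.schedule(cancellable::cancel, 30, TimeUnit.SECONDS);
    }
}
```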
I've experienced similar issues, as described in #59261. You can see there how I initialize the client (Spring Boot). The workaround for me was to increase the socket timeout to 45 seconds and the connect timeout to 3 seconds. After doing this, all my issues disappeared. Another person on that thread also suggested increasing the keep-alive to 1 hour.
> That sounds like a separate issue, but not one related to TCP keepalives. I would guess that closing a client should abort any ongoing requests promptly (I might be wrong there, that's just a guess). Please open a separate issue to investigate that further, and provide a lot more supporting evidence for this possible bug.
Sure. I will try to write up an issue for it.
> This reasoning seems flawed for a couple of reasons. In terms of cgroups (on which technologies like Kubernetes are based) the net.* sysctls, including net.ipv4.tcp_keepalive_time, are namespaced and definitely can be made to apply to individual pods, but also if your network environment is tearing down idle connections after some time then you surely want to apply similar keepalive settings to every pod anyway?
Sorry, I think I should have put quotes around "unsafe". I didn't mean that as an opinion, but more like "Kubernetes considers it unsafe and does not allow it by default". Please see https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#enabling-unsafe-sysctls. As a service owner in a large Kubernetes cluster in my organization, it will take some effort to gain access.
> No, the whole point of TCP keepalives is to preserve liveness whilst avoiding aborting requests after some (arbitrary) time limit. If you really want this kind of behaviour, you can implement your own abort logic using the Cancellable that is returned from RestClient#performRequestAsync.
Fair point. Thanks!
Anyway. Sorry for potentially hijacking this issue. I will write up a separate one.
@DaveCTurner Do you know if it is possible to use the code snippet you proposed, together with setting custom Connect and Socket timeouts?
When doing something like this:
```java
clientBuilder
    .setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
        .setDefaultIOReactorConfig(IOReactorConfig.custom()
            .setSoKeepAlive(true)
            .build()
        )
    )
    .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
        .setConnectTimeout(30000)
        .setSocketTimeout(30000)
    );
```
The first part, which enables the keepalives, seems to be ignored (and thus the issues caused by not having it come back). Setting the connect/socket timeouts on the IOReactorConfig.custom() part does not seem to affect the final Elasticsearch queries either.
The code you suggest looks ok to me. I tried it and did not observe it ignoring the keepalive config as you claim:
```
$ netstat -anto | grep 39013.*ESTABLISHED
tcp6  0  0  127.0.0.1:44598  127.0.0.1:39013  ESTABLISHED  keepalive (7196.75/0/0)
tcp6  0  0  127.0.0.1:39013  127.0.0.1:44598  ESTABLISHED  keepalive (7196.75/0/0)
```
(39013 is the port on which I happened to be running Elasticsearch)
@DaveCTurner Thank you for looking into it. You're right, tried the netstat command and it shows the connection with keepalive enabled.
What I experienced is that your original snippet fixed my original issue (which was something like this https://github.com/elastic/elasticsearch/issues/59261), but after adding the socket and connect timeouts, the problem came back.
Ok, there may be something else going on, but this isn't the place to discuss it further. Would you open a thread on https://discuss.elastic.co/ describing the problem you're seeing?
Hello @DaveCTurner. Instead of changing the tcp_keepalive_time using sysctl, can we change it in code using the setKeepAliveStrategy option in the builder?
```java
builder.setHttpClientConfigCallback(
    httpClientBuilder -> httpClientBuilder.setKeepAliveStrategy(
            (response, context) -> 300000 /* 5 minutes */)
        .setDefaultIOReactorConfig(
            IOReactorConfig.custom()
                .setSoKeepAlive(true)
                .build()));
```
> can we change it in code using the setKeepAliveStrategy option in the builder?
No, that affects connection re-use (i.e. HTTP keepalives) which has nothing to do with TCP keepalives.
We have socket and connection timeouts configured, but requests still might hang for hours on Future.get(). How is that possible if both of these timeouts are less than 5 minutes? If we're not receiving any data from the remote server, shouldn't we just throw a timed-out exception? I saw in several forum threads that you recommended enabling the keep-alive feature, otherwise requests might last indefinitely. Does that mean that sometimes the socket timeout might be ignored?
> Does that mean that sometimes the socket timeout might be ignored?
No, the socket and connection timeouts should be respected if they are set. You should only be able to get an indefinite hang if there are no client-side timeouts and no TCP keepalives.
Hi @DaveCTurner, I set SoKeepAlive=true and set net.ipv4.tcp_keepalive_time=60, but my threads are still blocking when several data nodes shut down normally.
SoKeepAlive setting: [screenshot]
cat /proc/sys/net/ipv4/tcp_keepalive_time: [screenshot]
Blocked thread: [screenshot]
This isn't the place to ask for help in specific cases like that @yuhaowin. Please ask on the forums instead. I'm marking questions like this as off-topic to keep this issue focussed on the action needed in ES itself.
Wouldn't a safer approach be not to allow such long-lived HTTP connections to exist in the first place? As far as I've researched the topic for servers and clients, most of the servers have some kind of connection idle timeout configured after which they close idle connections. For example, the REST client could have a 1 minute timeout and the server (Elasticsearch) a 3 minute timeout (so that the client would be sure to consider the connection closed earlier than the server, thus avoiding race conditions when closing). For reference, the dotnet team decided to lower the client timeout to 1 minute here: https://github.com/dotnet/runtime/pull/52687.
I think this would solve many of the possible problems with leaked/lost connections/timeouts (because of the myriad of firewalls/intermediaries that could be in between) without introducing performance problems (because a connection idle for more than a minute does not indicate a busy server/client).
> most of the servers have some kind of connection idle timeout configured after which they close idle connections.
Elasticsearch does not (by default at least). There's no need to close idle connections on a properly-configured network, and low-traffic setups might still be sensitive to the extra latency of needing to open a new connection before making a request.
If you're not running with the default config, or your network imposes some extra timeouts, then it does make sense to adjust the timeout in your client too. I don't think this should be the default behaviour tho.
> can we change it in code using the setKeepAliveStrategy option in the builder?
>
> No, that affects connection re-use (i.e. HTTP keepalives) which has nothing to do with TCP keepalives.
I've experienced similar issues, and the setKeepAliveStrategy option in httpClientBuilder works well for me. This config affects HTTP connection reuse while the Elasticsearch client's connection to the server is healthy, but in this situation I think a TCP keepalive probe is no different from an HTTP keepalive.
> Does that mean that sometimes the socket timeout might be ignored?
>
> No, the socket and connection timeouts should be respected if they are set. You should only be able to get an indefinite hang if there are no client-side timeouts and no TCP keepalives.
org.elasticsearch.client.RestClientBuilder has default connect and socket timeouts:

```java
public static final int DEFAULT_CONNECT_TIMEOUT_MILLIS = 1000;
public static final int DEFAULT_SOCKET_TIMEOUT_MILLIS = 30000;
```
```java
private CloseableHttpAsyncClient createHttpClient() {
    //default timeouts are all infinite
    RequestConfig.Builder requestConfigBuilder = RequestConfig.custom()
        .setConnectTimeout(DEFAULT_CONNECT_TIMEOUT_MILLIS)
        .setSocketTimeout(DEFAULT_SOCKET_TIMEOUT_MILLIS)
        .setConnectionRequestTimeout(DEFAULT_CONNECTION_REQUEST_TIMEOUT_MILLIS);
    if (requestConfigCallback != null) {
        requestConfigBuilder = requestConfigCallback.customizeRequestConfig(requestConfigBuilder);
    }
    // use the default or custom requestConfig
    HttpAsyncClientBuilder httpClientBuilder = HttpAsyncClientBuilder.create().setDefaultRequestConfig(requestConfigBuilder.build())
        //default settings for connection pooling may be too constraining
        .setMaxConnPerRoute(DEFAULT_MAX_CONN_PER_ROUTE).setMaxConnTotal(DEFAULT_MAX_CONN_TOTAL)
        .setSSLContext(SSLContext.getDefault());
    ...
}
```
> Worse, if the connection is dropped silently while a request is in flight then the client may block indefinitely since it will send no more data on this connection and will therefore never find out that it's been closed.
@DaveCTurner I use the default config. How should I understand "connection is dropped silently while a request is in flight then the client may block indefinitely"? Why does the client block indefinitely when we have configured the socket timeout? Why not just throw a timeout exception?
> can we change it in code using the setKeepAliveStrategy option in the builder?
>
> No, that affects connection re-use (i.e. HTTP keepalives) which has nothing to do with TCP keepalives.
@DaveCTurner I am wondering: if the objective is to make sure there won't be broken TCP connections that cause problems for future REST client requests, would setting a custom HTTP keep-alive strategy, specifically with the client closing the connection (say after 60 seconds) in advance of any network appliance dropping it (say after 300 seconds), be a good option? Perhaps this way the client can ensure the underlying TCP connections are never broken. Also, for any use cases where it is not possible to change the underlying OS settings for TCP keepalives, might an HTTP keep-alive strategy in the application code be a good workaround? What do you think?
Hello @DaveCTurner, the ES client has now been updated to ElasticsearchClient. Is the timeout still an issue in ES 8+ with ElasticsearchClient? I still face connection reset issues from my Java ElasticsearchClient. Is there any working solution for this issue?
Is setSoKeepAlive(true) in the description still recommended? It might fix some connection hangs? Is there any downside to using it?
This change has still not been made in the latest High Level Rest Client, maybe because it's deprecated and changes aren't being made? Is this change in the new Java API client, or does it not apply there?
> net.ipv4.tcp_keepalive_time

The Java Socket API exposes these in an obfuscated manner via Socket.setOption + ExtendedSocketOptions.TCP_KEEPIDLE, but I guess the IOReactorConfig.custom() API does not.
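For illustration, a minimal sketch of those JDK socket options, assuming JDK 11+ on a platform where they are supported:

```java
import java.net.Socket;
import java.net.StandardSocketOptions;

import jdk.net.ExtendedSocketOptions;

public class PerSocketKeepalive {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            // Equivalent of setSoKeepAlive(true): enable SO_KEEPALIVE on the socket.
            socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            // Per-socket equivalents of the sysctls: send the first probe after
            // 300s of idleness instead of the Linux default of 7200s, then probe every 60s.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 300);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 60);
            // socket.connect(...) would follow here.
        }
    }
}
```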
Please prioritize. Thanks team.
> net.ipv4.tcp_keepalive_time
>
> The Java Socket API exposes these in an obfuscated manner via Socket.setOption + ExtendedSocketOptions.TCP_KEEPIDLE, but I guess the IOReactorConfig.custom() API does not.
In my Docker/Kubernetes env it was not trivially possible to set the sysctl settings, but I found a way to configure the RestClientBuilder from code which seemed to do the trick for me:
```java
restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
    @Override
    public HttpAsyncClientBuilder customizeHttpClient(final HttpAsyncClientBuilder b) {
        int tcpKeepaliveMs = 300 * 1000;
        b.setKeepAliveStrategy((response, context) -> tcpKeepaliveMs);
        return b.setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build());
    }
});
```
@os-cas are you sure about seconds?

```java
b.setKeepAliveStrategy((response, context) -> tcpKeepaliveSeconds);
```

In 8.14.3 the javadoc states this value is in milliseconds:

> @return the duration in ms for which it is safe to keep the connection idle, or <=0 if no suggested duration.
> @os-cas are you sure about seconds?
>
> b.setKeepAliveStrategy((response, context) -> tcpKeepaliveSeconds);
>
> In 8.14.3 the javadoc states this value is in milliseconds:
>
> @return the duration in ms for which it is safe to keep the connection idle, or <=0 if no suggested duration.
You're right, thanks for double-checking. I edited it.
> net.ipv4.tcp_keepalive_time
>
> The Java Socket API exposes these in an obfuscated manner via Socket.setOption + ExtendedSocketOptions.TCP_KEEPIDLE, but I guess the IOReactorConfig.custom() API does not.

> In my Docker/Kubernetes env it was not trivially possible to set the sysctl settings, but I found a way to configure the RestClientBuilder from code which seemed to do the trick for me:
>
> ```java
> restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
>     @Override
>     public HttpAsyncClientBuilder customizeHttpClient(final HttpAsyncClientBuilder b) {
>         int tcpKeepaliveMs = 300 * 1000;
>         b.setKeepAliveStrategy((response, context) -> tcpKeepaliveMs);
>         return b.setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build());
>     }
> });
> ```
Note that setSoKeepAlive probably isn't doing much here without tweaking down the OS-level interval to be more frequent. What this is doing, and why it may work, is that by setting setKeepAliveStrategy to 300 seconds (300,000 ms), the connection pool will discard a persistent connection after it's been idle for 300 seconds. Note this is an HTTP keep-alive setting, not TCP keepalive (it functions differently). This avoids most idle-timeout issues where the remote side has dropped the connection; the drop can be silent, which then causes client-side issues when the connection is used again (the client will try the dead connection and TCP-retransmit until hitting some timeout). The appropriate timeout value depends on your downstream infrastructure (what is between your client and Elasticsearch), e.g. in AWS most network appliances idle-timeout by default at ~350s, and I have seen more aggressive defaults such as Azure load balancers at 4 minutes.
It's a different client library, but I've written about this in Reactor Netty for instance here.
So at a high level either:

- enable TCP keepalives (with a suitably low OS-level tcp_keepalive_time) so that dead connections are detected, or
- set the client's HTTP keep-alive strategy so that pooled connections stop being reused before any intermediary's idle timeout kicks in.
> So at a high level either:
I think you ideally want to do both of these things, but this issue is specifically about the first. The HTTP keepalive behaviour only affects idle connections, whereas the TCP keepalive behaviour will also ensure that active connections don't ever just silently die without ever returning an error response to the client.
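For illustration, a sketch that does both, assuming the Apache HttpAsyncClient 4.x builder APIs used by the low-level REST client; the host and the 60-second HTTP keep-alive value are arbitrary examples:

```java
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

public class KeepaliveClientFactory {
    static RestClient build() {
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
        builder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
            // TCP keepalives: detect connections that have been silently dropped,
            // even while a request is in flight.
            .setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build())
            // HTTP keep-alive: stop reusing a pooled connection more than 60s after its
            // last response, well before typical intermediary idle timeouts.
            .setKeepAliveStrategy((response, context) -> 60_000));
        return builder.build();
    }
}
```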
Today the REST clients keep HTTP connections open for reuse by future requests. If a connection remains open for a long time then there's a risk that something between the client and Elasticsearch will silently drop the connection, and the client won't know about this until it tries to send another request. Often a request on such a dropped connection is not actively rejected and instead simply times out, possibly after many seconds or even minutes.
Worse, if the connection is dropped silently while a request is in flight then the client may block indefinitely since it will send no more data on this connection and will therefore never find out that it's been closed.
I think we should enable TCP keepalives on these long-lived connections by default so that we can either (a) keep the connection open even when idle, or (b) at least be notified that it has been dropped more promptly.
For the benefit of folk who want to explicitly enable TCP keepalives on the client today, here's how you do it:
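A minimal sketch with the low-level RestClientBuilder (clientBuilder being your RestClientBuilder instance):

```java
clientBuilder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
    .setDefaultIOReactorConfig(IOReactorConfig.custom()
        .setSoKeepAlive(true) // enable TCP keepalives on the client's connections
        .build()));
```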
The TCP keepalives sent by the client are subject to the network configuration of your OS. In particular on Linux the relevant sysctl settings, and their defaults, are:

- net.ipv4.tcp_keepalive_time = 7200
- net.ipv4.tcp_keepalive_intvl = 75
- net.ipv4.tcp_keepalive_probes = 9

You should normally set net.ipv4.tcp_keepalive_time to 300. The default of 7200 seconds (i.e. 2 hours) is almost certainly too long to wait to send the first keepalive.