Open Tim-Brooks opened 5 years ago
Pinging @elastic/es-distributed (:Distributed/Network)
From: https://discuss.elastic.co/t/documentation-to-link-remote-clusters-using-proxy/220229
When I try to find "proxy" in the 7.6.0 remote clusters documentation I can't find any occurrence; it seems that the documentation is not yet updated according to this issue.
The Usage section of this issue is unclear to me. I don't know where to begin to implement this. I have two clusters (localcluster_1 and remotecluster_2) in different locations, running on Kubernetes and deployed with ECK. Both are working fine.
@jexertier
The documentation for Elasticsearch is in the process of being updated in #52779 for the next release.
7.6 did go live with the functionality described in this PR and can be configured using the above settings.
The only two required settings are:
cluster.remote.test_remote_cluster.mode: "proxy"
cluster.remote.test_remote_cluster.proxy_address: "localhost:9300"
Like other remote cluster settings, they can be configured in a yml file. However, settings configured that way apply only to the node that has that yml file. The most common way to enable remote connections cluster-wide is the cluster settings update API.
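As a rough sketch of the cluster-wide route, the two required settings can be sent as the body of a cluster settings update request (PUT /_cluster/settings). The helper name proxy_mode_settings is made up for illustration, and the alias and address are just the example values from above; the snippet only builds and prints the request body and does not talk to a cluster.

```python
import json


def proxy_mode_settings(alias, proxy_address, persistent=True):
    """Build a cluster settings update body for a proxy-mode remote cluster.

    Hypothetical helper for illustration; only the two settings named in
    this thread (mode and proxy_address) are included.
    """
    scope = "persistent" if persistent else "transient"
    prefix = f"cluster.remote.{alias}"
    return {
        scope: {
            f"{prefix}.mode": "proxy",
            f"{prefix}.proxy_address": proxy_address,
        }
    }


# Example values taken from the comment above.
body = proxy_mode_settings("test_remote_cluster", "localhost:9300")
print("PUT /_cluster/settings")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into any HTTP client pointed at the local cluster. Using the persistent scope here is an assumption; transient is available through the same helper.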
These specific settings are pretty low-level though. They essentially require that you have a proxy in between your two Elasticsearch clusters. localcluster_1 would open connections to this proxy and depends on the proxy routing those connections to remotecluster_2. It is designed for infrastructure setups where users do not want to expose remotecluster_2 ports to localcluster_1. This is common in the Kubernetes case (I think using some type of proxy abstraction).
I think in your specific case you are looking to enable this on ECK. I know ECK has started or completed work to use this proxy mode to integrate remote cluster connections. I'll ping @pebrc here as he might be able to give you some direction on the ECK side.
@tbrooks8 I responded on the discuss forum post linked above. We have some preliminary documentation describing how to use the new functionality in our master branch https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-remote-clusters.html
Would it make sense to align the naming for the remote cluster mode settings with the _remote/info API?
For example:
cluster.remote.${cluster_alias}.proxy_address currently maps to address. Should this be proxy_address?
cluster.remote.${cluster_alias}.proxy_socket_connections currently maps to max_socket_connections. Should this be max_proxy_socket_connections?
I just tried proxy mode and found out that it does not support the skip_unavailable option. It cannot be set if seeds are not set, but seeds can't be set with mode: proxy.
My issue is that without this option, if some of the remote clusters are offline, searches will fail. I am looking for an option that will not break searches if one of the remote clusters is not working.
@fpytloun Thanks for the report. This was fixed in #52829 and will be included in 7.6.1 and 7.7.
@tbrooks8 Can we close this?
I will look into the final two outstanding tasks soon. And then close.
Hi there - we are facing an issue using this proxy technology in our ECE setup, as we have ALBs fronting our environments. The resulting connection to a proxy node has to be an uninterrupted 2-way TLS handshake to a proxy on Layer 4, which we can't provide through an ALB.
I would like to suggest an alternative to providing a single proxy: enable the possibility of providing a list of proxies, so that we gain high availability across multiple regions without the need to set up an NLB to balance across multiple proxy nodes. There is also the added benefit that proxied connections won't have to leave the VPC, and so won't add extra cost from having to cross a NAT Gateway.
Hope it makes sense.
@IASecurity the proxy address is resolved afresh on each connection attempt so you can today provide multiple proxy addresses via DNS. That said, I can see why you might want Elasticsearch to handle multiple addresses itself so this seems like a reasonable feature request. Would you open a separate issue to suggest it?
You sir - have just saved us a lot of work and money :) - Not only do we not have to create more IaC for components, security groups etc., but we also get to keep the 25Gbit connection speeds between our EC2 instances by not having to go over an NLB. If Elasticsearch opportunistically tries to reconnect all 18 connections defined in the Remote Clusters configuration, it is bound to re-establish those through simple DNS load balancing like you suggest. In that case I think the multiple-destination handling in Elasticsearch might already be sufficient for this scenario.
I just tried setting a DNS record up across 3 Proxy nodes - and that does work!
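For anyone wanting to check what a DNS name with several records resolves to before pointing proxy_address at it, here is a minimal sketch using Python's standard socket module. The function name resolve_proxy_addresses and port 9400 are assumptions for illustration, and this only shows what the local resolver returns, not how Elasticsearch itself picks an address.

```python
import socket


def resolve_proxy_addresses(host, port):
    """Return the distinct IP addresses the local resolver gives for host.

    Illustrative helper (not part of Elasticsearch): a name with several
    A/AAAA records yields several entries, one per proxy node.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:  # keep order, drop duplicates
            addresses.append(ip)
    return addresses


# A literal IP is used here so the example runs without a DNS lookup;
# replace it with the DNS name that fronts your proxy nodes.
print(resolve_proxy_addresses("127.0.0.1", 9400))
```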
Thanks for this: https://github.com/elastic/elasticsearch/issues/82366
Is there any documentation on how to configure the proxy? Other than TCP pass-through so that mTLS continues to function, how do we do health checking? Is a simple connect()-based health check enough? Is there some other endpoint or payload we can send to validate that the node is ready to receive connections?
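To make the question concrete, a connect()-based check amounts to something like the sketch below. The function name tcp_connect_ok and the timeout are made up for illustration. It only tells you that something accepted a TCP connection on that port; whether that is enough to say the node is ready is exactly the open question, and nothing here answers it.

```python
import socket


def tcp_connect_ok(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened.

    This is a bare connect() check: no TLS handshake is attempted and
    no payload is sent, so it says nothing about application readiness.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demonstrate against a throwaway local listener instead of a real node.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(tcp_connect_ok("127.0.0.1", demo_port))
listener.close()
```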
@tbrooks8 anything left here or can we close this one now?
Summary
We are interested in implementing a proxy connection mode for remote cluster connections. Instead of sniffing the remote cluster and connecting directly to specific nodes, this connection mode will open single channel connections to the remote cluster with no regard for the identity of the remote node. This will allow an intermediate proxy to make the routing decisions.
Tasks
7.6.1/7.7
Future:
Usage
To enable this mode the following settings must be configured: