elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support for multiple proxy addresses in remote cluster connection #82366

Open DaveCTurner opened 2 years ago

DaveCTurner commented 2 years ago

A highly available setup running with proxy-mode remote cluster connections needs to be able to handle a failure of the proxy. Today a proxy-mode remote cluster connection accepts only a single address for the proxy. If the entry is a DNS name then it is resolved afresh on each connection attempt, but we only use the first resolved address each time. In practice this works OK if DNS is configured to return multiple addresses in a different order on each request, but it's not ideal: it may take a long time to re-establish connectivity if DNS happens to select an address to which connection attempts time out instead of actively failing. Some users configure additional middleware, or orchestrate IP address migrations, to work around this limitation.
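To illustrate the difference between using only the first resolved address and considering every resolved address, here is a minimal Python sketch. It is not Elasticsearch code (Elasticsearch is written in Java); the function names `system_resolver`, `first_address_only` and `all_addresses` are hypothetical, and the resolver is injectable so the behaviour can be tried without real DNS.

```python
import socket
from typing import Callable, List, Tuple

Address = Tuple[str, int]
Resolver = Callable[[str, int], List[Address]]


def system_resolver(host: str, port: int) -> List[Address]:
    """Return every distinct (ip, port) pair the OS resolver reports, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen: List[Address] = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        pair = (sockaddr[0], sockaddr[1])
        if pair not in seen:
            seen.append(pair)
    return seen


def first_address_only(host: str, port: int,
                       resolve: Resolver = system_resolver) -> List[Address]:
    """Models the behaviour described above: resolve afresh, keep only the first result."""
    return resolve(host, port)[:1]


def all_addresses(host: str, port: int,
                  resolve: Resolver = system_resolver) -> List[Address]:
    """Models the proposed behaviour: every resolved address is a candidate."""
    return resolve(host, port)
```

With a resolver that returns three addresses, `first_address_only` yields one candidate per attempt, so a single unresponsive address can stall the whole attempt, whereas `all_addresses` leaves the caller free to move on to the next candidate.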

I believe we should support multiple proxy addresses in remote cluster connections to improve the availability of remote clusters without needing additional middleware or complex orchestration steps. We should accept a list of addresses or names in the config, and recognise that each DNS name may resolve to multiple addresses too. Each connection attempt should distribute across multiple addresses properly, and ideally could keep track of connection failures and avoid known-bad addresses.
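One possible shape for "distribute across multiple addresses and avoid known-bad ones" is sketched below in Python. This is only an illustration under stated assumptions, not a proposed implementation: the class name `AddressSelector`, the cooldown period, and the fallback to the least recently failed address are all hypothetical choices that the issue does not specify.

```python
import time
from typing import Callable, Dict, List


class AddressSelector:
    """Round-robin over addresses, skipping any that failed within a cooldown window."""

    def __init__(self, addresses: List[str], cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not addresses:
            raise ValueError("at least one address is required")
        self._addresses = list(addresses)
        self._cooldown = cooldown_seconds  # hypothetical tuning knob
        self._clock = clock
        self._failed_at: Dict[str, float] = {}
        self._next = 0

    def next_address(self) -> str:
        now = self._clock()
        count = len(self._addresses)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._addresses[index]
            failed = self._failed_at.get(candidate)
            if failed is None or now - failed >= self._cooldown:
                self._next = (index + 1) % count
                return candidate
        # Every address failed recently: retry the one whose failure is oldest.
        return min(self._addresses, key=lambda a: self._failed_at[a])

    def mark_failed(self, address: str) -> None:
        self._failed_at[address] = self._clock()

    def mark_ok(self, address: str) -> None:
        self._failed_at.pop(address, None)
```

A caller would ask for `next_address()` before each connection attempt and report the outcome with `mark_failed` or `mark_ok`. The cooldown means an address that times out is not retried immediately, which addresses the slow-reconnect concern raised above.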

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)

justincr-elastic commented 2 years ago

@tbrooks8 mentioned that during Proxy Mode design, a Direct Mode was also considered.

For Remote Cluster security design, I think we need Direct Mode and Remote Sniff Mode.

Proxy Mode Questions:

  1. Are you planning to enhance the existing Proxy Mode, or create a new mode?
  2. Will Multiple Proxy Mode be equivalent to the previously planned Direct Mode? In other words, would customers be able to configure FQDNs, short hostnames, and IP addresses? No dependency on configuring DNS is desirable.

Remote Sniff Mode Question:

  1. Is there any way to distinguish local vs remote listeners now? I can't think of any, other than Transport Profiles, but I am not sure if we want to reuse Transport Profiles for the new Cross Cluster security model. In other words, I think Remote Sniff Mode is desirable, but it will directly depend on new inbound and outbound contexts in elasticsearch.yml.

DaveCTurner commented 2 years ago

I don't know what Direct Mode would have done exactly, but yes we'd continue to support all flavours of address lookup. I don't think we'd have a separate mode here, we'd just make it so that cluster.remote.*.proxy_address can also accept a list.

Let's discuss the remote sniff mode question elsewhere to keep this conversation on-topic for this issue about proxy mode.

Tim-Brooks commented 2 years ago

When I originally wrote the design document with @ywelsch I had called the mode "direct" or "simple" mode, and it accepted a list of socket addresses to which we open direct TCP connections, with no sniffing or knowledge of the remote cluster topology.

I had originally named it that since it did not matter if we were going through a proxy or directly connecting to remote nodes. Just a list of addresses that we round robin connect to.

But then there was the decision that since we were specifically designing this for a proxy we would call it proxy mode and only support a single address.

We can modify cluster.remote.*.proxy_address to accept multiple addresses. It does raise the question if we want it to be cluster.remote.*.proxy_addresses. And if we think that this mode would be used by non-proxy use cases it might raise the question if we are still happy with the name proxy.

DaveCTurner commented 2 years ago

That's a good point, although we're not exactly strict about singular/plural things in other settings' names either. Even when using a proxy we've encountered users who want to use multiple for resiliency, and that's really the case I think we should address here.

Technically you don't need a proxy today to use proxy mode, you can just point it to one of the nodes in your cluster (or a DNS alias that resolves to a list of multiple nodes). OTOH if you're able to connect directly to the nodes of the remote cluster then you could reasonably use sniff mode, so if you're not doing that then this kind of implies that you're using something like a proxy.

justincr-elastic commented 2 years ago

Note, sniff mode won't work for the new Remote Cluster Security. Remote will need to be on a new port, with the option to use an API Key instead of a TLS client cert.

justincr-elastic commented 2 years ago

Is further discussion required, or can we agree to add cluster.remote.*.proxy_addresses?

Tagging @gwbrown @jakelandis @n1v0lg.

justincr-elastic commented 2 years ago

I am planning to start a PR to add cluster.remote.*.proxy_addresses. I am proposing it will be behind an RCS 2.0 feature flag.

gwbrown commented 2 years ago

Agreed to add support for multiple proxy addresses. I think we should just extend the existing setting though, rather than requiring a plural here - we support singleton strings as well as lists of strings for the same field in many other APIs, so it aligns well there and will make backwards compatibility easier. I'm also not sure why this should be behind the feature flag, as it's not reliant on any other bits from RCS 2.0 to function and has value in the existing security model as well.
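As a rough illustration of accepting either a singleton string or a list for the same setting, here is a small Python sketch. The function `parse_proxy_addresses` is hypothetical and is not how Elasticsearch parses settings; it assumes each entry has the form `host:port` and leaves any IPv6 brackets on the host untouched.

```python
from typing import List, Tuple, Union


def parse_proxy_addresses(value: Union[str, List[str]]) -> List[Tuple[str, int]]:
    """Normalise a single 'host:port' string or a list of them into (host, port) pairs."""
    # A bare string is treated as a one-element list, so existing configs keep working.
    raw = [value] if isinstance(value, str) else list(value)
    if not raw:
        raise ValueError("at least one proxy address is required")
    parsed: List[Tuple[str, int]] = []
    for entry in raw:
        # Split on the last colon so hosts such as '[::1]' keep their inner colons.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        parsed.append((host, int(port)))
    return parsed
```

Under this sketch, an existing single-value configuration and a new multi-value one go through the same code path, which is the backwards-compatibility property argued for above.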