elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

allow multiple hosts to be passed in --fleet-server-es flag #4958

Open michel-laterman opened 3 months ago

michel-laterman commented 3 months ago

Currently the --fleet-server-es flag only supports specifying a single host. This should change to allow specifying a comma-separated list that is passed to fleet-server. The corresponding env var FLEET_SERVER_ELASTICSEARCH_HOST, used by the container command, should have the same behaviour.
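A minimal sketch of what accepting a comma-separated list could look like. The helper name `splitHosts` and the example URLs are hypothetical; this is not the agent's actual flag-handling code, only an illustration of splitting one string value into a host list while tolerating stray whitespace and empty entries.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHosts is a hypothetical helper (not part of elastic-agent).
// It turns a comma-separated value, such as one passed to a flag or
// an environment variable, into a slice of host strings. Surrounding
// whitespace is trimmed and empty entries are dropped.
func splitHosts(raw string) []string {
	var hosts []string
	for _, h := range strings.Split(raw, ",") {
		h = strings.TrimSpace(h)
		if h != "" {
			hosts = append(hosts, h)
		}
	}
	return hosts
}

func main() {
	// Example value only; the URLs are made up for illustration.
	hosts := splitHosts("https://es-a:9200, https://es-b:9200,")
	fmt.Println(len(hosts), hosts)
}
```

A single-host value still yields a one-element slice, so a change along these lines would stay compatible with existing invocations that pass only one host.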

elasticmachine commented 3 months ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

cmacknz commented 3 months ago

This seems reasonable but we need to document that the list of hosts passed in here are not able to be edited from the fleet UI afterwards, or do the extra work to fix that as part of this as well. Having this limitation was more reasonable when it was just a single bootstrap host that was affected.

michel-laterman commented 3 months ago

4643 and https://github.com/elastic/fleet-server/pull/3506 allow fleet-server to use multiple hosts retrieved from the policy; this issue is to allow multiple hosts on enrolment (and used in the internal bootstrap attribute)

nimarezainia commented 3 months ago

> This seems reasonable but we need to document that the list of hosts passed in here are not able to be edited from the fleet UI afterwards, or do the extra work to fix that as part of this as well. Having this limitation was more reasonable when it was just a single bootstrap host that was affected.

I honestly don't think we can live with this restriction, so the UI piece needs to work as well. After reading the original issue that prompted this ask, I am still not certain why we need to allow multiple hosts during bootstrapping.

We just need the Fleet Server to connect to one of those ES hosts. Once bootstrapped, the config is downloaded and we then have an array of ES hosts to use (if configured in the UI). The main use case is redundancy, and redundancy is mainly a concern during the lifetime of the Fleet Server, not during the initial bootstrapping, which is a one-time effort.

Is it worth optimizing for an event that happens only once (during the bootstrapping) during the life cycle of the Fleet Server?

s-karberg commented 3 months ago

What is the chance or possible rules, to get it in a 8.14? 😄 🙏

cmacknz commented 3 months ago

> Is it worth optimizing for an event that happens only once (during the bootstrapping) during the life cycle of the Fleet Server?

The use case I can think of for providing a list at enrollment time is if someone is dynamically provisioning fleet servers (think horizontal auto scaling) and their single ES URL is not a load balancer. A user doing this would not want an autoscaling deployment to fail because a single ES host was unreachable. This seems like a valid reason to want this, but I am not sure how common it is. It is certainly less common and less impactful than fleet server only being able to use the ES URL it was bootstrapped with after enrollment succeeds, which is the problem we have fixed.

> What is the chance or possible rules, to get it in a 8.14? 😄 🙏

It took us several attempts to get this right without negative impact in our prerelease ESS clusters. It will first be available in 8.15.0 and won't be available in 8.14.x until it has gone through the soak testing of a minor release cycle. Arguably this could be considered a bug fix, but it won't be backported until 8.15.0 is out, at minimum, to make sure we haven't missed anything else before release.

nimarezainia commented 3 months ago

Thanks @cmacknz. Then I would say we do need to ensure the UI will work properly: users pass in the list as a flag to bootstrap the Fleet Server, but should also have the opportunity to change the list via the UI. We can't expect them to reinstall the Fleet Server when/if that host list changes. Reinstalling Fleet Server is disruptive.

cmacknz commented 3 months ago

> then I would say we do need to ensure the UI will work properly

The UI does work, but it can't delete the stored bootstrap host; it can duplicate and edit it. To clarify how we solved this: we created a separation between the bootstrap host and the policy hosts.

Let's imagine a user bootstraps fleet server with a single ES URL, called elasticsearch_A. In their policy they define three hosts: elasticsearch_A, elasticsearch_B, and elasticsearch_C.

Then, when fleet server bootstraps, it contacts elasticsearch_A, gets the full list of hosts (A, B, and C), and from that point forward always has those hosts available because the policy is persisted in the agent on disk.

Let's then imagine that the user edits elasticsearch_A, elasticsearch_B, and elasticsearch_C in the Fleet UI to have a proxy. We'll call the updated hosts elasticsearch_A_proxy, elasticsearch_B_proxy, and elasticsearch_C_proxy.

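Then the set of hosts available to fleet server locally is:

- elasticsearch_A (the original bootstrap host)
- elasticsearch_A_proxy
- elasticsearch_B_proxy
- elasticsearch_C_proxy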

The caveat with the current implementation is that the original elasticsearch_A is preserved on disk indefinitely with no way to edit the original bootstrap host, but Fleet Server always has the up to date set of hosts from the policy in addition to this.
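The separation between the bootstrap host and the policy hosts can be sketched roughly as below. The function `effectiveHosts`, the ordering (policy hosts first), and the de-duplication are all assumptions made for illustration; the actual fleet-server logic may combine the two sets differently.

```go
package main

import "fmt"

// effectiveHosts is a hypothetical helper. It combines the hosts from
// the current policy with the bootstrap hosts stored on disk, dropping
// duplicates while keeping first-seen order. Placing policy hosts first
// is an assumption, not documented behaviour.
func effectiveHosts(bootstrap, policy []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range [][]string{policy, bootstrap} {
		for _, h := range list {
			if !seen[h] {
				seen[h] = true
				out = append(out, h)
			}
		}
	}
	return out
}

func main() {
	bootstrap := []string{"elasticsearch_A"}

	// Policy as first downloaded: A is in both sets and appears once.
	fmt.Println(effectiveHosts(bootstrap,
		[]string{"elasticsearch_A", "elasticsearch_B", "elasticsearch_C"}))

	// Policy after the proxy edit: the original A remains alongside it.
	fmt.Println(effectiveHosts(bootstrap,
		[]string{"elasticsearch_A_proxy", "elasticsearch_B_proxy", "elasticsearch_C_proxy"}))
}
```

In the second call the result has four entries: the three edited policy hosts plus the unedited elasticsearch_A, which matches the caveat that the bootstrap host stays on disk regardless of later policy changes.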

The other limitation today is that there can only be one bootstrap host; this issue is about allowing a list of bootstrap hosts.

The problem of Fleet Server only being able to use a single host is resolved, with the limitation that you can still only use a single host to get the agent policy the very first time. From that point onward you have the list of hosts defined in the Fleet UI available at all times.