elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Try to use all Fleet Server hosts for enrollment purposes #113245

Open david-kow opened 2 years ago

david-kow commented 2 years ago

Describe the feature:

The Fleet Server hosts setting (also xpack.fleet.agents.fleet_server.hosts) lets you specify multiple hosts that the Agent uses to connect. As per the documentation:

If multiple URLs exist, Fleet shows the first provided URL for enrollment purposes. Enrolled Elastic Agents will connect to the URLs in round robin order until they connect successfully.

I'd propose round-robining over the existing URLs for enrollment as well.
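A minimal sketch of what round-robin enrollment could look like, assuming the agent receives the full list of Fleet Server hosts instead of only the first one. `enroll_round_robin` and the `try_enroll` callback are hypothetical names used for illustration, not Elastic Agent APIs, and the URLs are placeholders. A real implementation would live in the (Go) agent and would also need timeouts and backoff.

```python
from typing import Callable, Optional, Sequence


def enroll_round_robin(
    hosts: Sequence[str],
    try_enroll: Callable[[str], bool],
    start: int = 0,
) -> Optional[str]:
    """Try each Fleet Server host once, in round-robin order beginning at `start`.

    Returns the first host that accepted the enrollment, or None if all failed.
    `try_enroll` is a hypothetical callback that attempts enrollment against a
    single URL and returns True on success.
    """
    if not hosts:
        return None
    n = len(hosts)
    for i in range(n):
        # Wrap around the list so every host is tried exactly once.
        host = hosts[(start + i) % n]
        if try_enroll(host):
            return host
    return None


if __name__ == "__main__":
    # Placeholder hosts: an external endpoint and an internal one.
    hosts = ["https://fleet.example.com:8220", "https://fleet.internal.example:8220"]
    # Simulate an agent on the local network that can only reach the internal host.
    reachable = {"https://fleet.internal.example:8220"}
    print(enroll_round_robin(hosts, lambda url: url in reachable))
```

With this fallback, an agent that cannot reach the first URL still enrolls against a later one, and varying `start` per agent would spread enrollments across the hosts.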

Describe a specific use case for the feature:

The current behavior makes the first URL special and makes the multiple-URLs feature less useful, because it assumes that every Agent has access to the first URL in the list. When there are different groups of Agents (where each group has access to only one URL from the list), some of the groups are unable to enroll.

For example: I'd like to use an external endpoint for the Agents connecting through the Internet and an internal endpoint for the Agents running in the local network. Today, depending on which URL is first in the list, either external or internal Agents won't be able to enroll.

Also, neither the UI nor the Kibana docs indicate that the first URL is special.

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

ruflin commented 2 years ago

This issue likely needs to be fixed in the fleet-server and not in Kibana. @joshdover @blakerouse Should we move this over to the fleet-server repo?

joshdover commented 2 years ago

> This issue likely needs to be fixed in the fleet-server and not in Kibana.

If I'm understanding correctly, I think this would need to be fixed in Kibana, since the UI is where the user copies the command for enrolling new agents, which includes the --url=<fleet host> flag. What I'm not sure about is how we could ensure that users execute this command in a round-robin fashion (or whether that's even desirable). I'd expect our recommendation to be hosting Fleet Server behind a load balancer to spread out the enrollment load. I'm not sure how else we could guarantee that the load is distributed, since we have no control over which --url a sysadmin uses on each agent install.

> For example: I'd like to use an external endpoint for the Agents connecting through the Internet and an internal endpoint for the Agents running in the local network. Today, depending on which URL is first in the list, either external or internal Agents won't be able to enroll.

This sounds like a slightly different problem, but one we could actually help with in the UI. For instance, it would probably make sense to allow assigning a default Fleet Server host to an Agent Policy, so that the provided ./elastic-agent install command uses the appropriate --url for agents in that group (e.g., a "web server" agent policy that ships data to an internal Fleet Server behind a firewall or VPC).
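A rough sketch of the per-policy idea, assuming an Agent Policy could carry an optional default Fleet Server host that overrides the global list. The `AgentPolicy` type, its `fleet_server_host` field and the `install_command` helper are hypothetical; only the --url and --enrollment-token flags come from the real elastic-agent install command. Hosts and tokens are placeholders.

```python
import shlex
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AgentPolicy:
    # Hypothetical model of an Agent Policy with an optional per-policy host.
    name: str
    enrollment_token: str
    fleet_server_host: Optional[str] = None


def install_command(policy: AgentPolicy, global_hosts: Sequence[str]) -> str:
    """Build the install command the UI would show for agents in `policy`.

    Uses the policy's own Fleet Server host when set; otherwise falls back to
    today's behavior of showing the first globally configured host.
    """
    url = policy.fleet_server_host or (global_hosts[0] if global_hosts else None)
    if url is None:
        raise ValueError("no Fleet Server host configured")
    # Quote each argument so the copied command is safe to paste into a shell.
    return " ".join([
        "./elastic-agent",
        "install",
        shlex.quote(f"--url={url}"),
        shlex.quote(f"--enrollment-token={policy.enrollment_token}"),
    ])


if __name__ == "__main__":
    hosts = ["https://fleet.example.com:8220", "https://fleet.internal.example:8220"]
    web = AgentPolicy("web server", "TOKEN1", "https://fleet.internal.example:8220")
    print(install_command(web, hosts))
```

Keeping the global list as the fallback means policies without an explicit host behave exactly as they do today.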