elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.
Other
128 stars 138 forks source link

[Windows] Agent should respect Windows service manager requirements on startup #5710

Open faec opened 5 days ago

faec commented 5 days ago

The Windows service manager requires services to respond to it immediately on startup before handling other work. Agent doesn't do this, because its startup was initially written for platforms with less strict requirements. Usually this isn't a problem in practice, but in some Windows environments it can cause an issue where occasionally an Agent doesn't start after reboot (because the service manager kills it before it responds). This especially shows up in larger deployments after a system update; some small fraction of Agents will not resume on their own and will need to be started manually.

A manual workaround is to set the Agent's service type to "Automatic (delayed)" rather than "Automatic", which typically runs the service startup under lower system load, averting the issue. However this isn't a scalable option for large installs, so we should rework the startup code to properly handle the default Windows case.

Related: https://github.com/elastic/elastic-agent/issues/4976

elasticmachine commented 5 days ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)