Open allamiro opened 1 year ago
We had a similar-sounding issue. Elastic Agent was running fine, then server OS patching occurred. The Elastic Agent recognised that a reboot was about to occur, so it shut down and restarted itself within 2-5 seconds, and then the server rebooted. After that, the Elastic Agent fails to start up automatically.
It does start if you log in to the server and manually start the service, though.
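For anyone wanting to script the manual start described above, here is a minimal sketch in Python. It assumes the Windows service is registered as "Elastic Agent" (an assumption, check your install) and that the script runs with administrator rights. The helper names `parse_sc_state` and `ensure_running` are hypothetical, not part of any Elastic tooling.

```python
import re
import subprocess
from typing import Optional

# Assumption: the agent's Windows service name; adjust if yours differs.
SERVICE_NAME = "Elastic Agent"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def ensure_running(service: str = SERVICE_NAME) -> Optional[str]:
    """Start the service if it is stopped; return the state seen before acting."""
    result = subprocess.run(
        ["sc", "query", service], capture_output=True, text=True
    )
    state = parse_sc_state(result.stdout)
    if state == "STOPPED":
        # `sc start` only requests the start; it does not wait for RUNNING.
        subprocess.run(["sc", "start", service], check=True)
    return state
```

Calling `ensure_running()` from a scheduled task after boot would cover the "stopped after reboot" case. It would not help where the service is stuck in a pending state, which seems to be what some people in this thread are hitting.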
@jugsofbeer I have seen multiple issues that were resolved by restarting the agent, even in the newer version. I think that function is needed just in case, as I mentioned previously in elastic/elastic-agent#2628
The issue description seems similar to https://github.com/elastic/elastic-agent/issues/2554 but affecting Fleet server instead.
Another very similar problem that could be related: https://github.com/elastic/fleet-server/issues/2603
Changing the output that the fleet-server integration uses to Logstash will put fleet-server into an unrecoverable state; this is expected behaviour. @kpollich, have/can we disable this in the fleet-ui?
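As an illustration of the kind of guard being asked about, here is a rough Python sketch that flags a Fleet Server policy whose outputs point at Logstash before the change is saved. The dictionaries are stand-ins for agent policy and output objects. The field names (`has_fleet_server`, `data_output_id`, `monitoring_output_id`, `type`) are assumptions modelled on Fleet's saved objects and have not been checked against 8.6.1. The function name `logstash_output_conflicts` is hypothetical.

```python
from typing import Dict, List


def logstash_output_conflicts(
    policy: dict, outputs_by_id: Dict[str, dict]
) -> List[str]:
    """Return the policy fields that route a Fleet Server policy to Logstash."""
    # Only policies that carry the fleet-server integration are affected.
    if not policy.get("has_fleet_server"):
        return []
    conflicts = []
    for field in ("data_output_id", "monitoring_output_id"):
        output = outputs_by_id.get(policy.get(field))
        if output is not None and output.get("type") == "logstash":
            conflicts.append(field)
    return conflicts


# Example data, made up for illustration only.
outputs = {
    "es-default": {"type": "elasticsearch"},
    "ls-1": {"type": "logstash"},
}
fleet_policy = {
    "has_fleet_server": True,
    "data_output_id": "ls-1",
    "monitoring_output_id": "es-default",
}
print(logstash_output_conflicts(fleet_policy, outputs))
```

A non-empty result would be the signal to block or warn before saving. Whether the check belongs in the UI, the API, or both is a separate question.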
Version: 8.6.1
Operating System: Windows 10 and Windows Server 2019
Discuss Forum URL:
- Steps to Reproduce:
Navigate to Fleet -> Agents Policy tab
Select the fleet policy
Click on the fleet policy settings
Change Fleet Policy output for integrations to Logstash
Change output for Agent monitoring to Logstash
Save changes
Upon failure:
Another symptom we are seeing is in the agent log that the offline agents are reporting.
I can't provide the logs.
I know we need to upgrade to 8.7 or even 8.8, and we are planning to do so to resolve the problems around bug report elastic/elastic-agent#2316. However, I suspect there is also a problem with the recovery functions: after we stood up a new Fleet Server following the changes made above, most of the agents remained unhealthy. The majority of agents are showing as offline, which may also be discussed in elastic/elastic-agent#2554
A reboot on some of the systems fixed the issue, because the service hangs after a service restart. However, it's not possible to send a mass reboot command to all systems. Multiple agents remain in the unhealthy status.
It's worth mentioning that I opened an enhancement request to add the functions of 2523