elastic / fleet-server

The Fleet server allows managing a fleet of Elastic Agents.

Changing from Elasticsearch to Logstash output and back causes agents to go offline #2602

Open allamiro opened 1 year ago

allamiro commented 1 year ago

Upon failure:

Another symptom we are seeing: in the logs, the agents that are offline report

possible transient error during checking with fleet-server, retrying

I can't provide the logs.

I know we need to upgrade to 8.7 or even 8.8, and we are planning to do so to resolve the problems described in this bug report: elastic/elastic-agent#2316. However, I suspect there is also a problem with the recovery functions. After we stood up a new Fleet Server following the changes described above, most of the agents remained unhealthy. The majority of agents are showing as offline, which may also be discussed in elastic/elastic-agent#2554.

Rebooting some of the systems fixed the issue, because the service hangs after a service restart. However, it is not possible to send a mass reboot command to all systems, and many agents remain in an unhealthy state.

It's worth mentioning that I opened an enhancement request to add the functionality from 2523.

jugsofbeer commented 1 year ago

We had a similar-sounding issue. The Elastic Agent was running fine, then server OS patching occurred. The Elastic Agent recognised that a reboot was about to occur, so it shut down and restarted itself within 2-5 seconds, and then the server rebooted. After that, the Elastic Agent fails to start up automatically.

It does start up if you log in to the server and start the service manually, though.

allamiro commented 1 year ago

@jugsofbeer I have seen multiple issues that were resolved by restarting the agent, even in the newer version, and I think that function is needed just in case, as I mentioned previously in elastic/elastic-agent#2628.

cmacknz commented 1 year ago

The issue description seems similar to https://github.com/elastic/elastic-agent/issues/2554 but affecting Fleet server instead.

cmacknz commented 1 year ago

Another very similar problem that could be related: https://github.com/elastic/fleet-server/issues/2603

michel-laterman commented 9 months ago

Changing the output that the fleet-server integration uses to Logstash will put fleet-server into an unrecoverable state; this is expected behaviour. @kpollich, have we disabled, or can we disable, this in the Fleet UI?