[REQUEST]: Fix agent status docs for unhealthy status

kpollich commented 2 weeks ago

Description

Currently, https://www.elastic.co/guide/en/fleet/8.15/monitor-elastic-agent.html#view-agent-status describes the unhealthy status as follows:

Elastic Agents have not checked in to Fleet Server. At this point, you may need to address the situation.

This is inaccurate. An agent reports as unhealthy only when one or more of its inputs or outputs reports as unhealthy. An agent that misses check-ins for a set period (five minutes by default) is considered offline. Missing check-ins never produces an intermediate healthy -> unhealthy -> offline transition: the agent goes directly from healthy -> offline once it hasn't checked in for five minutes.

Explicitly, when an agent's last check-in status value is either error or degraded, the agent's status will be reported as "unhealthy".
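
To make the rule above concrete, here is a minimal sketch of the status derivation as described in this issue. It is not the Kibana implementation: the type names, the `deriveAgentStatus` function, the `"online"` check-in value, and the strict `>` comparison at the five-minute boundary are all assumptions for illustration, and other statuses that Fleet may report are left out.

```typescript
// Hypothetical types for illustration; not taken from the Kibana source.
// "online" is an assumed name for a check-in with no reported problems.
type LastCheckinStatus = "online" | "error" | "degraded";
type AgentStatus = "healthy" | "unhealthy" | "offline";

// Default offline threshold described in this issue: five minutes.
const OFFLINE_AFTER_MS = 5 * 60 * 1000;

interface AgentCheckin {
  lastCheckinAt: number; // time of the last check-in, in epoch milliseconds
  lastCheckinStatus: LastCheckinStatus;
}

function deriveAgentStatus(
  agent: AgentCheckin,
  now: number,
  offlineAfterMs: number = OFFLINE_AFTER_MS
): AgentStatus {
  // Missed check-ins move the agent straight to offline,
  // with no intermediate unhealthy state.
  // Assumption: the boundary is exclusive (strictly more than the threshold).
  if (now - agent.lastCheckinAt > offlineAfterMs) {
    return "offline";
  }
  // An agent that is still checking in is unhealthy only when its
  // last check-in status is "error" or "degraded".
  if (
    agent.lastCheckinStatus === "error" ||
    agent.lastCheckinStatus === "degraded"
  ) {
    return "unhealthy";
  }
  return "healthy";
}
```

In this sketch, an agent whose last check-in was six minutes ago is offline regardless of its last reported status, while an agent that checked in one minute ago with `degraded` is unhealthy.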

@cmacknz can you sanity check this to make sure I'm not misrepresenting anything?

Resources

https://www.elastic.co/guide/en/fleet/8.15/monitor-elastic-agent.html#view-agent-status
https://github.com/elastic/kibana/blob/1b2cbf15d88401249eb8017d529ce88a866ce931/x-pack/plugins/fleet/server/services/agents/build_status_runtime_field.ts#L93-L134 ("error" status is coerced to "unhealthy" here)


Point of contact.

Main contact: @kpollich

Stakeholders: @elastic/fleet @elastic/elastic-agent

blakerouse commented 2 weeks ago

@kpollich Your statement is correct. Unhealthy means something is wrong with the Elastic Agent on the host, such as a configuration issue with an integration or the Elastic Agent lacking the correct permissions to run that integration.

kilfoyle commented 2 weeks ago

Thanks Kyle and Blake! Here's a docs PR: https://github.com/elastic/ingest-docs/pull/1288

nimarezainia commented 2 weeks ago

for reference

kilfoyle commented 2 weeks ago

Amazing. Thanks a lot @nimarezainia!
