Open cmacknz opened 7 months ago
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
We should generalize this to each output, not just Elasticsearch. That likely requires three separate implementations.
We should also likely debounce this implementation. We don't want agents appearing unhealthy because they couldn't connect to Elasticsearch for 100 ms if the problem fixes itself.
There are output errors we can detect today that I don't think are shown obviously in the Fleet UI: https://github.com/elastic/elastic-agent/issues/3959#issuecomment-1874146331
@cmacknz those are configuration related, are they not? That shouldn't be an issue for Fleet-managed agents (which the display would refer to). But agree that if we are able to detect other errors we should certainly display them on the agent details page.
Should this all be included as part of https://github.com/elastic/ingest-dev/issues/1594 ?
Agree that the Fleet-managed configuration would help avoid this error, but if it did happen there is nowhere in the Fleet UI to display the error. Looks like https://github.com/elastic/ingest-dev/issues/1594#issuecomment-1761795157 does cover this.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Originally reported by @juliaElastic:
Discovered during development of remote ES output: https://github.com/elastic/fleet-server/pull/3051#issuecomment-1820608162
I noticed while testing that when the remote output is not accessible, the Agent doesn't go to unhealthy state. The connection errors are logged, but the Agent reports Healthy state on all units.
According to @AndersonQ this is a known issue:
This is how I tested:
Here are the agent diagnostics from my local environment: elastic-agent-diagnostics-2023-11-22T10-03-07Z-00.zip
Error log:
Agent component health