What does this PR do?
Use the failure_threshold setting introduced in https://github.com/elastic/beats/pull/41570 in the self-monitoring configuration, so that elastic-agent does not report DEGRADED when it fails to fetch metrics because a component is starting or stopping. The default failure threshold is 2, and it can be configured via the config file or the Fleet policy.
Why is it important?
It is important to avoid a misrepresentation of agent status due to a single metrics fetch erroring out once. See https://github.com/elastic/elastic-agent/issues/5332
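The threshold behavior described above can be illustrated with a minimal sketch. It is not the elastic-agent or Beats implementation: the FailureTracker class, its record method, and the state strings are hypothetical names chosen for illustration. It also assumes that the threshold counts consecutive failures and that a successful fetch resets the counter.

```python
from dataclasses import dataclass


@dataclass
class FailureTracker:
    """Hypothetical helper; illustrates the idea, not the real agent code."""

    failure_threshold: int = 2  # default value stated in this PR
    consecutive_failures: int = 0

    def record(self, fetch_ok: bool) -> str:
        """Record one metrics fetch and return the resulting status."""
        if fetch_ok:
            # Assumption: a successful fetch resets the failure counter.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        # Assumption: DEGRADED once failures in a row reach the threshold.
        if self.consecutive_failures >= self.failure_threshold:
            return "DEGRADED"
        return "HEALTHY"


tracker = FailureTracker()
# One isolated failure (e.g. a component restarting) stays HEALTHY;
# two failures in a row reach the default threshold of 2.
states = [tracker.record(ok) for ok in (True, False, True, False, False)]
print(states)
```

Under these assumptions, a threshold of 1 would reproduce the earlier behavior of reporting DEGRADED on the first failed fetch.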
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
[ ] I have added an entry in ./changelog/fragments using the changelog tool
[ ] I have added an integration test or an E2E test
Disruptive User Impact
How to test this PR locally
Related issues
Questions to ask yourself
This is an automatic backport of pull request #5999 done by Mergify.