Add configurable failure threshold before reporting streams as degraded
This change makes it possible to configure how many consecutive errors may occur while fetching metrics for a given stream before the stream is marked as DEGRADED.
To configure the threshold, add "failure_threshold": <n> to a module configuration block.
The value of <n> determines the behavior:
n == 0: status reporting for the stream is disabled; the stream never becomes DEGRADED, no matter how many errors are encountered while fetching metrics
n == 1 or failure_threshold not specified: backward-compatible behavior; the stream becomes DEGRADED at the first error encountered
n > 1: the stream becomes DEGRADED once n consecutive errors have been encountered
When a fetch operation completes without errors the consecutive errors counter is reset and the stream is set to HEALTHY.
Checklist
[x] My code follows the style guidelines of this project
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[ ] I have made corresponding changes to the default configuration files
[x] I have added tests that prove my fix is effective or that my feature works
[ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
No disruptive user impact: omitting the new configuration key preserves the previous behavior.
This is an automatic backport of pull request #41570 done by Mergify.