Closed g3kr closed 2 years ago
@repeatedly any observation/thoughts on this?
These metrics are counters, which are cumulative, not gauges, which can be reset. Not resetting them is the expected behavior.
@cosmo0920 Thanks for getting back on this. In that case, is there a metric we can use to alert on anomalies?
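For what it's worth, a common way to alert on a cumulative counter is to look at how much it grew between two polls rather than at its absolute value. The snippet below is only a minimal sketch of that idea, not something Fluentd provides: `counter_increases`, `should_alert`, the threshold, and the sample values are all hypothetical, and it assumes you already poll `retry_count` at a regular interval.

```python
from typing import Iterable, List


def counter_increases(samples: Iterable[int]) -> List[int]:
    """Return the increase between consecutive samples of a cumulative counter.

    A drop in the raw value is treated as a process restart (the counter
    started again from zero), so the new value itself is taken as the increase.
    """
    increases = []
    previous = None
    for value in samples:
        if previous is not None:
            increases.append(value - previous if value >= previous else value)
        previous = value
    return increases


def should_alert(samples: Iterable[int], threshold: int = 1) -> bool:
    """Hypothetical rule: alert if the latest interval added >= threshold retries."""
    increases = counter_increases(samples)
    return bool(increases) and increases[-1] >= threshold


# retry_count polled at regular intervals (made-up values)
polled = [0, 3, 7, 7, 7]
print(counter_increases(polled))  # [3, 4, 0, 0]
print(should_alert(polled))       # False: no new retries in the latest interval
```

With this approach the alert clears on its own once the counter stops growing, even though the raw `retry_count` value never goes back to zero.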
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days
Describe the bug
We are using in_monitor_agent to monitor the metrics from fluentd, and we send alerts based on the emitted metrics. We observed that the retry_count and slow_flush_count metrics do not reset to zero once things fall back into place. These numbers keep incrementing unless you restart the fluentd process/task.
To Reproduce
Run fluentd with the config below and force retries to happen by sending a large number of logs to Fluentd. Query the retry_count metric and observe that the count is not reset after a successful retry.
Expected behavior
retry_count and slow_flush_count are set back to 0 after a successful retry.
Your Environment
Your Configuration
Your Error Log
Additional context
No response