fluent / fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes for Prometheus.
Apache License 2.0

FluentdQueueLength alert based on rate providing inadequate alarm #112

Open gandhiano opened 5 years ago

gandhiano commented 5 years ago

The FluentdQueueLength alert evaluates rate(fluentd_status_buffer_queue_length[5m]) and fires as a warning if the value is >0.3, or as critical if it is >0.5.

Although this is an adequate alert in most situations, there are cases where it may not be the best indicator, particularly when bursts of logging quickly increase the queue size (and more so when the queue size is large). In addition, because it is a 5-minute average, it can lag significantly behind the actual problem.

A more precise indicator could take the absolute queue size into account and trigger alerts when the queue approaches its limit and the risk of losing messages is high.
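
As a rough illustration of the suggestion above, an absolute-value rule might look something like the sketch below. This is not the rule shipped with the plugin: the group name, alert name, the `for` duration, the limit of 200 and the 80% fraction are all assumptions made for the example, and the limit would have to be replaced with whatever queue limit is actually configured in Fluentd.

```yaml
groups:
  - name: fluentd-queue                 # hypothetical group name
    rules:
      - alert: FluentdQueueNearLimit    # hypothetical alert name
        # 200 stands in for the configured queue limit (assumption);
        # 0.8 is an assumed warning fraction of that limit.
        expr: fluentd_status_buffer_queue_length > 0.8 * 200
        for: 1m                         # assumed hold time before firing
        labels:
          severity: warning
        annotations:
          summary: "Fluentd buffer queue is above 80% of its assumed limit"
```

Because the expression compares the current gauge value against a fixed threshold instead of a 5-minute rate, it would react to a burst as soon as the queue crosses the threshold, and it would keep firing while the queue stays high even if it has stopped growing.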