fluent / fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes them for Prometheus.
Apache License 2.0

Invalid metric name in misc/prometheus_alerts.yaml #110

Closed: IppX closed this issue 5 years ago

IppX commented 5 years ago

Hi,

The FluentdRecordsCountsHigh rule in misc/prometheus_alerts.yaml uses a nonexistent fluentd_record_counts metric. My guess is that this metric now has another name, since the example is 2 years old.

My question is: what is its correct replacement? fluentd_output_status_emit_count, fluentd_output_status_emit_records, fluentd_output_status_write_count, or something else? Also, what is the difference between them?

This example has been propagated to the fluentd-elasticsearch helm chart.

Thanks !

kazegusuri commented 5 years ago

Nice catch! Actually, I don't remember whether the fluentd_record_counts metric ever existed... Anyway, it should be fixed.

This plugin exposes the internal metrics provided by the fluentd monitor agent. The metrics you listed come from there as well. Actually, I cannot properly explain the differences between those metrics. Do you know of good documentation that describes each metric exposed by the monitor agent? @repeatedly @ganmacs

ganmacs commented 5 years ago

I think fluentd_record_counts (maybe record_count(s) in fluentd) does not exist. Here is a monitor_agent document (its output is not from the latest version), and I also confirmed with git log -S that record_count(s) has never been used in fluentd. So it is probably just a typo.

> Do you know of good documentation that describes each metric exposed by the monitor agent?

This document should be helpful (sorry, we should have added this info to the official documentation): https://www.fluentd.org/blog/fluentd-v1.6.0-has-been-released

emit_records: The total number of emitted records
emit_count: The total number of emit calls in the output plugin
write_count: The total number of write/try_write calls in the output plugin
rollback_count: The total number of rollbacks. A rollback happens when write/try_write fails
slow_flush_count: The total number of slow flushes. This count is incremented when a buffer flush takes longer than slow_flush_log_threshold
flush_time_count: The total time of buffer flushes in milliseconds
buffer_stage_length: Current length of staged buffer chunks
buffer_stage_byte_size: Current byte size of staged buffer chunks
buffer_queue_byte_size: Current byte size of queued buffer chunks
buffer_available_buffer_space_ratios: Shows the available space for the buffer
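
To make the difference between emit_count, emit_records, and write_count concrete, here is a minimal toy model in Python. It is only a sketch based on the descriptions above, not Fluentd's actual implementation: the ToyBufferedOutput class, its chunk_limit parameter, and the "flush once N records are staged" policy are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBufferedOutput:
    """Toy stand-in for a buffered output plugin (hypothetical, not Fluentd code)."""

    chunk_limit: int = 3   # assumption: flush as soon as this many records are staged
    emit_count: int = 0    # incremented once per emit call
    emit_records: int = 0  # incremented by the number of records in each emit call
    write_count: int = 0   # incremented once per write call (one per flushed chunk)
    staged: list = field(default_factory=list)

    def emit(self, records):
        # One emit call may carry several records.
        self.emit_count += 1
        self.emit_records += len(records)
        self.staged.extend(records)
        # Flush full chunks; each flush is a single write call.
        while len(self.staged) >= self.chunk_limit:
            self.write(self.staged[:self.chunk_limit])
            self.staged = self.staged[self.chunk_limit:]

    def write(self, chunk):
        # A real plugin would send the chunk to its destination here.
        self.write_count += 1


out = ToyBufferedOutput()
out.emit(["a", "b"])            # 1 emit call, 2 records, nothing flushed yet
out.emit(["c", "d", "e", "f"])  # 1 emit call, 4 records, two chunks flushed
print(out.emit_count, out.emit_records, out.write_count)  # prints: 2 6 2
```

In this model, a rule that is meant to track record throughput would look at emit_records, since emit_count and write_count only count calls, regardless of how many records each call carries.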

IppX commented 5 years ago

Thanks, that helps a lot.

Should I submit a PR to replace fluentd_record_counts with fluentd_output_status_emit_records?

kazegusuri commented 5 years ago

yeah it helps a lot.