deadtrickster / prometheus_rabbitmq_exporter

Prometheus.io exporter as a RabbitMQ Management Plugin plugin
MIT License

Message Rates per Queue #95

Open bwmills opened 4 years ago

bwmills commented 4 years ago

Greetings,

I am aware that as of 3.8.0, RabbitMQ ships a built-in rabbitmq_prometheus plugin.

We are, however, still running RabbitMQ 3.7.15 with the 3.7.9.1 release of prometheus_rabbitmq_exporter. It is currently configured as follows in the Kubernetes ConfigMap for our RabbitMQ cluster, which runs as a StatefulSet with persistent volumes for storage.

Config

     { prometheus, [
       { rabbitmq_exporter, [
         { connections_total_enabled, false },
         { exchange_messages_stat, [] },
         { queue_messages_stat, [messages_published_total] }
       ]},
       { collectors, [
         %% Standard Prometheus collectors
         %% prometheus_vm_statistics_collector,
         %% prometheus_vm_system_info_collector,
         %% prometheus_vm_memory_collector,
         %% prometheus_mnesia_collector,

         %% RabbitMQ collectors
         prometheus_rabbitmq_overview_collector,
         %% prometheus_rabbitmq_nodes_collector,
         %% prometheus_rabbitmq_mnesia_tables_collector,
         %% prometheus_rabbitmq_exchanges_collector,
         prometheus_rabbitmq_queues_collector
       ]}
     ]}

which exports the following metrics:

Current Metrics

prometheus_rabbitmq_overview_collector

rabbitmq_channels
rabbitmq_connections
rabbitmq_consumers
rabbitmq_exchanges
rabbitmq_messages_confirmed_total
rabbitmq_messages_deliver_total
rabbitmq_messages_delivered_no_ack_total
rabbitmq_messages_delivered_total
rabbitmq_messages_get_no_ack_total
rabbitmq_messages_get_total
rabbitmq_messages_published_in_total
rabbitmq_messages_published_out_total
rabbitmq_messages_published_total
rabbitmq_messages_ready
rabbitmq_messages_redelivered_total
rabbitmq_messages_returned_total
rabbitmq_messages_unacknowledged
rabbitmq_queues
rabbitmq_queues_disk_reads
rabbitmq_queues_disk_writes

prometheus_rabbitmq_queues_collector

rabbitmq_queue_auto_delete
rabbitmq_queue_consumer_utilization
rabbitmq_queue_consumers
rabbitmq_queue_disk_size_bytes
rabbitmq_queue_durable
rabbitmq_queue_exclusive
rabbitmq_queue_head_message_timestamp
rabbitmq_queue_memory
rabbitmq_queue_message_bytes
rabbitmq_queue_message_bytes_persistent
rabbitmq_queue_message_bytes_ram
rabbitmq_queue_message_bytes_ready
rabbitmq_queue_message_bytes_unacknowledged
rabbitmq_queue_messages
rabbitmq_queue_messages_persistent
rabbitmq_queue_messages_ram
rabbitmq_queue_messages_ready
rabbitmq_queue_messages_ready_ram
rabbitmq_queue_messages_unacknowledged
rabbitmq_queue_messages_unacknowledged_ram
rabbitmq_queue_state

queue_messages_stat

rabbitmq_queue_messages_published_total

Goal

We'd like to get better insight into queue performance per vhost. The RabbitMQ documentation is a bit unclear, and other searches have led here and here, though it's difficult to know what is supported.

Question

What prometheus_rabbitmq_exporter collector do we need to enable to export per-queue message publish rates and related metrics to gain these insights? Or do we need to expand the metrics listed in queue_messages_stat?

Thanks in advance for your assistance.

bwmills commented 4 years ago

Not sure if this project is still active.

For anyone interested: if you're using a monitoring setup such as Prometheus + Grafana, note that rabbitmq_queue_messages_published_total is enough (at least partially) to get some insight into queue performance per vhost.

Once you have prometheus_rabbitmq_exporter set up and configured with RabbitMQ, Prometheus scraping it (in our case, from the Kubernetes pods running RabbitMQ), and Prometheus added as a data source in Grafana, you can try something along these lines in Grafana:

(1) Create a dashboard variable (what Grafana calls template variables)

Name: top_queues
Type: Query
Data source: prometheus
Refresh: On Time Range Change
Query:

query_result(topk(25, sum(rate(rabbitmq_queue_messages_published_total[${__range_s}s])) by(queue, vhost)))

Regex: .*queue="(.*?)".*
Multi-value: enabled
Include All Option: enabled
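
To see what the Regex field above pulls out of each result, here is a minimal Python sketch. It assumes query_result returns one line per series in the form `{labels} value timestamp`; the sample lines, queue names and numbers are made up for illustration.

```python
import re

# Same pattern as the Regex field of the Grafana variable.
QUEUE_REGEX = re.compile(r'.*queue="(.*?)".*')


def extract_queue(series_line):
    """Return the value of the queue label from one result line, or None."""
    match = QUEUE_REGEX.match(series_line)
    return match.group(1) if match else None


# Hypothetical query_result lines (assumed format, invented values).
sample_lines = [
    '{queue="orders", vhost="/"} 12.5 1600000000000',
    '{queue="billing.events", vhost="prod"} 3.25 1600000000000',
]

top_queues = [extract_queue(line) for line in sample_lines]
print(top_queues)
```

Only the captured group (the queue name) ends up as a value of the variable, which is what the `queue=~"$top_queues"` matcher in the panel query then consumes.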

(2) Create a dashboard (or add a panel to an existing dashboard)

Use this query to power a graph or a table depending on your preference.

sum(rate(rabbitmq_queue_messages_published_total{queue=~"$top_queues"}[2m])) by (vhost, queue) * 7 / 8

Note that the * 7 / 8 factor is an empirical correction to the resulting time series values. The right factor depends on your Prometheus scrape interval and on how many samples fall inside the 2-minute ([2m]) rate window, so it may need to be adjusted for your environment.
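
One possible way to arrive at a factor like 7 / 8 is sketched below in Python. It assumes a 15-second scrape interval, which is not stated in this thread: a 2-minute window then holds about 8 samples, and those samples span only 7 intervals. The function name and both parameters are hypothetical, and this is a guess at the reasoning rather than documented Prometheus behaviour, so compare the graph against a known publish rate before relying on it.

```python
def interval_ratio(window_seconds, scrape_interval_seconds):
    """Ratio of intervals to samples for a given rate window.

    With n samples in the window there are n - 1 intervals between them,
    so the ratio is (n - 1) / n.
    """
    samples = window_seconds // scrape_interval_seconds
    if samples < 2:
        raise ValueError("the window must contain at least two samples")
    return (samples - 1) / samples


# Assumed values: [2m] window, 15-second scrape interval.
factor = interval_ratio(120, 15)
print(factor)
```

With a 30-second scrape interval the same reasoning gives 3 / 4, which is why the factor has to be revisited whenever the scrape interval or the rate window changes.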

The visualization area in your Grafana graph/table will need some tuning as always. You might name this Grafana panel "Top 25 Queue Rates" or the like.