deadtrickster / prometheus_rabbitmq_exporter

Prometheus.io exporter as a RabbitMQ Management plugin
MIT License

Disable metrics for some/all queues #18

Closed. Thubo closed this issue 6 years ago.

Thubo commented 7 years ago

Hello there,

I'm trying to disable metrics collected for queues and exchanges (e.g. rabbitmq_queue_disk_reads).

Ideally I'd be able to disable them only for certain queues; however, it would also be alright to disable all queue-related metrics.

(The reason is that we have a very high number of queues, and collecting these metrics takes significant performance away from our brokers.)

So far I've tried:

[...]
{ prometheus, [
  { rabbitmq_exporter, [
    { path, "metrics" },
    { queue_messages_stat, [ ] },
    { exchange_messages_stat, [] }
  ] }
] },
[...]

Is there a simple way to disable any queue related metrics?

Thanks in advance for your help :)

Cheers, Thubo

deadtrickster commented 7 years ago

Hi, thank you for using this exporter. I hope I can look into this at the beginning of next week.

Thubo commented 7 years ago

Thanks a lot. If I can provide any help (logs, tests, ...) please let me know.

deadtrickster commented 7 years ago

Unfortunately this took longer than expected; rescheduled to 3.6.8 release support.

deadtrickster commented 7 years ago
application:set_env(prometheus, rabbitmq_exporter, [{exchange_messages_stat, [messages_published_total]}]). 

leaves only rabbitmq_exchange_messages_published_total.

Example config:

[
 {rabbit,
  [
   {loopback_users, []}
  ]},
 {prometheus,
  [{rabbitmq_exporter,
    [
     {queue_messages_stat, [messages_published_total]},
     {exchange_messages_stat, [messages_published_total]}
    ]}
  ]}
].

This however can't disable ALL queue metrics. If you need this for performance reasons: rendering the metrics isn't costly at all, but enumerating the queues and fetching their stats can be. So filtering by queue name can probably help here. Getting queue info is a two-step process: first the queues are enumerated (info_all), then each one is augmented with its stats (augment_queues).

Not sure info_all can be removed (perhaps replaced with a lighter version, but it is just an ETS query). But augment_queues can be called only for the required queues - see the sketch below.
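
A minimal sketch of that filtering idea, assuming the queue entries are proplists whose name value is a resource tuple (as in RabbitMQ 3.6.x) and that Prefix is a binary. The filtered list would then be handed to the augment step (whose exact arguments vary between releases) instead of the full info_all result:

%% Hypothetical helper: keep only queues whose name starts with Prefix
%% (e.g. <<"important.">>), so the expensive augment step runs for
%% far fewer queues.
filter_queues(Queues, Prefix) ->
    [Q || Q <- Queues,
          begin
              {resource, _VHost, queue, Name} = proplists:get_value(name, Q),
              binary:longest_common_prefix([Name, Prefix]) =:= byte_size(Prefix)
          end].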

Can you give more details about your situation and how to reproduce?

Thubo commented 7 years ago

So, I've updated my RabbitMQ image to 3.6.8 (as well as your plugin).

Next, I enabled the rabbitmq_management plugin, but not your plugin.

If I use the configuration you posted above, I run into the following error as soon as I access any queue metrics (through the normal WebGUI). If I remove the config snippet, the WebGUI works as expected.

=ERROR REPORT==== 28-Mar-2017::07:28:37 ===
JSON encode error: {bad_term,{<<"x-priority">>,signedint,10}}
While encoding:
[{incoming,[]},
 {deliveries,[]},
 {messages_details,
     [{avg,0.0},
[...]

Enabling the prometheus_rabbitmq_exporter plugin results in:

=ERROR REPORT==== 28-Mar-2017::07:33:33 ===
** gen_event handler rabbit_mgmt_reset_handler crashed.
** Was installed in rabbit_event
** Last event was: {event,plugins_changed,
                          [{enabled,[prometheus,prometheus_rabbitmq_exporter]},
                           {disabled,[]}],
                          none,1490686413890}
** When handler state == []
** Reason == {'function not exported',
                 [{prometheus_http,setup,[],[]},
                  {prometheus_rabbitmq_exporter,dispatcher,0,
                      [{file,"src/prometheus_rabbitmq_exporter.erl"},
                       {line,11}]},
                  {rabbit_mgmt_dispatcher,'-build_dispatcher/1-lc$^2/1-2-',1,
                      [{file,"src/rabbit_mgmt_dispatcher.erl"},{line,32}]},
                  {rabbit_mgmt_dispatcher,'-build_dispatcher/1-lc$^2/1-2-',1,
                      [{file,"src/rabbit_mgmt_dispatcher.erl"},{line,33}]},
                  {rabbit_mgmt_dispatcher,build_dispatcher,1,
                      [{file,"src/rabbit_mgmt_dispatcher.erl"},{line,32}]},
                  {rabbit_mgmt_app,register_context,2,
                      [{file,"src/rabbit_mgmt_app.erl"},{line,50}]},
                  {rabbit_mgmt_reset_handler,handle_event,2,
                      [{file,"src/rabbit_mgmt_reset_handler.erl"},{line,55}]},
                  {gen_event,server_update,4,
                      [{file,"gen_event.erl"},{line,533}]}]}

and neither the WebGUI nor the prometheus endpoint are working any longer.

I guess some of the changes in RabbitMQ 3.6.8 are not compatible with your plugin :(

As for our situation in general: We are running multiple brokers, each with 10k to 100k queues & connections. This leads to two issues:

  1. There is a significant RAM and CPU impact on the brokers (the resource consumption approx. triples and the brokers have a tendency to crash more easily) if we export the queue-related metrics. (Also running some queries like rabbitmqctl list_queues becomes quite slow and creates a significant impact on CPU/RAM.)
  2. We generate quite a lot of data we don't want/need: our Prometheus becomes quite busy handling the data for every queue, while we are only interested in the total number of queues (and possibly the number of delivered messages). Right now we use the 'normal' API endpoint to gather this data, but it would be nice to use something more native.

Does this help to clarify the situation? Thanks again for your great work and your help!

deadtrickster commented 7 years ago

Hi!

Setup & Configuration.

JSON encode error: {bad_term,{<<"x-priority">>,signedint,10}}
While encoding:

My plugin doesn't use JSON at all :-(. And before suggesting this config snippet to you, I tested it on my fresh 3.6.8 install. But let's forget about this for the moment.

{enabled,[prometheus,prometheus_rabbitmq_exporter]}

The new exporter version depends on two more plugins - accept and prometheus-httpd. Please download them from the release and try enabling them; see the sketch below.
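
A minimal sketch of what the node's enabled_plugins file could then contain (the plugin names accept, prometheus_httpd, and prometheus_rabbitmq_exporter are assumptions based on the repository names; adjust them to whatever the released .ez files are actually called):

[accept,prometheus,prometheus_httpd,prometheus_rabbitmq_exporter].

Alternatively, rabbitmq-plugins enable prometheus_rabbitmq_exporter should work once the .ez files are in the plugins directory, since dependencies are normally enabled implicitly.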

Btw, I also pushed a Docker image - maybe you can use it as a base image: https://cloud.docker.com/swarm/deadtrickster/repository/registry-1.docker.io/deadtrickster/rabbitmq_prometheus/general.

Load handling

The quickest solution is simply not to use the queues collector:


[
 {rabbit, [
           {loopback_users, []}
          ]},
 {prometheus,
  [{collectors,
    [
     %% Standard prometheus collectors
     prometheus_vm_statistics_collector,
     prometheus_vm_system_info_collector,
     prometheus_vm_memory_collector,
     prometheus_mnesia_collector,

     %% Process Info Collector
     prometheus_process_collector,

     %% Rabbitmq collectors
     prometheus_rabbitmq_overview_collector,
     prometheus_rabbitmq_exchanges_collector,
     prometheus_rabbitmq_mnesia_tables_collector,
     prometheus_rabbitmq_nodes_collector
    ]}
  ]}
].

This config will give you only the queue count.

To filter queues by name we need to list them, right? So I'm curious what you are using now, since you said: "Right now we use the 'normal' api endpoint to gather this data."

Thanks!

Thubo commented 7 years ago

On the "JSON-Error":

I'm starting the following image:

docker run --rm -p 15672:15672 -p 5672:5672 deadtrickster/rabbitmq_prometheus:3.6.8.1

Next I create an exchange/queue, send a message to the exchange and consume the queue using pika 0.10.0.

Navigating to the GUI results in:

[screenshot: the JSON encode error shown in the management UI]

The same message also appears in the logs:

=ERROR REPORT==== 9-Apr-2017::09:21:22 ===
JSON encode error: {bad_term,{<<"x-priority">>,signedint,10}}
While encoding:
[{incoming,[]},
 {deliveries,[]},
 {messages_details,
     [{avg,0.0},
      {avg_rate,0.0},
      {samples,
          [[{timestamp,1491729685000},{sample,0}],
           [{timestamp,1491729680000},{sample,0}],
[...]

As for the endpoint I mentioned: The management plugin provides the /api/overview endpoint.

The returned JSON holds (among other information):

  "message_stats": {
    "publish": 1,
    "publish_details": {
      "rate": 0.2
    },
    "confirm": 0,
    "confirm_details": {
      "rate": 0
    },
    "return_unroutable": 0,
    "return_unroutable_details": {
      "rate": 0
    },
    "disk_reads": 0,
    "disk_reads_details": {
      "rate": 0
    },
    "disk_writes": 0,
    "disk_writes_details": {
      "rate": 0
    },
    "get": 0,
    "get_details": {
      "rate": 0
    },
    "get_no_ack": 0,
    "get_no_ack_details": {
      "rate": 0
    },
    "deliver": 0,
    "deliver_details": {
      "rate": 0
    },
    "deliver_no_ack": 1,
    "deliver_no_ack_details": {
      "rate": 0.2
    },
    "redeliver": 0,
    "redeliver_details": {
      "rate": 0
    },
    "ack": 0,
    "ack_details": {
      "rate": 0
    },
    "deliver_get": 1,
    "deliver_get_details": {
      "rate": 0.2
    }
  },
  "queue_totals": {
    "messages_ready": 0,
    "messages_ready_details": {
      "rate": 0
    },
    "messages_unacknowledged": 0,
    "messages_unacknowledged_details": {
      "rate": 0
    },
    "messages": 0,
    "messages_details": {
      "rate": 0
    }
  },
  "object_totals": {
    "consumers": 1,
    "queues": 1,
    "exchanges": 9,
    "connections": 1,
    "channels":

It seems - I haven't done any in-depth analysis so far (!) - that this endpoint is cached or at least a lot less expensive, since we can use it to collect the metrics we need without the performance impact mentioned above.
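
For reference, a minimal sketch of polling that endpoint from Erlang (assuming a local broker and the default guest/guest credentials; purely illustrative, since our actual collection happens externally):

fetch_overview() ->
    inets:start(),
    Auth = "Basic " ++ base64:encode_to_string("guest:guest"),
    {ok, {{_Version, 200, _Reason}, _Headers, Body}} =
        httpc:request(get,
                      {"http://localhost:15672/api/overview",
                       [{"authorization", Auth}]},
                      [], []),
    %% Body is the JSON document excerpted above (message_stats,
    %% queue_totals, object_totals, ...).
    Body.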

Right now I cannot test the config example you posted - but I will report back to you as soon as I can.

Hope this helps. Thanks!

deadtrickster commented 7 years ago

Hi,

Did you try the suggested config yet? I will try to look at /api/overview shortly. As for the JSON... can you please disable all the prometheus stuff and see how it goes?

Thanks!

deadtrickster commented 7 years ago

/api/overview returns cluster-wide data. Since prometheus scrapes each node, I'm breaking this cluster-wide stat down into per-node stats.

rabbitmq_message_stats_deliver_get{node="nonode@host"} = 1

Thoughts?

deadtrickster commented 7 years ago

Actually, since all these metrics are cluster-wide, I don't think splitting by node just here makes sense.

Thubo commented 7 years ago

I would expect these metrics to be cluster-wide, as the other metrics are cluster-wide as well.

I have some good and some bad news:

The good news: With the image deadtrickster/rabbitmq_prometheus:latest (c92d59f43fea) the JSON errors are gone and I can access the metrics in the GUI as expected.

However: accessing the /api/metrics endpoint on the image stated above gives me:

=WARNING REPORT==== 17-Apr-2017::15:52:56 ===
Management delegate query returned errors:
[{<0.404.0>,
  {error,badarg,
         [{ets,select,
               [queue_exchange_stats_publish,
                [{{{{'$1','$2'},'_'},'_'},
                  [{'==',{{resource,unknown,queue,
                                    {resource,<<"/">>,queue,<<"myqueue">>}}},
                         '$1'}],
                  [{{'$1','$2'}}]}]],
               []},
          {rabbit_mgmt_data,get_table_keys,2,
                            [{file,"src/rabbit_mgmt_data.erl"},{line,372}]},
          {rabbit_mgmt_data,queue_raw_deliver_stats_data,2,
                            [{file,"src/rabbit_mgmt_data.erl"},{line,169}]},
          {rabbit_mgmt_data,list_queue_data,2,
                            [{file,"src/rabbit_mgmt_data.erl"},{line,208}]},
          {rabbit_mgmt_data,'-all_list_queue_data/3-fun-0-',3,
                            [{file,"src/rabbit_mgmt_data.erl"},{line,75}]},
          {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
          {delegate,safe_invoke,2,[{file,"src/delegate.erl"},{line,219}]},
          {delegate,invoke,3,[{file,"src/delegate.erl"},{line,111}]}]}}]
=ERROR REPORT==== 17-Apr-2017::15:52:56 ===
** Generic server rabbit_mgmt_db_cache_queues terminating
** Last message in was {fetch,#Fun<rabbit_mgmt_db.22.11931579>,
                           [[[{pid,<0.587.0>},
                              {name,{resource,<<"/">>,queue,<<"myqueue">>}},
                              {durable,false},
                              {auto_delete,false},
                              {arguments,[]},
                              {owner_pid,''},
                              {exclusive,false},
                              {messages_ready,0},
                              {messages_unacknowledged,0},
                              {messages,0},
                              {reductions,3339},
                              {policy,''},
                              {exclusive_consumer_pid,''},
                              {exclusive_consumer_tag,''},
                              {consumers,1},
                              {consumer_utilisation,0.9989540506968384},
                              {memory,22248},
                              {slave_pids,''},
                              {synchronised_slave_pids,''},
                              {recoverable_slaves,''},
                              {state,running},
                              {garbage_collection,
                                  [{max_heap_size,0},
                                   {min_bin_vheap_size,46422},
                                   {min_heap_size,233},
                                   {fullsweep_after,65535},
                                   {minor_gcs,3}]},
                              {messages_ram,0},
                              {messages_ready_ram,0},
                              {messages_unacknowledged_ram,0},
                              {messages_persistent,0},
                              {message_bytes,0},
                              {message_bytes_ready,0},
                              {message_bytes_unacknowledged,0},
                              {message_bytes_ram,0},
                              {message_bytes_persistent,0},
                              {head_message_timestamp,''},
                              {disk_reads,0},
                              {disk_writes,0},
                              {backing_queue_status,
                                  [{mode,default},
                                   {q1,0},
                                   {q2,0},
                                   {delta,{delta,undefined,0,0,undefined}},
                                   {q3,0},
                                   {q4,0},
                                   {len,0},
                                   {target_ram_count,infinity},
                                   {next_seq_id,1},
                                   {avg_ingress_rate,0.1274716363635783},
                                   {avg_egress_rate,0.1274716363635783},
                                   {avg_ack_ingress_rate,0.0},
                                   {avg_ack_egress_rate,0.0}]},
                              {messages_paged_out,0},
                              {message_bytes_paged_out,0}]]]}
** When Server state == {state,none,[],undefined,5}
** Reason for termination ==
** {badarg,[{dict,fetch,
                  [{resource,unknown,queue,
                             {resource,<<"/">>,queue,<<"myqueue">>}},
                   {dict,0,16,16,8,80,48,
                         {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                         {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}],
                  [{file,"dict.erl"},{line,131}]},
            {rabbit_mgmt_db,'-list_queue_stats/3-lc$^1/1-1-',4,
                            [{file,"src/rabbit_mgmt_db.erl"},{line,363}]},
            {rabbit_mgmt_db,list_queue_stats,3,
                            [{file,"src/rabbit_mgmt_db.erl"},{line,360}]},
            {timer,tc,2,[{file,"timer.erl"},{line,181}]},
            {rabbit_mgmt_db_cache,handle_call,3,
                                  [{file,"src/rabbit_mgmt_db_cache.erl"},
                                   {line,107}]},
            {gen_server,try_handle_call,4,
                        [{file,"gen_server.erl"},{line,615}]},
            {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,647}]},
            {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}

=CRASH REPORT==== 17-Apr-2017::15:52:56 ===
  crasher:
    initial call: rabbit_mgmt_db_cache:init/1
    pid: <0.621.0>
    registered_name: rabbit_mgmt_db_cache_queues
    exception exit: {badarg,
                        [{dict,fetch,
                             [{resource,unknown,queue,
                                  {resource,<<"/">>,queue,<<"myqueue">>}},
                              {dict,0,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]},
                                  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]}}}],
                             [{file,"dict.erl"},{line,131}]},
                         {rabbit_mgmt_db,'-list_queue_stats/3-lc$^1/1-1-',4,
                             [{file,"src/rabbit_mgmt_db.erl"},{line,363}]},
                         {rabbit_mgmt_db,list_queue_stats,3,
                             [{file,"src/rabbit_mgmt_db.erl"},{line,360}]},
                         {timer,tc,2,[{file,"timer.erl"},{line,181}]},
                         {rabbit_mgmt_db_cache,handle_call,3,
                             [{file,"src/rabbit_mgmt_db_cache.erl"},
                              {line,107}]},
                         {gen_server,try_handle_call,4,
                             [{file,"gen_server.erl"},{line,615}]},
                         {gen_server,handle_msg,5,
                             [{file,"gen_server.erl"},{line,647}]},
                         {proc_lib,init_p_do_apply,3,
                             [{file,"proc_lib.erl"},{line,247}]}]}
      in function  gen_server:terminate/7 (gen_server.erl, line 812)
    ancestors: [rabbit_mgmt_db_cache_sup,rabbit_mgmt_sup,
                  rabbit_mgmt_sup_sup,<0.441.0>]
    messages: []
    links: [<0.553.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 399
  neighbours:

=SUPERVISOR REPORT==== 17-Apr-2017::15:52:56 ===
     Supervisor: {local,rabbit_mgmt_db_cache_sup}
     Context:    child_terminated
     Reason:     {badarg,
                     [{dict,fetch,
                          [{resource,unknown,queue,
                               {resource,<<"/">>,queue,<<"myqueue">>}},
                           {dict,0,16,16,8,80,48,
                               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []},
                               {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []}}}],
                          [{file,"dict.erl"},{line,131}]},
                      {rabbit_mgmt_db,'-list_queue_stats/3-lc$^1/1-1-',4,
                          [{file,"src/rabbit_mgmt_db.erl"},{line,363}]},
                      {rabbit_mgmt_db,list_queue_stats,3,
                          [{file,"src/rabbit_mgmt_db.erl"},{line,360}]},
                      {timer,tc,2,[{file,"timer.erl"},{line,181}]},
                      {rabbit_mgmt_db_cache,handle_call,3,
                          [{file,"src/rabbit_mgmt_db_cache.erl"},{line,107}]},
                      {gen_server,try_handle_call,4,
                          [{file,"gen_server.erl"},{line,615}]},
                      {gen_server,handle_msg,5,
                          [{file,"gen_server.erl"},{line,647}]},
                      {proc_lib,init_p_do_apply,3,
                          [{file,"proc_lib.erl"},{line,247}]}]}
     Offender:   [{pid,<0.621.0>},
                  {id,rabbit_mgmt_db_cache_queues},
                  {mfargs,{rabbit_mgmt_db_cache,start_link,[queues]}},
                  {restart_type,permanent},
                  {shutdown,5000},
                  {child_type,worker}]

=ERROR REPORT==== 17-Apr-2017::15:52:56 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.620.0> exit with reason: [{reason,{{badarg,[{dict,fetch,[{resource,unknown,queue,{resource,<<"/">>,queue,<<"myqueue">>}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}],[{file,"dict.erl"},{line,131}]},{rabbit_mgmt_db,'-list_queue_stats/3-lc$^1/1-1-',4,[{file,"src/rabbit_mgmt_db.erl"},{line,363}]},{rabbit_mgmt_db,list_queue_stats,3,[{file,"src/rabbit_mgmt_db.erl"},{line,360}]},{timer,tc,2,[{file,"timer.erl"},{line,181}]},{rabbit_mgmt_db_c
ache,handle_call,3,[{file,"src/rabbit_mgmt_db_cache.erl"},{line,107}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,615}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,647}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]},{gen_server,call,[<0.621.0>,{fetch,#Fun<rabbit_mgmt_db.22.11931579>,[[[{pid,<0.587.0>},{name,{resource,<<"/">>,queue,<<"myqueue">>}},{durable,false},{auto_delete,false},{arguments,[]},{owner_pid,''},{exclusive,false},{messages_ready,0},{messages_unacknowledged,0},{messages,0},{reductions,3339},{policy,''},{exclusive_consumer_pid,''},{exclusive_consumer_tag,''},{con
sumers,1},{consumer_utilisation,0.9989540506968384},{memory,22248},{slave_pids,''},{synchronised_slave_pids,''},{recoverable_slaves,''},{state,running},{garbage_collection,[{max_heap_size,0},{min_bin_vheap_size,46422},{min_heap_size,233},{fullsweep_after,65535},{minor_gcs,3}]},{messages_ram,0},{messages_ready_ram,0},{messages_unacknowledged_ram,0},{messages_persistent,0},{message_bytes,0},{message_bytes_ready,0},{message_bytes_unacknowledged,0},{message_bytes_ram,0},{message_bytes_persistent,0},{head_message_timestamp,''},{disk_reads,0},{disk_writes,0},{backing_queue_status,[{mode,default},{q1,0},{q2,0},{delta,{delta,undefined,
0,0,undefined}},{q3,0},{q4,0},{len,0},{target_ram_count,infinity},{next_seq_id,1},{avg_ingress_rate,0.1274716363635783},{avg_egress_rate,0.1274716363635783},{avg_ack_ingress_rate,0.0},{avg_ack_egress_rate,0.0}]},{messages_paged_out,0},{message_bytes_paged_out,0}]]]},60000]}}},{mfa,{prometheus_rabbitmq_exporter_handler,handle,2}},{stacktrace,[{gen_server,call,3,[{file,"gen_server.erl"},{line,212}]},{rabbit_mgmt_db,submit_cached,4,[{file,"src/rabbit_mgmt_db.erl"},{line,690}]},{prometheus_rabbitmq_queues_collector,'-collect_mf/2-lc$^0/1-0-',1,[{file,"src/collectors/prometheus_rabbitmq_queues_collector.erl"},{line,76}]},{prometheus
_rabbitmq_queues_collector,collect_mf,2,[{file,"src/collectors/prometheus_rabbitmq_queues_collector.erl"},{line,76}]},{prometheus_collector,collect_mf,3,[{file,"src/prometheus_collector.erl"},{line,157}]},{prometheus_registry,'-collect/2-lc$^0/1-0-',3,[{file,"src/prometheus_registry.erl"},{line,78}]},{prometheus_registry,collect,2,[{file,"src/prometheus_registry.erl"},{line,78}]},{prometheus_text_format,format,1,[{file,"src/formats/prometheus_text_format.erl"},{line,74}]}]},{req,[{socket,#Port<0.26180>},{transport,ranch_tcp},{connection,keepalive},{pid,<0.620.0>},{method,<<"GET">>},{version,'HTTP/1.1'},{peer,{{172,17,0,1},51666
}},{host,<<"localhost">>},{host_info,undefined},{port,15672},{path,<<"/api/metrics">>},{path_info,undefined},{qs,<<>>},{qs_vals,[]},{bindings,[]},{headers,[{<<"host">>,<<"localhost:15672">>},{<<"connection">>,<<"keep-alive">>},{<<"cache-control">>,<<"max-age=0">>},{<<"upgrade-insecure-requests">>,<<"1">>},{<<"user-agent">>,<<"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36">>},{<<"accept">>,<<"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8">>},{<<"dnt">>,<<"1">>},{<<"accept-encoding">>,<<"gzip, deflate, sdch, br">>},{<<"accept-
language">>,<<"en-DE,de-DE;q=0.8,de;q=0.6,en-US;q=0.4,en;q=0.2">>},{<<"cookie">>,<<"grafana_sess=XXXX; m=XXXX">>}]},{p_headers,[{<<"connection">>,[<<"keep-alive">>]}]},{cookies,undefined},{meta,[]},{body_state,waiting},{buffer,<<>>},{multipart,undefined},{resp_compress,true},{resp_state,waiting},{resp_headers,[]},{resp_body,<<>>},{onresponse,#Fun<rabbit_cowboy_middleware.onresponse.4>}]},{state,{default}}]

Would it make sense to track these errors in another issue? I fear this thread is rather complicated to follow...

deadtrickster commented 7 years ago

I thought this error was fixed there: https://github.com/deadtrickster/prometheus_rabbitmq_exporter/issues/22 :-(.

Have you tried the config?

deadtrickster commented 7 years ago

OK, it must be something with my Docker images. I tested the git version (make run-broker) and their 3.6.9-1 deb package; it works there.

deadtrickster commented 7 years ago

Looks like I finally fixed the Docker images. I also started integrating metrics from /api/overview.

deadtrickster commented 7 years ago

Hi, I added some /api/overview metrics. I attached the latest plugin build. Don't forget to rename the zip to ez.

prometheus_rabbitmq_exporter-v3.6.9.1.zip

kfreem02 commented 6 years ago

Using RabbitMQ 3.7.4 and Erlang 20.3 on Windows, I am unable to disable queue metrics. I have tried the config from your comment posted Mar 28 2017, and have also added {queue_messages_stat, []}, but all queue metrics are still exported. Note that the config below does disable exchange metrics.

[
 {rabbit, [
   {queue_master_locator, "min-masters"}
 ]},
 {prometheus, [
   {rabbitmq_exporter, [
     {path, "/mymetrics"},
     {connections_total_enabled, false},
     {queue_messages_stat, []},
     {exchange_messages_stat, []}
   ]},
  {collectors,
    [
     %% Standard prometheus collectors
     prometheus_vm_statistics_collector,
     prometheus_vm_system_info_collector,
     prometheus_vm_memory_collector,
     prometheus_mnesia_collector,

     %% Process Info Collector
%%     prometheus_process_collector,

     %% Rabbitmq collectors
%%     prometheus_rabbitmq_overview_collector,
%%     prometheus_rabbitmq_exchanges_collector,
     prometheus_rabbitmq_mnesia_tables_collector
%%     prometheus_rabbitmq_nodes_collector
    ]}
 ]},
 {rabbitmq_management, [
   {listener, [
     {port, 15672},
     {ssl, false}
   ]}
 ]}
].

deadtrickster commented 6 years ago

Right - messages_stat only controls the messages stat. And you also have the exchange collector disabled in your collectors list; you can try the same with the queue collector (see the sketch below).
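
A minimal sketch of what that would look like - the queue collector's module name, prometheus_rabbitmq_queues_collector, appears in the stack traces earlier in this thread, so leaving it out of the collectors list should skip all per-queue metrics:

[
 {prometheus, [
   {rabbitmq_exporter, [
     {path, "/mymetrics"}
   ]},
   {collectors,
    [
     %% Standard prometheus collectors
     prometheus_vm_statistics_collector,
     prometheus_vm_system_info_collector,
     prometheus_vm_memory_collector,
     prometheus_mnesia_collector,

     %% Rabbitmq collectors - prometheus_rabbitmq_queues_collector
     %% deliberately omitted
     prometheus_rabbitmq_overview_collector,
     prometheus_rabbitmq_exchanges_collector,
     prometheus_rabbitmq_mnesia_tables_collector,
     prometheus_rabbitmq_nodes_collector
    ]}
 ]}
].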

kfreem02 commented 6 years ago

Is there another way to disable the queue_collector? I have tried with the updated config below, limiting queue metrics to messages_published_total, and I still see the full complement of queue metrics returned.

[
 {prometheus, [
   {rabbitmq_exporter, [
     {path, "/mymetrics"},
     {connections_total_enabled, false},
     {queue_messages_stat, [messages_published_total]},
     {exchange_messages_stat, []}
   ]},
  {collectors,
    [
%% Standard prometheus collectors
     prometheus_vm_statistics_collector,
     prometheus_vm_system_info_collector,
     prometheus_vm_memory_collector,
     prometheus_mnesia_collector,

%% Rabbitmq collectors
     prometheus_rabbitmq_mnesia_tables_collector
    ]}
 ]},
 {rabbitmq_management, [
   {listener, [
     {port, 15672},
     {ssl, false}
   ]}
 ]}
].

With the above config, the queue metrics returned include: rabbitmq_queue_durable, rabbitmq_queue_auto_delete, rabbitmq_queue_exclusive, rabbitmq_queue_messages_ready, rabbitmq_queue_messages_unacknowledged, rabbitmq_queue_messages, rabbitmq_queue_messages_ready_ram, rabbitmq_queue_messages_unacknowledged_ram, rabbitmq_queue_messages_ram, rabbitmq_queue_messages_persistent, rabbitmq_queue_message_bytes, rabbitmq_queue_message_bytes_ready, rabbitmq_queue_message_bytes_unacknowledged, rabbitmq_queue_message_bytes_ram, rabbitmq_queue_message_bytes_persistent, rabbitmq_queue_head_message_timestamp, rabbitmq_queue_disk_reads, rabbitmq_queue_disk_writes, rabbitmq_queue_disk_size_bytes, rabbitmq_queue_consumers, rabbitmq_queue_consumer_utilisation, rabbitmq_queue_memory, rabbitmq_queue_state, and rabbitmq_queue_messages_published_total.

deadtrickster commented 6 years ago

ok, I'll look at this

deadtrickster commented 6 years ago

OK, found it. Looks like a not-so-great hack: https://github.com/deadtrickster/prometheus_rabbitmq_exporter/blob/master/src/prometheus_rabbitmq_exporter.erl#L9

A proper fix is coming.

deadtrickster commented 6 years ago

@kfreem02 please try the latest release

kfreem02 commented 6 years ago

Confirmed fixed.

deadtrickster commented 6 years ago

Great, thanks