deadtrickster / prometheus_rabbitmq_exporter

Prometheus.io exporter as a RabbitMQ Management Plugin plugin
MIT License

incompatibility between prometheus_rabbitmq_exporter and rabbitmq_sharding? #16

Open oneiros-de opened 7 years ago

oneiros-de commented 7 years ago

We have a rabbitmq 3.6.6 with these plugins:

[e*] amqp_client                       3.6.6
[  ] cowboy                            1.0.3
[  ] cowlib                            1.0.1
[e*] mochiweb                          2.13.1
[E*] prometheus                        3.0.1
[E*] prometheus_process_collector      1.0.0
[E*] prometheus_rabbitmq_exporter      v3.6.5.6
[  ] rabbitmq_amqp1_0                  3.6.6
[  ] rabbitmq_auth_backend_ldap        3.6.6
[  ] rabbitmq_auth_mechanism_ssl       3.6.6
[  ] rabbitmq_consistent_hash_exchange 3.6.6
[  ] rabbitmq_event_exchange           3.6.6
[  ] rabbitmq_federation               3.6.6
[  ] rabbitmq_federation_management    3.6.6
[  ] rabbitmq_jms_topic_exchange       3.6.6
[E*] rabbitmq_management               3.6.6
[e*] rabbitmq_management_agent         3.6.6
[  ] rabbitmq_management_visualiser    3.6.6
[  ] rabbitmq_mqtt                     3.6.6
[  ] rabbitmq_recent_history_exchange  1.2.1
[  ] rabbitmq_sharding                 0.1.0
[  ] rabbitmq_shovel                   3.6.6
[  ] rabbitmq_shovel_management        3.6.6
[  ] rabbitmq_stomp                    3.6.6
[  ] rabbitmq_top                      3.6.6
[  ] rabbitmq_tracing                  3.6.6
[  ] rabbitmq_trust_store              3.6.6
[e*] rabbitmq_web_dispatch             3.6.6
[  ] rabbitmq_web_stomp                3.6.6
[  ] rabbitmq_web_stomp_examples       3.6.6
[  ] sockjs                            0.3.4
[e*] webmachine                        1.10.3

A colleague experimented with the rabbitmq_sharding plugin: he enabled it, disabled it, and restarted the server. The server then refused to start with this error:

Error description:
   {could_not_start,rabbitmq_management,
       {badarg,
           [{ets,insert,
                [prometheus_registry_table,{default,prometheus_summary}],
                []},
            {prometheus_registry,register_collector,2,
                [{file,"src/prometheus_registry.erl"},{line,79}]},
            {prometheus_metric,insert_mf,3,
                [{file,"src/prometheus_metric.erl"},{line,101}]},
            {prometheus_rabbitmq_exporter,dispatcher,0,
                [{file,"src/prometheus_rabbitmq_exporter.erl"},{line,12}]},
            {rabbit_mgmt_dispatcher,'-build_dispatcher/1-lc$^0/1-1-',1,
                [{file,"src/rabbit_mgmt_dispatcher.erl"},{line,27}]},
            {rabbit_mgmt_dispatcher,'-build_dispatcher/1-lc$^0/1-1-',1,
                [{file,"src/rabbit_mgmt_dispatcher.erl"},{line,27}]},
            {rabbit_mgmt_dispatcher,build_dispatcher,1,
                [{file,"src/rabbit_mgmt_dispatcher.erl"},{line,27}]},
            {rabbit_mgmt_app,make_loop,1,
                [{file,"src/rabbit_mgmt_app.erl"},{line,57}]}]}}

We somehow "fixed" this by disabling the rabbitmq_management plugin (which also disabled the prometheus_rabbitmq_exporter plugin). The server then at least started again, but it still refused to start with prometheus_rabbitmq_exporter enabled. We then successfully enabled rabbitmq_sharding first and prometheus_rabbitmq_exporter second, and the server survived a restart. We could also disable rabbitmq_sharding and restart, so we are now back at the state before the sharding experiments.

Do you have any idea what confused the exporter plugin? The current node status is below.

Status of node 'rabbit@stage-icms-rabbitmq1' ...
[{pid,31654},
 {running_applications,
     [{prometheus_rabbitmq_exporter,
          "RabbitMQ Prometheus.io metrics exporter","v3.6.5.6"},
      {rabbitmq_management,"RabbitMQ Management Console","3.6.6"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.6"},
      {webmachine,"webmachine","1.10.3"},
      {mochiweb,"MochiMedia Web Server","2.13.1"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.6"},
      {rabbit,"RabbitMQ","3.6.6"},
      {os_mon,"CPO  CXC 138 46","2.4.1"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.6"},
      {prometheus_process_collector,
          "Prometheus.io process collector\n    Collector exports the current state of process metrics including cpu, memory,\n    file descriptor usage and native threads count as well as the process start and up times.",
          "1.0.0"},
      {prometheus,"Prometheus.io client in Erlang","3.0.1"},
      {mnesia,"MNESIA  CXC 138 12","4.14.2"},
      {rabbit_common,[],"3.6.6"},
      {ssl,"Erlang/OTP SSL application","8.1"},
      {public_key,"Public key infrastructure","1.3"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {inets,"INETS  CXC 138 49","6.3.4"},
      {crypto,"CRYPTO","3.7.2"},
      {syntax_tools,"Syntax tools","2.1.1"},
      {asn1,"The Erlang ASN1 compiler version 4.0.4","4.0.4"},
      {compiler,"ERTS  CXC 138 10","7.0.3"},
      {xmerl,"XML parser","1.3.12"},
      {sasl,"SASL  CXC 138 11","3.0.2"},
      {stdlib,"ERTS  CXC 138 10","3.2"},
      {kernel,"ERTS  CXC 138 10","5.1.1"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 19 [erts-8.2] [source] [64-bit] [smp:16:16] [async-threads:256] [kernel-poll:true]\n"},
 {memory,
     [{total,89240368},
      {connection_readers,459288},
      {connection_writers,238032},
      {connection_channels,969744},
      {connection_other,3020744},
      {queue_procs,1175072},
      {queue_slave_procs,0},
      {plugins,3405216},
      {other_proc,16810208},
      {mnesia,375952},
      {mgmt_db,5084944},
      {msg_index,77456},
      {other_ets,1871768},
      {binary,11048176},
      {code,25202610},
      {atom,1041593},
      {other_system,18459565}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.8},
 {vm_memory_limit,6714405683},
 {disk_free_limit,50000000},
 {disk_free,48178319360},
 {file_descriptors,
     [{total_limit,924},
      {total_used,15},
      {sockets_limit,829},
      {sockets_used,13}]},
 {processes,[{limit,1048576},{used,1216}]},
 {run_queue,0},
 {uptime,814},
 {kernel,{net_ticktime,60}}]

deadtrickster commented 7 years ago

Hi

{badarg,
           [{ets,insert,
                [prometheus_registry_table,{default,prometheus_summary}],
                []},

This error usually means that the ETS table (prometheus_registry_table) doesn't exist. This table is created and owned by the prometheus_sup supervisor, so either the supervisor didn't start or it crashed (in both cases there could be crash logs somewhere).
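
To illustrate the failure mode, here is a minimal escript sketch, not code from the exporter. The names demo_registry_table and demo_collector are hypothetical stand-ins for prometheus_registry_table and a real collector module. It shows that ets:insert/2 raises badarg while a named table does not exist, and succeeds once some process has created it.

```erlang
#!/usr/bin/env escript
%% Sketch only. demo_registry_table and demo_collector are hypothetical
%% stand-ins for prometheus_registry_table and a real collector module.

main(_) ->
    Table = demo_registry_table,
    %% The table has not been created yet, so ets:info/1 reports undefined.
    undefined = ets:info(Table),
    %% Inserting into the missing table fails with badarg, as in the trace.
    badarg = try_insert(Table, {default, demo_collector}),
    %% Once an owner process creates the named table, the same insert works.
    Table = ets:new(Table, [named_table, bag, public]),
    ok = try_insert(Table, {default, demo_collector}),
    [{default, demo_collector}] = ets:lookup(Table, default),
    io:format("insert succeeded once the table existed~n").

%% Returns ok on success, or the error reason if ets:insert/2 raises.
try_insert(Table, Object) ->
    try ets:insert(Table, Object) of
        true -> ok
    catch
        error:Reason -> Reason
    end.
```

On a running node, the same check can probably be made with something like rabbitmqctl eval 'ets:info(prometheus_registry_table).', which should return undefined if the table is missing.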

I have never used the sharding plugin, so I need to do some research here...

dcorbacho commented 5 years ago

@oneiros-de The problem you found is most likely caused by the handling of dependencies. The build system has been fixed in https://github.com/deadtrickster/prometheus_rabbitmq_exporter/pull/54. I cannot reproduce it with the current master branch and the latest RabbitMQ 3.7.8-rc4.