fluent / fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes for Prometheus.
Apache License 2.0

Fluentd metrics are not sent to prometheus for workers other than 0 #194

Open prashant407 opened 2 years ago

prashant407 commented 2 years ago

Hi

We are using multiple workers in our Fluentd configuration. Please refer to the configuration below:

 <system>
   workers 3
 </system>
 <source>
   @type forward
   port 24224
 </source>
 <match test.*>
   @type elasticsearch
   host elasticsearch
   port 9200
 </match>
 <source>
   @type prometheus
 </source>
 <source>
   @type prometheus_output_monitor
   interval 10
   <labels>
     host ${hostname}
   </labels>
 </source>

We use Prometheus to scrape the metrics, but when we check the endpoint it only shows metrics for worker_id="0". Please refer to the output below.

[nodetest]# curl localhost:24231/metrics | grep fluentd_output_status_emit_records
# TYPE fluentd_output_status_emit_records gauge
# HELP fluentd_output_status_emit_records Current emit records.
fluentd_output_status_emit_records{host="test-fluentd-daemonset-d26ts",worker_id="0",plugin_id="object:13e86d0",type="elasticsearch"} 0.0

Could someone please explain why Prometheus is not able to scrape the metrics for the other workers?
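One way to confirm which workers actually appear in a scrape is to count the samples per worker_id label. The sketch below is only an illustration and not part of the plugin: the helper names `count_samples_per_worker` and `missing_workers` are hypothetical, and the sample text is the output shown above.

```python
import re
from collections import Counter
from typing import List

# Matches a worker_id="..." label inside a Prometheus text-format sample line.
WORKER_ID_RE = re.compile(r'worker_id="([^"]*)"')


def count_samples_per_worker(metrics_text: str) -> Counter:
    """Count sample lines per worker_id label value (hypothetical helper)."""
    counts: Counter = Counter()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = WORKER_ID_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_workers(counts: Counter, workers: int) -> List[str]:
    """Return the worker ids in 0..workers-1 that have no samples."""
    return [str(i) for i in range(workers) if str(i) not in counts]


# Output copied from the curl command above.
SAMPLE = """# TYPE fluentd_output_status_emit_records gauge
# HELP fluentd_output_status_emit_records Current emit records.
fluentd_output_status_emit_records{host="test-fluentd-daemonset-d26ts",worker_id="0",plugin_id="object:13e86d0",type="elasticsearch"} 0.0
"""

if __name__ == "__main__":
    counts = count_samples_per_worker(SAMPLE)
    print("samples per worker:", dict(counts))
    print("workers without samples:", missing_workers(counts, workers=3))
```

With `workers 3` configured, any id reported as missing here has no samples in the response from that port.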

prashant407 commented 2 years ago

I also tried requesting http://localhost:24231/aggregated_metrics to fetch the metrics from all workers, as suggested in the README at https://github.com/fluent/fluent-plugin-prometheus:

"When using multiple workers, each worker binds to port + fluent_worker_id. To scrape metrics from all workers at once, you can access http://localhost:24231/aggregated_metrics."

But it is not giving the expected output. Could someone please help in fetching the metrics from all workers?

Thanks
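The port arithmetic quoted from the README (each worker binds to port + fluent_worker_id) can be sketched as follows. This is a minimal illustration assuming the base port 24231 and the 3 workers from the configuration above. The function names `worker_metrics_urls` and `fetch` are hypothetical and not part of the plugin.

```python
import urllib.request
from typing import List


def worker_metrics_urls(base_port: int, workers: int,
                        host: str = "localhost",
                        path: str = "/metrics") -> List[str]:
    """Build one URL per worker: worker N listens on base_port + N."""
    return [
        f"http://{host}:{base_port + worker_id}{path}"
        for worker_id in range(workers)
    ]


def fetch(url: str, timeout: float = 5.0) -> str:
    """Fetch a metrics page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Assumption: 3 workers and base port 24231, as in the report above.
    for url in worker_metrics_urls(base_port=24231, workers=3):
        try:
            body = fetch(url)
            print(f"{url}: {len(body.splitlines())} lines")
        except OSError as exc:
            # Covers connection refused, timeouts, and URLError.
            print(f"{url}: failed ({exc})")
```

If this reading of the README is right, worker 1 and worker 2 would be reachable on ports 24232 and 24233 rather than on 24231.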

prashant407 commented 2 years ago

any updates?

Mohitj252 commented 2 years ago

I am also seeing the same problem. Can anyone explain how to fix it?

khrizar commented 1 year ago

I am having a similar issue. I have 7 workers, and the individual /metrics endpoints work fine. However, /aggregated_metrics accepts the connection on the corresponding port and then times out. This is the configuration:

<system>
  workers 7
</system>

<source>
  @type forward
</source>

<source>
  @type monitor_agent
  bind 127.0.0.1
  port 24220
</source>

<source>
  @type prometheus
  bind 127.0.0.1
  port 24240
  aggregated_metrics_path /aggregated_metrics
</source>

<source>
  @type prometheus_monitor
</source>

<source>
  @type prometheus_output_monitor
</source>
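To narrow down where the timeout occurs with a configuration like this one, each endpoint can be probed separately with an explicit timeout. The sketch below is a diagnostic aid, not a fix. It assumes the aggregated path is served on the base port 24240 (as in the README example) and that the 7 workers listen on 24240 through 24246. The names `probe_targets` and `probe` are hypothetical.

```python
import http.client
import socket
import time
import urllib.error
import urllib.request
from typing import List, Tuple


def probe_targets(base_port: int, workers: int,
                  aggregated_path: str = "/aggregated_metrics",
                  host: str = "127.0.0.1") -> List[str]:
    """Aggregated endpoint on the base port, then /metrics for each worker."""
    targets = [f"http://{host}:{base_port}{aggregated_path}"]
    targets += [
        f"http://{host}:{base_port + worker_id}/metrics"
        for worker_id in range(workers)
    ]
    return targets


def probe(url: str, timeout: float = 3.0) -> Tuple[str, float]:
    """Return (outcome, elapsed seconds); outcome is 'ok', 'timeout', or 'error: ...'."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
        outcome = "ok"
    except socket.timeout:
        # Raised directly when the server accepts but never answers.
        outcome = "timeout"
    except urllib.error.URLError as exc:
        # Connect-phase failures are wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            outcome = "timeout"
        else:
            outcome = f"error: {exc.reason}"
    except (OSError, http.client.HTTPException) as exc:
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - started


if __name__ == "__main__":
    for url in probe_targets(base_port=24240, workers=7):
        outcome, elapsed = probe(url)
        print(f"{url}: {outcome} after {elapsed:.2f}s")
```

A "timeout" on the aggregated URL alongside "ok" on every per-worker URL would match the behavior described above.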