How to limit exposed metrics

dry4ng commented 9 months ago

I'm able to limit what is scraped via a ServiceMonitor, but is there a way to limit what is exposed at the /metrics endpoint? Our current setup returns about 26k metrics.

danihodovic commented 9 months ago

Can you post a log of what you're seeing?

dry4ng commented 9 months ago

Here is a typical result from the exporter, with the number of series noted in a `# N` comment before each sample. As you can see, most metrics produce about 770 series, one per host/task/queue combination, while celery_task_runtime_bucket alone returns 11205. The total number of rows is 25956. In my podmonitoring config (servicemonitor) I drop all but a few metrics. I would like to have them dropped at the exporter level.

```
# HELP celery_task_sent_total Sent when a task message is published.
# TYPE celery_task_sent_total counter
# 772
celery_task_sent_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 0.0
# HELP celery_task_sent_created Sent when a task message is published.
# TYPE celery_task_sent_created gauge
# 772
celery_task_sent_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.6956297088633437e+09
# HELP celery_task_received_total Sent when the worker receives a task.
# TYPE celery_task_received_total counter
# 772
celery_task_received_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 330.0
# HELP celery_task_received_created Sent when the worker receives a task.
# TYPE celery_task_received_created gauge
# 772
celery_task_received_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.69562970886338e+09
# HELP celery_task_started_total Sent just before the worker executes the task.
# TYPE celery_task_started_total counter
# 772
celery_task_started_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 16.0
# HELP celery_task_started_created Sent just before the worker executes the task.
# TYPE celery_task_started_created gauge
# 772
celery_task_started_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.6956297088634067e+09
# HELP celery_task_succeeded_total Sent if the task executed successfully.
# TYPE celery_task_succeeded_total counter
# 772
celery_task_succeeded_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 326.0
# HELP celery_task_succeeded_created Sent if the task executed successfully.
# TYPE celery_task_succeeded_created gauge
# 772
celery_task_succeeded_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.695629708863434e+09
# HELP celery_task_failed_total Sent if the execution of the task failed.
# TYPE celery_task_failed_total counter
# 815
celery_task_failed_total{exception="",hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 0.0
# HELP celery_task_failed_created Sent if the execution of the task failed.
# TYPE celery_task_failed_created gauge
# 815
celery_task_failed_created{exception="",hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.6956297088634605e+09
# HELP celery_task_rejected_total The task was rejected by the worker, possibly to be re-queued or moved to a dead letter queue.
# TYPE celery_task_rejected_total counter
# 772
celery_task_rejected_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 0.0
# HELP celery_task_rejected_created The task was rejected by the worker, possibly to be re-queued or moved to a dead letter queue.
# TYPE celery_task_rejected_created gauge
# 772
celery_task_rejected_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.6956297088634827e+09
# HELP celery_task_revoked_total Sent if the task has been revoked.
# TYPE celery_task_revoked_total counter
# 772
celery_task_revoked_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 0.0
# HELP celery_task_revoked_created Sent if the task has been revoked.
# TYPE celery_task_revoked_created gauge
# 772
celery_task_revoked_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.6956297088635025e+09
# HELP celery_task_retried_total Sent if the task failed, but will be retried in the future.
# TYPE celery_task_retried_total counter
# 772
celery_task_retried_total{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 0.0
# HELP celery_task_retried_created Sent if the task failed, but will be retried in the future.
# TYPE celery_task_retried_created gauge
# 772
celery_task_retried_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.695629708863521e+09
# HELP celery_worker_up Indicates if a worker has recently sent a heartbeat.
# TYPE celery_worker_up gauge
# 27
celery_worker_up{hostname="myhostname"} 1.0
# HELP celery_worker_tasks_active The number of tasks the worker is currently processing
# TYPE celery_worker_tasks_active gauge
# 27
celery_worker_tasks_active{hostname="myhostname"} 6.0
# HELP celery_task_runtime Histogram of task runtime measurements.
# TYPE celery_task_runtime histogram
# 11205
celery_task_runtime_bucket{hostname="myhostname",le="0.005",name="mytaskname",queue_name="myqueuename"} 287.0
# 747
celery_task_runtime_count{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 607.0
# 747
celery_task_runtime_sum{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1034.6516969194636
# HELP celery_task_runtime_created Histogram of task runtime measurements.
# TYPE celery_task_runtime_created gauge
# 749
celery_task_runtime_created{hostname="myhostname",name="mytaskname",queue_name="myqueuename"} 1.6956297089354677e+09
# HELP celery_queue_length The number of message in broker queue.
# TYPE celery_queue_length gauge
# 4
celery_queue_length{queue_name="myqueuename"} 0.0
# HELP celery_active_consumer_count The number of active consumer in broker queue.
# TYPE celery_active_consumer_count gauge
# HELP celery_active_worker_count The number of active workers in broker queue.
# TYPE celery_active_worker_count gauge
# 2
celery_active_worker_count{queue_name="myqueuename"} 25.0
# HELP celery_active_process_count The number of active processes in broker queue.
# TYPE celery_active_process_count gauge
# 2
celery_active_process_count{queue_name="myqueuename"} 0.0
```
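For anyone who wants to produce per-metric counts like the ones in this dump, here is a minimal sketch that tallies sample lines per metric name from a /metrics response body. The function name `count_series` and the tiny sample text (hostnames `a`, `b`, queue `q`) are made up for illustration; it is not part of celery-exporter.

```python
from collections import Counter


def count_series(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus text exposition output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (# HELP, # TYPE, other comments).
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or the first space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Hypothetical, shortened sample; in practice pass the body fetched from /metrics.
sample = """\
# HELP celery_worker_up Indicates if a worker has recently sent a heartbeat.
# TYPE celery_worker_up gauge
celery_worker_up{hostname="a"} 1.0
celery_worker_up{hostname="b"} 1.0
celery_queue_length{queue_name="q"} 0.0
"""

counts = count_series(sample)
for name, n in counts.most_common():
    print(f"{n:6d}  {name}")
```

Sorting by count makes the heaviest metric (here that would be celery_task_runtime_bucket) show up first.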

adinhodovic commented 9 months ago

Why don't you adjust the buckets? Fewer buckets means fewer metrics. Obviously there is a short-term drawback to changing the bucket sizes.

export CE_BUCKETS=1,10,60,600,1800
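To put a rough number on the saving, here is a back-of-the-envelope sketch using the counts from the dump above. It assumes the histogram adds an implicit `+Inf` bucket on top of the configured bounds (which is how the Python prometheus_client `Histogram` behaves) and that the number of host/task/queue combinations stays at 747.

```python
# Counts taken from the dump posted earlier in this thread.
label_combinations = 747      # celery_task_runtime_count series
bucket_series_before = 11205  # celery_task_runtime_bucket series

# Buckets per label combination with the current configuration.
buckets_before = bucket_series_before // label_combinations

# Bounds from the suggested CE_BUCKETS value.
new_bounds = [float(b) for b in "1,10,60,600,1800".split(",")]

# Assumption: one extra +Inf bucket is appended to the configured bounds.
buckets_after = len(new_bounds) + 1
bucket_series_after = label_combinations * buckets_after

print(f"buckets per combination: {buckets_before} -> {buckets_after}")
print(f"bucket series: {bucket_series_before} -> {bucket_series_after}")
```

Under those assumptions the bucket series drop from 11205 to 4482; the `_count`, `_sum` and `_created` series are unaffected.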

adinhodovic commented 9 months ago

You can also set CE_GENERIC_HOSTNAME_TASK_SENT_METRIC=true to use a generic hostname for the celery_task_sent_total metric. Otherwise the hostname label is the host that sent the task, e.g. a random Django pod, which increases label cardinality.
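A small sketch of why that helps: each distinct label set is its own time series, so collapsing the sender hostname to one value merges series that differ only by hostname. The pod names and the placeholder value `"generic"` below are hypothetical and only illustrate the counting, not the exporter's actual label value.

```python
def series_count(samples, generic_hostname=None):
    """Count distinct (hostname, task, queue) label sets.

    If generic_hostname is given, every sample's hostname is replaced by it
    before counting, mimicking a single shared hostname label.
    """
    label_sets = set()
    for hostname, task, queue in samples:
        if generic_hostname is not None:
            hostname = generic_hostname
        label_sets.add((hostname, task, queue))
    return len(label_sets)


# Hypothetical senders: three short-lived pods publishing the same task.
samples = [
    ("django-pod-1", "mytaskname", "myqueuename"),
    ("django-pod-2", "mytaskname", "myqueuename"),
    ("django-pod-3", "mytaskname", "myqueuename"),
]

per_host = series_count(samples)
collapsed = series_count(samples, generic_hostname="generic")
print(per_host, collapsed)
```

With per-sender hostnames the count grows with every new pod; with a shared hostname it only grows with new task/queue pairs.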

dry4ng commented 8 months ago

Thanks, that worked.

danihodovic commented 8 months ago

Good guy Adin

danihodovic / celery-exporter

How to limit exposed metrics #268