dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Add Prometheus metrics for cumulative task transition counts on workers #8697

Open hendrikmakait opened 1 week ago

hendrikmakait commented 1 week ago

Right now, we have the current count of tasks in the various states on the workers exposed as dask_worker_tasks. In some scenarios, we're more interested in the total count and its rate of change, so we should add another metric that tracks cumulative counts.

fjetter commented 1 week ago

I'm not sure I agree with this. Why can't we track the rate of change of the existing metric? What additional value would this extra cumulative counter add? I am concerned that this is too much noise. Historically, I have not found the worker-level metrics to be very valuable.

hendrikmakait commented 1 week ago

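The existing metric is a point-in-time query: it only reports how many tasks are in each state at the moment of the scrape. Transitions that start and finish between two scrapes leave no trace in it, so it doesn't provide an accurate picture of the change that happens on the worker.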
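
To illustrate the difference, here is a minimal sketch in plain Python. It does not use the actual distributed or Prometheus APIs; `ToyWorker`, `transition`, and `run_between_scrapes` are hypothetical names made up for this example. It assumes tasks move into "executing" and then into "memory" entirely between two scrapes, and compares what a gauge-style sample and a cumulative counter would each report.

```python
from collections import Counter


class ToyWorker:
    """Hypothetical stand-in for a worker; not the distributed API."""

    def __init__(self):
        # Gauge-like: number of tasks currently in each state.
        self.current = Counter()
        # Counter-like: total number of transitions into each state.
        self.cumulative = Counter()

    def transition(self, start, finish):
        # Move one task from `start` (None for a new task) to `finish`.
        if start is not None:
            self.current[start] -= 1
        self.current[finish] += 1
        self.cumulative[finish] += 1


def run_between_scrapes(worker, n_tasks):
    # Each task starts executing and finishes before the next scrape.
    for _ in range(n_tasks):
        worker.transition(None, "executing")
        worker.transition("executing", "memory")


worker = ToyWorker()
gauge_samples = []
counter_samples = []
for n_tasks in (0, 5, 50):
    run_between_scrapes(worker, n_tasks)
    # A "scrape": read both values at this instant.
    gauge_samples.append(worker.current["executing"])
    counter_samples.append(worker.cumulative["executing"])

print(gauge_samples)    # [0, 0, 0]
print(counter_samples)  # [0, 5, 55]
```

In this toy setup the sampled gauge is flat even though the worker executed 55 tasks, so taking its rate of change yields nothing. The differences between consecutive counter samples (5, then 50) recover the activity in each scrape interval.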