divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0

Tokio runtime metrics improvements #3462

Closed: divergentdave closed 3 weeks ago

divergentdave commented 3 weeks ago

This makes a number of metrics improvements, mostly centered on Tokio runtime metrics.

The most recent Tokio release stabilized the global queue depth metric. This will likely be a good proxy for overload: tasks are pushed onto the global queue either when they are spawned from a non-Tokio thread or when a Tokio worker thread's local queue is full, so a depth that keeps growing suggests the workers are not keeping up.
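To illustrate the two paths onto the global queue, here is a minimal toy model using only the standard library. It is not Tokio's implementation: `ToyScheduler`, its method names, and the overflow policy (only the newly spawned task spills over) are assumptions made for illustration. In a real runtime the depth would be read from Tokio's runtime metrics rather than modeled like this.

```rust
use std::collections::VecDeque;

/// Toy model of a scheduler with per-worker local queues and one global queue.
/// Hypothetical type for illustration only; this is not how Tokio is written.
pub struct ToyScheduler {
    local_capacity: usize,
    local_queues: Vec<VecDeque<u64>>,
    global_queue: VecDeque<u64>,
}

impl ToyScheduler {
    pub fn new(workers: usize, local_capacity: usize) -> Self {
        Self {
            local_capacity,
            local_queues: vec![VecDeque::new(); workers],
            global_queue: VecDeque::new(),
        }
    }

    /// Spawn from a thread outside the runtime: there is no local queue to
    /// use, so the task always lands on the global queue.
    pub fn spawn_external(&mut self, task: u64) {
        self.global_queue.push_back(task);
    }

    /// Spawn from worker `worker`: prefer its local queue, and fall back to
    /// the global queue only when the local queue is full.
    pub fn spawn_from_worker(&mut self, worker: usize, task: u64) {
        let local = &mut self.local_queues[worker];
        if local.len() < self.local_capacity {
            local.push_back(task);
        } else {
            // Simplifying assumption: only the new task overflows.
            self.global_queue.push_back(task);
        }
    }

    /// The quantity the gauge reports in this model.
    pub fn global_queue_depth(&self) -> usize {
        self.global_queue.len()
    }
}
```

In this model, the depth stays at zero while workers spawn within their local capacity and rises only on external spawns or local overflow, which is why a non-zero value is a plausible overload signal.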

Here's a sample scrape from a non-tokio_unstable build:

# HELP tokio_queue_depth Number of tasks currently in the runtime's global queue
# TYPE tokio_queue_depth gauge
tokio_queue_depth{queue="global",otel_scope_name="tokio-runtime-metrics"} 0
# HELP tokio_thread_worker_count Number of runtime worker threads
# TYPE tokio_thread_worker_count gauge
tokio_thread_worker_count{otel_scope_name="tokio-runtime-metrics"} 8
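As a rough sketch of how a scrape like the one above could be checked for overload, the following standard-library-only helper pulls a gauge value out of the text format. The function names and the threshold parameter are hypothetical and not part of this change, and the parser handles only the simple `name{labels} value` shape shown here (no timestamps, no label values containing spaces).

```rust
/// Returns the value of the first sample whose metric name is exactly `name`.
/// Hypothetical helper: comment lines (`# HELP`, `# TYPE`) are skipped, and
/// the last space-separated token on the line is treated as the value.
pub fn gauge_value(scrape: &str, name: &str) -> Option<f64> {
    scrape
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .find_map(|line| {
            let rest = line.strip_prefix(name)?;
            // Reject longer metric names that merely share this prefix.
            if !(rest.starts_with('{') || rest.starts_with(' ')) {
                return None;
            }
            rest.rsplit(' ').next()?.parse::<f64>().ok()
        })
}

/// True when the global queue depth exceeds `threshold`. The threshold is an
/// assumption; a sensible value depends on the deployment.
pub fn global_queue_over(scrape: &str, threshold: f64) -> bool {
    gauge_value(scrape, "tokio_queue_depth").map_or(false, |depth| depth > threshold)
}
```

In practice this kind of check would more likely live in an alerting rule than in application code; the sketch only shows which sample carries the signal.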