bogn83 opened 2 years ago
+1
I believe this metric is working correctly.
I had the same issue as the OP and suspected the metric wasn't working either. But digging deeper, I realized that this metric is unlikely to be scraped at a non-zero value unless the system is under sustained load. Since Prometheus only scrapes metrics at a fixed interval (commonly every 10–15 seconds), a short spike in load can raise an `ActiveRecord::ConnectionTimeoutError` yet leave `ruby_active_record_connection_pool_waiting` back at 0 by the time the next scrape happens. So it's a race condition: for this metric to reliably report a value above 0, the system must stay under load long enough for the threads waiting in the queue to be observed by a scrape. After running a load test, I can confirm that it increments correctly.
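The race is easy to demonstrate with a toy Ruby script. This is not exporter code, just a sketch of the timing problem: a gauge that spikes briefly between scrapes is never observed at its non-zero value.

```ruby
# Toy illustration of the scrape race: "waiting" spikes to 1 for ~50 ms,
# but the simulated scrape arrives after the spike has already passed.
waiting = 0
lock = Mutex.new

# Simulated load spike: a thread "waits for a connection" briefly.
spike = Thread.new do
  lock.synchronize { waiting += 1 }
  sleep 0.05                               # connection frees up quickly
  lock.synchronize { waiting -= 1 }
end
spike.join

# Simulated Prometheus scrape, arriving after the spike is over.
scraped = lock.synchronize { waiting }
puts scraped  # => 0, even though waiting was briefly 1
```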
We can see how this works in the code. This metric is set by calling the `.stat` method on the connection pool:
https://github.com/discourse/prometheus_exporter/blob/8c8f4a58268c7b7ca3b80b4564ca62997c191624/lib/prometheus_exporter/instrumentation/active_record.rb#L60
and here is the `.stat` method itself:
https://github.com/rails/rails/blob/eed417637b594b33b3b58b08dd41daa10c57e083/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L703-L719
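For reference, `stat` returns a hash of pool gauges. The stand-in below is not the Rails implementation — it only mimics the shape of the returned hash as I remember it from the Rails 6 source, with the key names the exporter reads:

```ruby
# Stand-in for ActiveRecord::ConnectionAdapters::ConnectionPool#stat
# (Rails 6). NOT the real implementation; it only sketches the return shape.
FakePool = Struct.new(:size, :connections, :busy, :dead, :idle,
                      :waiting, :checkout_timeout) do
  def stat
    {
      size: size,                       # configured pool size
      connections: connections,         # connections currently created
      busy: busy,                       # checked out by live threads
      dead: dead,                       # checked out by dead threads
      idle: idle,                       # created but not checked out
      waiting: waiting,                 # num_waiting_in_queue — the gauge in question
      checkout_timeout: checkout_timeout
    }
  end
end

pool = FakePool.new(5, 3, 2, 0, 1, 0, 5.0)
p pool.stat[:waiting]  # => 0 when no thread is blocked on checkout
```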
(The referenced ActiveRecord version is 6.0.5.1.)
We can see that the `waiting` metric is set from `num_waiting_in_queue`. Tracing through that code, we can see it's a gauge of the number of threads waiting on a connection, which gets incremented and decremented in this loop:
https://github.com/rails/rails/blob/eed417637b594b33b3b58b08dd41daa10c57e083/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L203-L226
The above code snippet should convince us that the number of waiting threads is reported through the `waiting` metric, and that the counter is incremented before `ConnectionTimeoutError` is raised. This all happens within fractions of a second, so unless there's sustained load, a non-zero value might never be scraped by Prometheus.
(From the original report; the embedded screenshots did not survive:) The query for it always returns 0 even though we can see lots of [screenshot missing], even fully unfiltered by labels and in a 30d time window: [screenshot missing], while [screenshot missing] works.