Closed karlseguin closed 5 years ago
The metrics here can be calculated by the pool_time
value in the log entry. I don't like that we create our own metrics system or bring in metrics library, so would no histograms because it makes metric aggregation more difficult than it already is. I would prefer to only expose metrics that we can't get already, such as queue size, head of queue wait time, connected pool size, pool size current, pool status that don't require changing state on enqueue or dequeue.
I would really prefer to see this use the telemetry
library for reporting metrics, then the author of the app using the library can decide how they aggregate or publish metrics on their own.
I would really prefer to see this use the
telemetry
library for reporting metrics, then the author of the app using the library can decide how they aggregate or publish metrics on their own.
Telemetry is good at this but if we only have "gauge" metrics as a GenServer call a user can use the telemetry polling mechanism to decide intervals and create their own events/aggregation. Then there is less logic in the pool itself, and we dont bring in any dependencies.
Edit: And/or we can have ecto call the pool asking for the metrics, and it will call telemetry directly.
I wanted to add some basic metrics to the pool to help monitor its health. I didn't just want an average wait time, so I added 3 arbitrary buckets. It didn't seem worthwhile to add a histogram library.
I really struggled to write tests for this. So much so that I gave up and figured I'd send a PR first to at least get feedback. No point struggling with tests if this isn't something people want.
If this PR is generally acceptable, I would like some advice on how to test. I thought of copying the queue_tests but that's very non deterministic. I was thinking of more narrow unit tests with explicit checkouts and checkins, but no test does that, so I figured that wasn't the right way.