elixir-ecto / db_connection

Database connection behaviour
http://hexdocs.pm/db_connection/DBConnection.html
309 stars 112 forks source link

add metrics the pool #186

Closed karlseguin closed 5 years ago

karlseguin commented 5 years ago

I wanted to add some basic metrics to the pool to help monitor its health. I didn't just want an average wait time, so I added 3 arbitrary buckets. It didn't seem worthwhile to add a histogram library.

I really struggled to write tests for this. So much so that I gave up and figured I'd send a PR first to at least get feedback. No point struggling with tests if this isn't something people want.

If this PR is generally acceptable, I would like some advice on how to test. I thought of copying the queue_tests but that's very non deterministic. I was thinking of more narrow unit tests with explicit checkouts and checkins, but no test does that, so I figured that wasn't the right way.

fishcakez commented 5 years ago

The metrics here can be calculated by the pool_time value in the log entry. I don't like that we create our own metrics system or bring in metrics library, so would no histograms because it makes metric aggregation more difficult than it already is. I would prefer to only expose metrics that we can't get already, such as queue size, head of queue wait time, connected pool size, pool size current, pool status that don't require changing state on enqueue or dequeue.

seancribbs commented 5 years ago

I would really prefer to see this use the telemetry library for reporting metrics, then the author of the app using the library can decide how they aggregate or publish metrics on their own.

fishcakez commented 5 years ago

I would really prefer to see this use the telemetry library for reporting metrics, then the author of the app using the library can decide how they aggregate or publish metrics on their own.

Telemetry is good at this but if we only have "gauge" metrics as a GenServer call a user can use the telemetry polling mechanism to decide intervals and create their own events/aggregation. Then there is less logic in the pool itself, and we dont bring in any dependencies.

Edit: And/or we can have ecto call the pool asking for the metrics, and it will call telemetry directly.