cloudprober / cloudprober

An active monitoring software to detect failures before your customers do.
http://cloudprober.org
Apache License 2.0

Adding status code to the latency metric for the HTTP probe #836

Open therealak12 opened 2 weeks ago

therealak12 commented 2 weeks ago

Describe the feature you'd like and the problem it will solve

The latency metric for the HTTP probe currently lacks a status code field. Including such a field would enable measuring success and failure latencies separately.

Implementing this on top of the current custom distribution implementation might be complex, but using Prometheus' client_golang library could simplify the process.

manugarg commented 2 weeks ago

> Including such a field would enable measuring success and failure latencies separately.

You can already do that, right? There is a resp_code metric that HTTP probes export separately.

There might still be value in tracking latency for individual response codes, but such use cases are limited in probers. In most cases you want only certain response codes to count as success, and once you configure that, those are the only response codes that contribute to latency.

therealak12 commented 2 weeks ago

We rely on the latency-per-status-code metric provided by health-exporter. It helps a lot in various troubleshooting scenarios.

We wanted to use cloudprober instead of health-exporter, but the lack of such a metric is a blocker.

therealak12 commented 2 weeks ago

> You can already do that, right? There is a resp_code metric that HTTP probes export separately.

I think the resp_code metric only provides the request count, not the latency.

therealak12 commented 2 weeks ago

Dear @manugarg, would you accept a PR implementing this feature, or are you against it? By this feature, I mean using the client_golang library for distribution metrics and enabling dynamic label values.