checkly / public-roadmap

Checkly public roadmap. All planned features, updates and tweaks.
https://checklyhq.com

Alerting based on p99 response time #224

Open IanWhitney opened 2 years ago

IanWhitney commented 2 years ago


Is your feature request related to a problem? Please describe.
Instead of alerting on a single request being slow, I'd like to alert if the average or p95 response time exceeds a limit.

Describe the solution you'd like
When setting Assertions, I would like to be able to choose "average response time" or "p95 response time" as the Property value for a Response Time check.

Describe alternatives you've considered
Checking the dashboard manually? I'm not sure what other options there are.

Additional context
A single slow request may not be worth an alert. But several slow requests will raise our average/p95 response times and are worth investigating.

tnolet commented 2 years ago

@IanWhitney thanks for contributing. This smells a lot like "bucket based" / "budget based" alerting or SLOs. The immediate question I would have is "p99 over how much time?", as it is an aggregate metric over time. You quickly end up with either a sliding window or a fixed error budget per calendar period.

It is similar enough to this ticket that I'm referencing it here: https://github.com/checkly/public-roadmap/issues/186
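
To make the "over how much time?" question concrete, here is a minimal sketch of the sliding-window variant in Python. It is not how Checkly works or plans to work; the class name `SlidingWindowPercentile`, its parameters, and the nearest-rank percentile method are all assumptions chosen for illustration.

```python
import math
from collections import deque
from typing import Optional


class SlidingWindowPercentile:
    """Hypothetical helper: tracks response times in a sliding time window
    and compares a percentile of them against a limit."""

    def __init__(self, window_seconds: float, percentile: float, limit_ms: float):
        self.window_seconds = window_seconds  # how far back samples count
        self.percentile = percentile          # e.g. 95 or 99
        self.limit_ms = limit_ms              # alert threshold in milliseconds
        self.samples = deque()                # (timestamp_s, response_ms), oldest first

    def record(self, timestamp: float, response_time_ms: float) -> None:
        """Add a sample and evict everything that fell out of the window."""
        self.samples.append((timestamp, response_time_ms))
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def current_percentile(self) -> Optional[float]:
        """Nearest-rank percentile of the samples currently in the window."""
        if not self.samples:
            return None
        values = sorted(value for _, value in self.samples)
        rank = max(1, math.ceil(self.percentile * len(values) / 100))
        return values[rank - 1]

    def should_alert(self) -> bool:
        value = self.current_percentile()
        return value is not None and value > self.limit_ms


def run_scenario(slow_count: int) -> bool:
    """Feed 20 checks, 30 s apart, of which `slow_count` are slow (5000 ms)."""
    monitor = SlidingWindowPercentile(window_seconds=600, percentile=95, limit_ms=1000)
    for i in range(20):
        response_ms = 5000 if i < slow_count else 200
        monitor.record(timestamp=i * 30, response_time_ms=response_ms)
    return monitor.should_alert()
```

With these made-up numbers, one slow request out of 20 leaves the p95 at 200 ms and raises no alert, while three slow requests push the p95 to 5000 ms and do. The window length is the knob the question above is about: a shorter window reacts faster but holds fewer samples, so a single outlier moves the percentile more.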