concourse / hush-house

Concourse k8s-based environment
https://hush-house.pivotal.io

metrics: set up alerting #26

Closed (cirocosta closed 5 years ago)

cirocosta commented 5 years ago

Hey,

Although we already have metrics being collected by Prometheus and dashboards displayed in Grafana, we still need alerting set up so that we don't have to constantly watch dashboards to know when things go wrong.

From my understanding, we have at least two choices here:

While the second case seems interesting to me because it lets us decouple the visualization from the alerting, the first case allows us to visualize what our thresholds look like, as well as consume multiple data sources (e.g., not only Prometheus but Stackdriver as well).
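To make the threshold idea concrete, here is a minimal Python sketch of checking series against a fixed threshold. It is not how Grafana or Prometheus evaluate alert rules internally; it only parses a payload shaped like a Prometheus instant-query response (`/api/v1/query`). The metric name `worker_containers`, the worker labels, the values, and the threshold of 200 are all made up for illustration.

```python
import json

# Hypothetical sample shaped like a Prometheus instant-query response.
# The metric name, labels, and values are assumptions, not real hush-house data.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "worker_containers", "worker": "worker-0"},
             "value": [1550000000.0, "180"]},
            {"metric": {"__name__": "worker_containers", "worker": "worker-1"},
             "value": [1550000000.0, "240"]},
        ],
    },
})


def breaching_series(response_text, threshold):
    """Return (labels, value) pairs whose sample value exceeds the threshold."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    breaches = []
    for series in payload["data"]["result"]:
        # Each instant-vector sample is [timestamp, "value-as-string"].
        _, raw_value = series["value"]
        value = float(raw_value)
        if value > threshold:
            breaches.append((series["metric"], value))
    return breaches


for labels, value in breaching_series(SAMPLE_RESPONSE, threshold=200):
    print(f"ALERT {labels.get('worker')}: {value} > 200")
```

Whichever tool ends up owning the alerts, the rule boils down to the same shape: a query, a comparison, and a notification when the comparison holds.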

Acceptance criteria:

Thanks!

scottietremendous commented 5 years ago

We'll probably want alerts for Datadog SLIs/SLOs

scottietremendous commented 5 years ago

Chore: Bump Grafana to 6.0

YoussB commented 5 years ago

note: the domain doesn't redirect HTTP requests to HTTPS

taylorsilva commented 5 years ago

It seems possible to use Prometheus with Stackdriver, though the integration has some limitations and is in beta (see here). One Medium blog post also suggests it can be done.

Our current plan is to set up alerts with Grafana and create another long-term issue to implement alerts with Prometheus. We don't want to tackle Prometheus now because we have a lot of learning to do about it. We'd rather get hush-house production-ready sooner rather than later, which means using Grafana for alerts in the short term.

Thoughts? cc @YoussB @cirocosta @scottietremendous

cirocosta commented 5 years ago

sounds goood 😁 thankss!

scottietremendous commented 5 years ago

@taylorsilva - I agree with this. We have other potential avenues to dive into Prometheus and hush-house doesn't need to be held back because of it.

YoussB commented 5 years ago

EOD

Next Steps:

scottietremendous commented 5 years ago

Let's open a new issue to explore which metrics we can provide on hush-house beyond what we currently offer on Wings.

cirocosta commented 5 years ago

Hey,

@pivotal-bin-ju and I looked into having Grafana perform the redirect, and it turns out that Grafana itself doesn't do that (unlike Concourse, which redirects automatically), meaning that we'd need something else to perform the redirects (like nginx).

We'll leave this to another issue so that we're not blocked on it (the other metrics dashboard doesn't provide it anyway).
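For reference, the behaviour we'd want from that "something else" is small. Below is a rough Python sketch of an HTTP-to-HTTPS redirector using only the standard library. It is not the nginx setup itself; the port `8080`, the helper name `https_location`, and the example hostname are assumptions for illustration. It also assumes the `Host` header carries a plain hostname (not an IPv6 literal).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def https_location(host_header, path):
    """Build the https:// URL equivalent to a plain-HTTP request."""
    # Drop any explicit port (e.g. ":80") so the redirect targets the default HTTPS port.
    host = host_header.split(":", 1)[0]
    return f"https://{host}{path}"


class RedirectToHTTPS(BaseHTTPRequestHandler):
    def do_GET(self):
        host_header = self.headers.get("Host", "")
        if not host_header:
            # Without a Host header there is nothing sensible to redirect to.
            self.send_error(400, "Missing Host header")
            return
        self.send_response(301)
        self.send_header("Location", https_location(host_header, self.path))
        self.end_headers()

    # HEAD requests get the same redirect, just without a body.
    do_HEAD = do_GET


def serve(port=8080):
    """Listen for plain-HTTP requests and redirect each one to HTTPS."""
    HTTPServer(("", port), RedirectToHTTPS).serve_forever()


print(https_location("metrics.example.com:80", "/d/abc?orgId=1"))
```

Calling `serve()` would start the listener; in practice a reverse proxy in front of Grafana would play this role.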

Thanks!