chaos-jetzt / chaos-jetzt-nixfiles


services/monitoring: Rework monitoring concept for better resiliency and efficiency #29

Closed e1mo closed 2 months ago

e1mo commented 1 year ago

This introduces a single new monitoring host to reduce the pain of running multiple Prometheus and Alertmanager instances.

This means a server going down (or a similar failure) no longer produces duplicate alerts, which should hopefully make alerts stand out a bit better.

We also no longer monitor only Synapse and pretix individually; we now run blackbox checks against all of our services as well.
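For readers unfamiliar with blackbox checks, here is a minimal sketch of what such a probe can look like on a NixOS host. It is not the configuration from this PR: the target URL `https://example.org`, the job name `blackbox_http`, and the module name `http_2xx` are placeholders, and it assumes the exporter listens on its default port 9115 on the monitoring host itself.

```nix
{ pkgs, ... }:
{
  # Run the blackbox exporter locally with a single HTTP probe module.
  # JSON is valid YAML, so builtins.toJSON is enough to generate the config.
  services.prometheus.exporters.blackbox = {
    enable = true;
    configFile = pkgs.writeText "blackbox.yml" (builtins.toJSON {
      modules.http_2xx = {
        prober = "http";
        timeout = "5s";
      };
    });
  };

  services.prometheus.scrapeConfigs = [
    {
      job_name = "blackbox_http"; # placeholder name
      metrics_path = "/probe";
      params.module = [ "http_2xx" ];
      # Placeholder target; the real service list is not given here.
      static_configs = [ { targets = [ "https://example.org" ]; } ];
      relabel_configs = [
        # Pass the original target to the exporter as ?target=...
        { source_labels = [ "__address__" ]; target_label = "__param_target"; }
        # Keep the probed URL as the instance label.
        { source_labels = [ "__param_target" ]; target_label = "instance"; }
        # Scrape the exporter itself instead of the target (assumed default port).
        { target_label = "__address__"; replacement = "127.0.0.1:9115"; }
      ];
    }
  ];
}
```

With a setup along these lines, each probed service exposes a `probe_success` metric, so one alert rule can cover every service rather than needing per-application exporters.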

The public Grafana makes it easier to actually look at the metrics and also gives users a way to see whether there are any ongoing problems.