rodecker opened this issue 4 years ago
Icinga, another Nagios fork, or something else entirely?
Prometheus with Alertmanager :)
Telegraf + VictoriaMetrics was really nice to set up. Either push InfluxDB line protocol to VictoriaMetrics, or let VictoriaMetrics scrape Prometheus-format metrics from Telegraf.
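A minimal Python sketch of why both routes end up with the same series: it renders one sample as InfluxDB line protocol (the push route) and as Prometheus text exposition (the scrape route). It assumes the default `<measurement>_<field>` naming that both VictoriaMetrics' Influx ingestion and Telegraf's `prometheus_client` output apply to ordinary field names; the `cpu`/`ring01` names are made up, and escaping as well as `# HELP`/`# TYPE` lines are left out.

```python
"""Render one sample as InfluxDB line protocol and as Prometheus text format.

Sketch only: no escaping of spaces/commas/quotes, and every field is sent as a float.
"""


def to_influx_line(measurement, tags, fields, ts_ns=None):
    """Return one line of InfluxDB line protocol, e.g. 'cpu,host=a usage_idle=97.5'."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)!r}" for k, v in sorted(fields.items()))
    line = f"{measurement}{tag_part} {field_part}"
    # The nanosecond timestamp is optional; without it the receiver uses its own clock.
    return f"{line} {ts_ns}" if ts_ns is not None else line


def to_prometheus_lines(measurement, tags, fields):
    """Return Prometheus sample lines, one per field, named <measurement>_<field>."""
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(tags.items()))
    out = []
    for field, value in sorted(fields.items()):
        name = f"{measurement}_{field}"
        if labels:
            out.append(f"{name}{{{labels}}} {float(value)!r}")
        else:
            out.append(f"{name} {float(value)!r}")
    return out


if __name__ == "__main__":
    sample = ("cpu", {"host": "ring01"}, {"usage_idle": 97.5, "usage_user": 1.25})
    print(to_influx_line(*sample))
    for line in to_prometheus_lines(*sample):
        print(line)
```

Run as a script, it prints `cpu,host=ring01 usage_idle=97.5,usage_user=1.25`, followed by `cpu_usage_idle{host="ring01"} 97.5` and `cpu_usage_user{host="ring01"} 1.25`.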
I also added MTR support to my Telegraf fork, which made it easy to get nice stats in Grafana on how hops evolve over time. This could be especially useful for the Ring.
Let me know if it's of interest.
For monitoring (as opposed to telemetry), the combination of Prometheus, node_exporter and Alertmanager is hard to beat.
I tried node_exporter first, but the 'everything shall be run on a different port' theme did not sit well with me.
So here is how it works: Telegraf exports its data through an output plugin that serves it in Prometheus format, and VictoriaMetrics pulls (scrapes) the data from there. Telegraf, by the way, has excellent out-of-the-box support for most things, and it can also execute custom binaries whose output it parses in different formats (Influx, JSON, simple, etc.). You can still run Alertmanager as you normally would, or evaluate the alert rules with VictoriaMetrics' own vmalert (https://docs.victoriametrics.com/vmalert.html), which sends its notifications through Alertmanager.
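As an example of the custom-binary route, here is a hypothetical check script that Telegraf's `inputs.exec` plugin could run with `data_format = "influx"`. It tries a TCP connect to each `host:port` argument and prints one line-protocol line per target; the `tcp_check` measurement and its tags and fields are illustrative names only. (Telegraf already ships a `net_response` input for plain TCP checks, so this only demonstrates the exec contract: metrics on stdout.)

```python
import socket
import sys
import time


def check_tcp(host, port, timeout=3.0):
    """Try one TCP connect; return (up, latency_ms), with latency 0.0 when down."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1, (time.monotonic() - start) * 1000.0
    except OSError:  # refused, unreachable, DNS failure or timeout
        return 0, 0.0


def format_line(host, port, up, latency_ms):
    """Build one line-protocol line; the 'i' suffix marks `up` as an integer field."""
    return f"tcp_check,target={host},port={port} up={up}i,latency_ms={latency_ms:.3f}"


def main(targets):
    # Each target is "host:port", e.g. "ring01.example.net:22" (hypothetical host).
    for target in targets:
        host, _, port = target.rpartition(":")
        up, latency_ms = check_tcp(host, int(port))
        print(format_line(host, port, up, latency_ms))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With an `[[inputs.exec]]` section whose `commands` list points at this script (the path is up to you), the values would be exposed by `[[outputs.prometheus_client]]`, which listens on `:9273` by default, as `tcp_check_up` and `tcp_check_latency_ms` for VictoriaMetrics to scrape.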
At the same time, you get Thanos-like features such as long-term storage.
I had a look, and a fairly recent VictoriaMetrics is available straight from the repo. If we want MTR support, however, I would need to build Telegraf from my own fork. I also improved its TLS client certificate support, which means client certificates could be used between all nodes when transporting data.
So, in short, node_exporter + Alertmanager is functionally equivalent to Telegraf + VictoriaMetrics.
If people like running Prometheus, maybe this is of interest? https://opensourcelibs.com/lib/network_exporter
Some kind of monitoring system that sends e-mails when Ring infrastructure servers or services are down. Monitoring of hosts and services should be configured automatically when they are added to Ansible.
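One way to meet the auto-configuration requirement (an assumption, not something settled in this thread) is to generate a Prometheus `file_sd_configs` target file from the Ansible inventory. Prometheus, and VictoriaMetrics' own scraper, re-read such files when they change, so regenerating the file from `ansible-inventory --list` output picks up newly added hosts, and Alertmanager can handle the e-mail side. The `ansible_groups` label and the port are illustrative choices in this sketch.

```python
import json
import sys


def inventory_to_file_sd(inventory, port):
    """Map `ansible-inventory --list` JSON to Prometheus file_sd target groups.

    Each host becomes one target group; its direct Ansible groups are joined into a
    hypothetical `ansible_groups` label, so a host in several groups is scraped once.
    """
    host_groups = {}
    for name, group in inventory.items():
        # `_meta` holds hostvars and `all` normally only lists child groups.
        if name in ("_meta", "all") or not isinstance(group, dict):
            continue
        for host in group.get("hosts", []):
            host_groups.setdefault(host, set()).add(name)
    return [
        {"targets": [f"{host}:{port}"], "labels": {"ansible_groups": ",".join(sorted(groups))}}
        for host, groups in sorted(host_groups.items())
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. ansible-inventory -i hosts.ini --list | python3 inventory_to_sd.py 9273 > targets.json
    json.dump(inventory_to_file_sd(json.load(sys.stdin), int(sys.argv[1])), sys.stdout, indent=2)
```

The scrape configuration would then point a `file_sd_configs` entry at the generated JSON file, for example from an Ansible task that runs after hosts are added.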