HenriWahl / Nagstamon

Nagios status monitor for your desktop.
https://nagstamon.de

Prometheus/AlertManager group_by alerts #746

Open RamyAllam opened 3 years ago

RamyAllam commented 3 years ago

In Nagios/Opsview we used to get a single alert when an instance was down, rather than one alert for every service we monitor on that instance.

Could the same be added for Prometheus/AlertManager, for example by grouping the alerts by instance name, or by displaying only a single alert when the instance is down?

I have added the following to /etc/alertmanager/alertmanager.yml:

route:
  group_by: ['instance_pretty_name']

The targets are defined here: https://pastebin.com/4Bbvsgby. These are all the services monitored for a single instance.

Results: the alerts are properly grouped in the AlertManager Web UI, but not in Nagstamon.

[Screenshot: Nagstamon]

[Screenshot: AlertManager Web UI]
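
To make the request concrete, here is a minimal sketch of grouping alerts by a label on the client side. It is not Nagstamon code: the function name `group_alerts` and the sample alerts are hypothetical, and the alert dicts only assume the `labels` field that AlertManager's v2 API returns for each alert.

```python
from collections import defaultdict


def group_alerts(alerts, label="instance_pretty_name"):
    """Group alert dicts by the value of one label.

    Alerts that lack the label are collected under the empty string.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = alert.get("labels", {}).get(label, "")
        groups[key].append(alert)
    return dict(groups)


# Hypothetical sample data, shaped like entries in an AlertManager alert list.
sample_alerts = [
    {"labels": {"alertname": "InstanceDown", "instance_pretty_name": "web-1"}},
    {"labels": {"alertname": "HttpDown", "instance_pretty_name": "web-1"}},
    {"labels": {"alertname": "DiskFull", "instance_pretty_name": "db-1"}},
]

grouped = group_alerts(sample_alerts)
for name, members in grouped.items():
    # One line per instance instead of one line per service alert.
    print(name, [a["labels"]["alertname"] for a in members])
```

A client could then show one row per key and expand it on demand. AlertManager's v2 API also exposes an `/api/v2/alerts/groups` endpoint that returns alerts already grouped according to the route's group_by, which might be a simpler basis for this feature.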

RamyAllam commented 3 years ago

As a workaround, I used inhibit_rules as follows and then entered suppressed in "Regular expression for attempt" in the Filters tab. It works properly, but it would be awesome if this were the default behavior of the AlertManager module.

inhibit_rules:
 - source_match:
     alertname: 'InstanceDown'
     job: 'node'
   target_match:
     severity: 'critical'
   equal: ['instance_pretty_name']
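
The effect of this workaround can be sketched as follows. With the rule above, AlertManager marks the inhibited alerts as suppressed, and the regular expression filter then hides them. The sketch assumes alert dicts carrying the `status.state` field from AlertManager's v2 API; the function name `drop_matching_state` and the sample data are hypothetical, and this is not how Nagstamon implements its filters.

```python
import re


def drop_matching_state(alerts, pattern="suppressed"):
    """Return only the alerts whose status.state does not match the pattern."""
    state_re = re.compile(pattern)
    return [
        alert
        for alert in alerts
        if not state_re.search(alert.get("status", {}).get("state", ""))
    ]


# Hypothetical sample: the instance is down, so its service alert is inhibited.
sample_alerts = [
    {"labels": {"alertname": "InstanceDown"}, "status": {"state": "active"}},
    {"labels": {"alertname": "HttpDown"}, "status": {"state": "suppressed"}},
]

visible = drop_matching_state(sample_alerts)
print([a["labels"]["alertname"] for a in visible])
```

Only the InstanceDown alert stays visible, which matches the single-alert behavior described for Nagios/Opsview.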