NagiosEnterprises / nagioscore

Nagios Core
GNU General Public License v2.0

Service notifications are being sent even when host is in downtime #752

Open kbbl1977 opened 4 years ago

kbbl1977 commented 4 years ago

Hello there,

We're having issues with false notifications being sent even though the server is in a downtime period.

We've discovered that these false notifications are sent right after a Nagios reload/restart. See the attached log excerpt.

nag.txt

It looks like Nagios processes the retention file, where it finds that some services are in an alarm state. There is a downtime set for the corresponding host, but Nagios processes this information a bit later, after the notifications for those alarms have already been sent.

To be honest, the notifications are not sent after every Nagios reload. I suppose this happens only around the time when the notification interval is due.
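One way to check whether a retention file actually holds the combination described above (a service in a non-OK state on a host that also has a host downtime entry) is a small script like the one below. This is only a rough sketch: it assumes the retention file consists of `name {` ... `}` blocks with `key=value` lines, that host downtimes appear as `hostdowntime` blocks, and that services carry `host_name`, `service_description` and `current_state` fields. The function names are made up for illustration, and the script does not check whether a downtime entry is currently active.

```python
import re

# Assumed block header format: a block name followed by an opening brace.
BLOCK_START = re.compile(r"^(\w+)\s*\{\s*$")


def parse_retention(text):
    """Parse 'name { key=value ... }' blocks into (name, dict) pairs."""
    blocks = []
    current_name, current = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            match = BLOCK_START.match(line)
            if match:
                current_name, current = match.group(1), {}
        elif line == "}":
            blocks.append((current_name, current))
            current_name, current = None, None
        elif "=" in line:
            # Split on the first '=' only, since values may contain '='.
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def alarms_on_hosts_in_downtime(text):
    """Return (host, service) pairs that are non-OK on a host with a downtime entry."""
    blocks = parse_retention(text)
    hosts_with_downtime = {
        fields["host_name"]
        for name, fields in blocks
        if name == "hostdowntime" and "host_name" in fields
    }
    hits = []
    for name, fields in blocks:
        if (
            name == "service"
            and fields.get("current_state", "0") != "0"
            and fields.get("host_name") in hosts_with_downtime
        ):
            hits.append((fields["host_name"], fields.get("service_description", "")))
    return hits
```

To use it, read the retention file into a string and pass it to `alarms_on_hosts_in_downtime`. The services it returns are the candidates for the behaviour described in this report.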

So far we've encountered this in the scenario where the alarm is on a service and the downtime is on the corresponding host. This can probably be avoided by setting downtime for all services on the host as well, but we usually forget to do so ;-) So far I cannot confirm that such a workaround would guarantee the desired result.
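The workaround mentioned above (scheduling downtime for the host and for all of its services together) could be scripted so that the service part is not forgotten. The sketch below builds the two Nagios external commands `SCHEDULE_HOST_DOWNTIME` and `SCHEDULE_HOST_SVC_DOWNTIME` as fixed downtimes with no trigger. The helper names are hypothetical, and the command file path is site-specific (it is whatever `command_file` points to in `nagios.cfg`), so it is passed in as a parameter. Whether this actually prevents the notifications after a reload is not confirmed in this thread.

```python
import time


def downtime_commands(host, start, end, author, comment, now=None):
    """Build external command lines for host downtime plus downtime on all its services."""
    now = int(time.time()) if now is None else int(now)
    duration = end - start
    # Arguments: host;start;end;fixed;trigger_id;duration;author;comment
    # fixed=1 (fixed downtime), trigger_id=0 (not triggered by another downtime).
    args = f"{host};{start};{end};1;0;{duration};{author};{comment}"
    return [
        f"[{now}] SCHEDULE_HOST_DOWNTIME;{args}",
        f"[{now}] SCHEDULE_HOST_SVC_DOWNTIME;{args}",
    ]


def submit(command_file, lines):
    """Append the command lines to the Nagios external command file."""
    with open(command_file, "a") as handle:
        for line in lines:
            handle.write(line + "\n")
```

For example, `submit(path, downtime_commands("web01", start, end, "admin", "maintenance"))` would schedule both downtimes for a host named `web01`, where `path` is the local command file and `start`/`end` are Unix timestamps.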

System Info:

I can provide more info if needed.

Many thanks, Tomáš Novosad

kbbl1977 commented 4 years ago

Hello again, I forgot to mention that this issue has persisted for a long time. It is not caused by some recent update.

It's just that it usually doesn't bother us much, but we have more servers in downtime these days, so this issue comes to mind more often.

All the best, Tomáš

sawolf commented 4 years ago

Hi @kbbl1977, thanks for reporting this. I'll try to look into it soon.

stefan927 commented 4 years ago

Hey, I'm seeing the same behaviour with:

CentOS 7.4, kernel 3.10.0-693.21.1.el7.x86_64
Hardware: 48 cores / 256 GB RAM
Nagios Core: nagios 4.4.6 (compiled from source)
mk-livestatus broker module: 1.2.8p25
Our retention.dat is ~57M

Number of services/hosts/... reported by the Nagios syntax check:

Checked 35709 services.
Checked 2617 hosts.
Checked 1171 host groups.
Checked 6396 service groups.
Checked 794 contacts.
Checked 2 contact groups.
Checked 30 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checked 2617 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods

I'm also happy to provide more info if needed.

Did you maybe get a chance to get to the bottom of this?

best, stefan