NagiosEnterprises / nagioscore

Nagios Core
GNU General Public License v2.0

Need to stop the checks while scheduled downtime #708

Open kumarsu7 opened 5 years ago

kumarsu7 commented 5 years ago

Hi All,

Is there any option to stop checks while a server is in scheduled downtime?

We are facing many issues because of this.

Regards, Kumar

ericloyd commented 5 years ago

I've wondered about this for a long time. If you're in downtime, instead of just not notifying, why not turn off the checks entirely? I get that, from a coding perspective, it's easier to not have to worry about another logical operation to see if you should be doing checks, but still, turning off both checks and notifications during downtime would be a welcome improvement for most of our clients. Doing less work while things aren't needed makes sense.

sawolf commented 5 years ago

I don't think there's a way to do that via scheduled downtime, though you may be able to achieve something similar using timeperiods (assuming the downtime happens at some regular interval).

@kumarsu7 What kind of issues are you facing? I haven't seen any situations before where check execution would matter, so long as notifications aren't being sent.

kumarsu7 commented 5 years ago

We have integrated with ServiceNow using a MID Server. Please refer to the link below.

https://docs.servicenow.com/bundle/newyork-it-operations-management/page/product/event-management/task/configure-nagios-connector.html

The problem is that checks still run while the server is in scheduled downtime, so tickets are still being created in ServiceNow.

ericloyd commented 5 years ago

There is no way to use scheduled downtime to stop active checks. Period. One can do funky stuff with APIs in XI, but that doesn't apply here. And still, it's funky stuff. I seriously think an option to disable active checks of a service/host and notifications while the host/service is in downtime would be a great addition.

Especially if your checks are computationally intensive, why bother doing them at all if the device is in a downtime state?

sawolf commented 5 years ago

So, reading over that document, it's not clear how exactly they're retrieving host/service states, but as you (@kumarsu7) indicate, whatever method they use doesn't take downtime into account. I've also spoken to our support technicians about the issue, and they're able to confirm the same problem. As far as we're concerned, this is a problem with ServiceNow's integration, not with Nagios Core.

We do have a supported method for turning host/service status into ServiceNow incidents. You can read the relevant documentation here. This method uses the notification logic to create new tickets, and won't have the issues you describe here.

kumarsu7 commented 5 years ago

We cannot call this a problem with the ServiceNow integration. Once a host has entered scheduled downtime, it should not be checked at all until the downtime ends. If checks do still run, the resulting alerts should at least be distinguishable from normal (non-downtime) alerts. Because they are not, we cannot set up any event rules in ServiceNow either.

Changing the integration method in a production environment is not easy, @sawolf. The client won't agree to it.

jomann09 commented 5 years ago

You can pass macros to the event handler so that a wrapper shell script knows whether the host/service is in downtime. The wrapper would then run whatever command the event handler originally ran, sending the same information as before, but only after confirming that the host/service is not in downtime. That would give you the desired result without having to change either the ServiceNow integration or Nagios Core.
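
A rough sketch of the wrapper idea above, written in Python rather than shell. It assumes the event handler command passes the standard `$SERVICEDOWNTIME$` and `$HOSTDOWNTIME$` macros (the downtime depth, greater than zero while in scheduled downtime) as the first two arguments, followed by the original handler command. The script name, paths, and command name are hypothetical, and this has not been tested against the ServiceNow connector.

```python
"""Hypothetical event-handler wrapper: skip the real handler during downtime.

Assumed Nagios command definition (names and paths are illustrative only):

    define command {
        command_name  handler_unless_downtime
        command_line  /usr/local/bin/downtime_wrapper.py $SERVICEDOWNTIME$ $HOSTDOWNTIME$ /path/to/original_handler $SERVICESTATE$ $SERVICESTATETYPE$
    }

For a host event handler, $SERVICEDOWNTIME$ is not available; pass 0 in
its place (an assumption of this sketch).
"""
import subprocess
import sys


def in_downtime(*depths):
    """Return True if any downtime-depth value is an integer greater than zero."""
    for depth in depths:
        try:
            if int(depth) > 0:
                return True
        except ValueError:
            # Empty or unexpanded macro text: treat as "not in downtime".
            continue
    return False


def main(argv):
    """argv: [script, service_depth, host_depth, handler, handler_args...]."""
    if len(argv) < 4:
        print("usage: downtime_wrapper.py SERVICE_DEPTH HOST_DEPTH HANDLER [ARGS...]",
              file=sys.stderr)
        return 2
    service_depth, host_depth = argv[1], argv[2]
    handler_cmd = argv[3:]
    if in_downtime(service_depth, host_depth):
        # In scheduled downtime: do nothing, so no event is forwarded.
        return 0
    # Not in downtime: run the original handler unchanged and pass on its exit code.
    return subprocess.call(handler_cmd)
```

To use it as a script, end the file with `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard, then point the host/service `event_handler` at the new command. Checks still run during downtime with this approach; only the forwarding step is suppressed.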