NagiosEnterprises / nagioscore

Nagios Core
GNU General Public License v2.0

systemd config has no `Restart` directive #668

Open ianbamforth opened 5 years ago

ianbamforth commented 5 years ago

I've recently been experiencing failures where nagios dies, dumps core, and fails to restart. There's no `Restart` directive in `nagioscore/startup/default-service.in`.
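For illustration, here is a minimal sketch of what adding such a directive could look like as a systemd drop-in override. The unit name `nagios.service`, the file name `restart.conf`, and the `DROPIN_DIR` variable are assumptions for this example, not something taken from the shipped unit file. `Restart=on-failure` and `RestartSec=` are standard systemd service options.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that restarts the service after a crash.
# DROPIN_DIR defaults to a local scratch directory so this is safe to run;
# on a real host the file would live in /etc/systemd/system/nagios.service.d/
# (the unit name "nagios.service" is an assumption).
DROPIN_DIR="${DROPIN_DIR:-./nagios.service.d}"
mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/restart.conf" <<'EOF'
[Service]
# Restart on non-zero exit, fatal signal (e.g. SIGSEGV), or timeout
Restart=on-failure
# Pause before restarting to avoid a tight crash loop
RestartSec=5s
EOF

echo "wrote $DROPIN_DIR/restart.conf"
# After copying the file into /etc/systemd/system/nagios.service.d/, run:
#   systemctl daemon-reload && systemctl restart nagios
```

A drop-in keeps the change separate from the generated unit file, so it should survive a reinstall. Note that automatic restarts would only mask the crash, not explain it.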

ericloyd commented 5 years ago

I see that this was added to the 4.5.0 milestone, but there's no hint as to whether it will actually be added (or what's going on with @ianbamforth's core dumps). Any word on whether this will make it in?

sawolf commented 5 years ago

You know, looking at this again, I don't think I read the text of the issue carefully.

@ianbamforth, if you're getting segfaults, I would consider that to be the more serious issue. Would you be willing to share log info and/or recompile with debugging symbols and share the stack trace?

As far as the way I manage the project, adding the issue to the milestone means the feature/task is "approved". It doesn't necessarily mean that it'll make it into the given version, but it does mean that I think it's a good idea and that I'll try to get it done.

ericloyd commented 5 years ago

Thanks for the insight into your brain, @Madlohe :-)

ianbamforth commented 5 years ago

@sawolf - certainly happy to do that. Any chance you could point me in the direction of instructions on how to do it? I agree it's the more important problem, and it deserves its own ticket if it isn't something daft I've done.

sawolf commented 5 years ago

To install with debugging symbols, you'll want to compile as normal (`./configure && make all`), but instead of a normal install, do `make install-unstripped`. Other install commands shouldn't be necessary, as long as this system was already in use.
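As a rough sketch, the build steps above plus a quick check that the installed binary kept its symbols might look like this. The binary path `/usr/local/nagios/bin/nagios` is an assumption based on the default install prefix, and `NAGIOS_BIN` is just a variable name chosen for the example.

```shell
#!/bin/sh
# Sketch: build with symbols and confirm the installed binary is unstripped.
# NAGIOS_BIN is an assumption based on the default /usr/local/nagios prefix.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"

if [ -x ./configure ]; then
    # Usual build, but install the binaries without stripping symbols
    ./configure && make all && make install-unstripped
else
    echo "run this from the top of the nagioscore source tree" >&2
fi

# file(1) reports "not stripped" when the symbol table is still present
if [ -x "$NAGIOS_BIN" ]; then
    if file "$NAGIOS_BIN" | grep -q 'not stripped'; then
        echo "symbols present"
    else
        echo "binary is stripped"
    fi
fi
```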

As far as actually getting a core dump, it's a little more involved. Here's the guide I use to set up core dumps on my dev machines. You can skip step 1 if this is a production environment - valgrind will slow the application by ~10x. When all of the settings are changed, do `service nagios restart`. Once you have the core dump, just send it to me and let me know which version of Core is being used and the distro/arch that it was running on.
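Independent of the guide, a few generic Linux checks can show whether a core file will be written at all, and a backtrace can be pulled from one with gdb. This is only a sketch under assumptions: `NAGIOS_BIN` and `CORE_FILE` are hypothetical defaults, and the real core location depends on the kernel's `core_pattern`.

```shell
#!/bin/bash
# Sketch: check core dump settings, then print a backtrace if a core exists.
# NAGIOS_BIN and CORE_FILE are assumptions; adjust them for your system.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
CORE_FILE="${CORE_FILE:-./core}"

# Core size limit for this shell; 0 means core dumps are disabled.
# A systemd-managed service uses its own limit (LimitCORE= in the unit).
echo "core size limit: $(ulimit -c)"

# Where the kernel writes cores; a leading '|' means they go to a handler
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
fi

if [ -r "$CORE_FILE" ] && command -v gdb >/dev/null 2>&1; then
    # Non-interactive full backtrace of every thread
    gdb -batch -ex "thread apply all bt full" "$NAGIOS_BIN" "$CORE_FILE"
else
    echo "no readable core at $CORE_FILE (or gdb is not installed)" >&2
fi
```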

sawolf commented 5 years ago

As far as logs to look at, the main one will be `/usr/local/nagios/var/nagios.log`. You could also possibly turn debugging on in `/usr/local/nagios/etc/nagios.cfg`, but this won't be helpful until we narrow down the issue further.
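To see how debugging is currently configured without changing anything, something like the following could be used. The function name `show_debug_settings` and the `NAGIOS_CFG` variable are made up for this sketch. `debug_level`, `debug_verbosity`, and `debug_file` are the debug-related options in the main config file.

```shell
#!/bin/sh
# Sketch: print the current debug settings from the main config file.
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

show_debug_settings() {
    # debug_level=0 means debugging is off; -1 logs everything
    grep -E '^(debug_level|debug_verbosity|debug_file)=' "$1"
}

if [ -r "$NAGIOS_CFG" ]; then
    show_debug_settings "$NAGIOS_CFG"
else
    echo "cannot read $NAGIOS_CFG" >&2
fi
```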

ianbamforth commented 4 years ago

Haven't managed to get a core dump yet, but found this in the logs: `wproc: iocache_capacity() is -1048576 for worker Core Worker 23531`. That sounds similar to https://github.com/NagiosEnterprises/nagioscore/issues/386, but I'm on v4.3.4.
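For anyone wanting to check their own log for the same message, a small sketch follows. The function name `find_iocache_errors` and the `NAGIOS_LOG` variable are invented for this example; the default path is the log location mentioned earlier in the thread.

```shell
#!/bin/sh
# Sketch: pull worker-process iocache messages out of the main log.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

find_iocache_errors() {
    # -F matches the literal string rather than treating it as a regex
    grep -F 'iocache_capacity()' "$1"
}

if [ -r "$NAGIOS_LOG" ]; then
    find_iocache_errors "$NAGIOS_LOG"
else
    echo "cannot read $NAGIOS_LOG" >&2
fi
```

Counting the matches (for example by piping the output to `wc -l`) gives a rough idea of how often the workers hit this condition.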