ITRS-Group / monitor-merlin

Module for Effortless Redundancy and Loadbalancing In Naemon
https://itrs-group.github.io/monitor-merlin/
GNU General Public License v2.0

Naemon Segmentation Fault when running missing notification command #110

Closed. eschoeller closed this issue 3 years ago

eschoeller commented 3 years ago

Naemon Core 1.2.4, Merlin daemon 2021.3.1

While merlind is running and the Merlin NEB module is loaded, naemon segfaults immediately whenever a notification command is run. If I stop merlind but leave the Merlin module loaded, I see these errors instead:

Warning: Notification command for contact 'XXX_email' about service 'XYZABC' exited with exit code 127. stdout: '(empty)', stderr: '/usr/lib/naemon/plugins/eventhandlers/notify-by-email: 12: /usr/lib/naemon/plugins/eventhandlers/notify-by-email: [[: not found /usr/lib/naemon/plugins/eventhandlers/notify-by-email: 16: /usr/lib/naemon/plugins/eventhandlers/notify-by-email: sendmail: not found

Clearly I have some issues to sort out in my configuration, but a failing notification command shouldn't cause a segmentation fault, as far as I'm concerned.

jacobbaungard commented 3 years ago

Naemon 1.2.4 had some API changes that were only incorporated in Merlin 2021.4.1. Could you try updating and see whether that helps?

If it still crashes, a stacktrace from gdb would be very useful.

eschoeller commented 3 years ago

Terribly sorry about this "false alarm" ... but it appears I was loading the merlin NEB module twice somehow. Naemon spit out this warning about it:

May 27 17:24:57 nagios-host naemon: qh: Handler 'merlin' registered more than once

Naemon would run OK until a notification needed to be generated, and then it would crash. I'm not sure whether this is useful to know. Again, sorry, this was really just a dumb mistake on my end: the module was being loaded in two different config files (naemon.cfg and conf.d/merlin.cfg).
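
For anyone who hits the same warning, a duplicate load like this can be spotted by scanning the config files for repeated broker_module lines. Below is a minimal Python sketch, not part of Merlin or Naemon. The function name find_duplicate_broker_modules is made up for illustration, it assumes modules are loaded with the usual `broker_module=<path> [args]` directive, and it only reads the files you pass in (it does not follow include directives).

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches active (uncommented) lines such as:
#   broker_module=/path/to/module.so /path/to/module.conf
# Only the module path (first token after '=') is captured.
BROKER_RE = re.compile(r"^\s*broker_module\s*=\s*(\S+)")


def find_duplicate_broker_modules(config_files):
    """Return {module_path: [(file, line_no), ...]} for modules listed more than once.

    Hypothetical helper: compares module paths as plain strings, so two
    different paths pointing at the same file are not detected.
    """
    seen = defaultdict(list)
    for cfg in config_files:
        text = Path(cfg).read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = BROKER_RE.match(line)
            if match:
                seen[match.group(1)].append((str(cfg), line_no))
    # Keep only modules that appear on more than one line.
    return {mod: locs for mod, locs in seen.items() if len(locs) > 1}
```

Called with the two files from this report (naemon.cfg and conf.d/merlin.cfg), it should return the Merlin module path together with both locations, provided both lines reference the module by the same path.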

jacobbaungard commented 3 years ago

Good to hear it wasn't more serious, although of course it's not nice when we get a crash like this. Perhaps Naemon ought to exit if a handler is registered more than once as above.

I don't think we'll have the resources on our end to fix this specific issue, so I'll close it for now. However, PRs are always welcome, so if you find a good solution to this problem (either here or in https://github.com/naemon/naemon-core), we'll be happy to look at it.