There is a release available here on GitHub: 2022.06.02. I started off using that, then migrated to the packages hosted on this mirror: https://download.opensuse.org/repositories/home:/itrs-op5/CentOS_8_Stream/
In both cases I ran into situations where naemon would crash (and dump a core) and merlind would eventually peg itself at 99% CPU usage. I went in circles for a while trying to determine what was going on. Eventually I narrowed it down to service and host checks that returned a CRITICAL state: naemon would crash when it attempted to generate a notification (even though I had notifications disabled globally). During my initial load testing I was using mostly ping checks that all returned OK, so I rarely hit this condition. But the moment I started getting checks that returned CRITICAL, things would break.
Anyway, long story short - I built merlin from source and everything is fine now. But given the run-around I went through, I figured I'd report this here for anyone else who may encounter this problem -or- merely as a suggestion that it might be an appropriate time to package a new release.
I did see in the GitHub issues (#146) that there was a 2022.06.30 release, but I never actually found it.
I can re-configure these systems to trigger the issue again pretty easily if you need more info, but since the issue is fixed in the current source code I doubt any further troubleshooting is needed.
Some additional information about the systems where I encountered these problems:
CentOS Stream release 8
4.18.0-527.el8.x86_64 #1 SMP Thu Nov 23 14:16:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
libnaemon-1.4.1-18.1.x86_64
naemon-thruk-1.4.1-13.1.noarch
naemon-livestatus-1.4.1-14.1.x86_64
naemon-1.4.1-13.1.noarch
naemon-devel-1.4.1-18.1.x86_64
naemon-core-1.4.1-18.1.x86_64
naemon-vimvault-1.4.0-3.2.x86_64
NAME="Red Hat Enterprise Linux"
VERSION="8.9 (Ootpa)"
4.18.0-513.5.1.el8_9.x86_64 #1 SMP Fri Sep 29 05:21:10 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
naemon-livestatus-1.4.1-14.1.x86_64
naemon-1.4.1-13.1.noarch
libnaemon-1.4.1-18.1.x86_64
naemon-vimvault-1.4.0-3.2.x86_64
naemon-core-1.4.1-18.1.x86_64
naemon-thruk-1.4.1-13.1.noarch
(Both CentOS and RHEL systems were originally fetching naemon from https://labs.consol.de/repo/stable/rhel8/x86_64/ but switched to https://download.opensuse.org/repositories/home:/naemon/CentOS_7/)
It also seems like I may have had this exact same problem on a set of Debian machines on 3/6/2023 after naemon got upgraded there. I suppose the fix slipped my mind!
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
4.19.0-25-amd64 #1 SMP Debian 4.19.289-2 (2023-08-08) x86_64 GNU/Linux
ii libnaemon:amd64 1.4.1-1 amd64
ii naemon 1.4.1-1 amd64
ii naemon-core 1.4.1-1 amd64
ii naemon-dev 1.4.1-1 amd64
ii naemon-livestatus 1.4.1-1 amd64
ii naemon-thruk 1.4.1-1 amd64
ii naemon-vimvault 1.4.0-1 amd64
ii thruk 3.10-1 amd64