Closed: sthen closed this issue 4 years ago
Looking at the earlier problem I've tried disabling various features (ending up with only 'checker' enabled) but with no apparent change.
Thanks for the heads up, I had forgotten about this 👍 Unfortunately we cannot delay tomorrow's 2.11 release once again, so we'll have to look into this specific problem in the coming weeks. Up until then, OpenBSD users need to wait (you can link them here).
I'll try my best with @Al2Klimov looking into it, and will coordinate future steps with @lippserd. Not before next week though.
Thanks, do let me know if you run into any problems getting to a state to attempt debugging, happy to help wherever I can. No hurry on my part and good luck with release!
I've got it compiling with https://community.icinga.com/t/building-icinga-2-on-openbsd.
I installed it like this:
git clone https://github.com/Icinga/icinga2.git
cd icinga2
mkdir build
cd build
cmake -DICINGA2_UNITY_BUILD=OFF "-DCMAKE_INSTALL_PREFIX=$(dirname "$(pwd)")/prefix" -DICINGA2_USER=vagrant -DICINGA2_GROUP=vagrant "-DICINGA2_PLUGINDIR=$(dirname "$(pwd)")/prefix/usr/lib/nagios/plugins" ..
make -j2
make -j2 install
... but it seems not to work:
-bash-5.0$ set -x
-bash-5.0$ prefix/sbin/icinga2 daemon -C
+ prefix/sbin/icinga2 daemon -C
prefix/sbin/icinga2[29]: /home/vagrant/icinga2/prefix/lib/icinga2/sbin/icinga2: Cannot allocate memory
-bash-5.0$
@sthen Might there be anything OpenBSD-specific causing this error?
-bash-5.0$ vmstat
procs memory page disks traps cpu
r s avm fre flt re pi po fr sr wd0 cd0 int sys cs us sy id
2 50 26M 1371M 384 0 0 0 0 0 28 0 38 500 98 0 1 98
-bash-5.0$
Error seems to have gone away with a non-Vagrant VM.
... not.
Sorry for the slow reply .. do you need to raise ulimit -d?
huh, no it's not that... will have a poke, this seems weird
@sthen Thank you. Please let me know what steps I need to perform on top of my "I installed it like this: ..." once you've figured this out.
@Al2Klimov OK I'm not sure why the ENOMEM is occurring, but "strip lib/icinga2/sbin/icinga2" allows it to run ...
XXX bleh, scratch that, tested the wrong file :(
By the way: v2.11 uses signals for IPC, is there anything special in OpenBSD on this (which may cause the problem this issue is about)?
Kicking myself for not thinking of it sooner (it builds ok from the ports tree but fails building outside it - despite having the same cmake flags...) but I finally tracked down the ENOMEM - the binary works if you build it with -O2.
OpenBSD and signals - I'm not aware of anything special - I asked and was told "we do all signal functionality except for signal queueing"
Do you mean that changing the compiler optimization flags allows Icinga to run as expected? I could imagine that this has something to do with the way Boost context is compiled/linked; we had a similar problem on Windows with optimizations.
https://github.com/Icinga/icinga2/blob/master/CMakeLists.txt#L117
No, sorry that was unclear - this purely relates to the problem @Al2Klimov had when building from git,
+ prefix/sbin/icinga2 daemon -C
prefix/sbin/icinga2[29]: /home/vagrant/icinga2/prefix/lib/icinga2/sbin/icinga2: Cannot allocate memory
@sthen -O2 works, thank you!
CFLAGS=-O2 CXXFLAGS=-O2 cmake -DICINGA2_UNITY_BUILD=OFF "-DCMAKE_INSTALL_PREFIX=$(dirname "$(pwd)")/prefix" -DICINGA2_USER=vagrant -DICINGA2_GROUP=vagrant "-DICINGA2_PLUGINDIR=$(dirname "$(pwd)")/prefix/usr/lib/nagios/plugins" ..
And I seem to be able to reproduce this issue.
@sthen I reproduced it:
The daemon's umbrella process hangs here while waiting for this one to be changed here as that function is registered here.
The signal delivery responsible for that mechanism works on both Linux and OSX, but it seems not to work on OpenBSD. Any idea? Might OpenBSD be "too secure" at this point?
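To narrow down this kind of handshake problem, it can help to check what the receiving process actually sees as the sender of a signal. The following is a minimal sketch, not Icinga code: the names ProbeSignalSender, SenderProbe and RecordSender are hypothetical. It forks a child that sends SIGUSR2 to its parent and records the si_pid the parent's handler was given, so the value can be compared with the PID returned by fork().

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

// Written by the signal handler, read by the parent once the signal arrived.
static volatile sig_atomic_t g_gotSignal = 0;
static volatile sig_atomic_t g_reportedPid = -1;

// Hypothetical handler: only records which sender the kernel reported.
static void RecordSender(int, siginfo_t *info, void*)
{
	g_reportedPid = info->si_pid;
	g_gotSignal = 1;
}

// Hypothetical result type for this sketch.
struct SenderProbe
{
	pid_t child;    // PID returned by fork()
	pid_t reported; // si_pid seen by the handler
};

// Forks a child that sends SIGUSR2 to its parent, then waits for the signal.
// Returns false if the setup failed.
bool ProbeSignalSender(SenderProbe& out)
{
	struct sigaction sa = {};
	sa.sa_sigaction = RecordSender;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGUSR2, &sa, nullptr) != 0)
		return false;

	// Block SIGUSR2 so it cannot be delivered before the parent starts waiting.
	sigset_t block, old;
	sigemptyset(&block);
	sigaddset(&block, SIGUSR2);

	if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
		return false;

	g_gotSignal = 0;
	pid_t parent = getpid();
	pid_t child = fork();

	if (child < 0) {
		sigprocmask(SIG_SETMASK, &old, nullptr);
		return false;
	}

	if (child == 0) {
		// Child: notify the parent and exit immediately.
		kill(parent, SIGUSR2);
		_exit(0);
	}

	// Parent: atomically unblock SIGUSR2 and sleep until the handler ran.
	sigset_t waitMask = old;
	sigdelset(&waitMask, SIGUSR2);

	while (!g_gotSignal)
		sigsuspend(&waitMask);

	sigprocmask(SIG_SETMASK, &old, nullptr);
	waitpid(child, nullptr, 0);

	out.child = child;
	out.reported = g_reportedPid;
	return true;
}
```

If out.reported differs from out.child on a given platform, a handler that insists on an exact si_pid match would silently ignore the signal, which would look like the hang described above.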
Big thanks to @jmatthew for tracking it down to pid not being filled in in siginfo_t (an old OpenBSD missing feature, https://marc.info/?l=openbsd-tech&m=120218016412546&w=2). He is looking at the kernel side but it's complicated to implement fully - can you think of any big problems as far as Icinga is concerned from relaxing the validation to permit either 0 or the expected pid?
Index: lib/cli/daemoncommand.cpp
--- lib/cli/daemoncommand.cpp.orig
+++ lib/cli/daemoncommand.cpp
@@ -317,6 +317,6 @@ static Atomic<bool> l_RequestedReopenLogs (false);
*/
static void UmbrellaSignalHandler(int num, siginfo_t *info, void*)
{
switch (num) {
case SIGUSR1:
// Someone requested to re-open logs
@@ -324,14 +324,14 @@ static void UmbrellaSignalHandler(int num, siginfo_t *
break;
case SIGUSR2:
if (l_CurrentlyStartingUnixWorkerState.load() == UnixWorkerState::Pending
- && info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) {
+ && (info->si_pid == 0 || info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) ) {
// The seemless worker currently being started by StartUnixWorker() successfully loaded its config
l_CurrentlyStartingUnixWorkerState.store(UnixWorkerState::LoadedConfig);
}
break;
case SIGCHLD:
if (l_CurrentlyStartingUnixWorkerState.load() == UnixWorkerState::Pending
- && info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) {
+ && (info->si_pid == 0 || info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) ) {
// The seemless worker currently being started by StartUnixWorker() failed
l_CurrentlyStartingUnixWorkerState.store(UnixWorkerState::Failed);
}
@@ -366,16 +366,16 @@ static void UmbrellaSignalHandler(int num, siginfo_t *
*/
static void WorkerSignalHandler(int num, siginfo_t *info, void*)
{
switch (num) {
case SIGUSR2:
- if (info->si_pid == l_UmbrellaPid) {
+ if (info->si_pid == 0 || info->si_pid == l_UmbrellaPid) {
// The umbrella process allowed us to continue working beyond config validation
l_AllowedToWork.store(true);
}
break;
case SIGINT:
case SIGTERM:
- if (info->si_pid == l_UmbrellaPid) {
+ if (info->si_pid == 0 || info->si_pid == l_UmbrellaPid) {
// The umbrella process requested our termination
Application::RequestShutdown();
}
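The relaxed check in the patch above can be tried in isolation. This is a minimal sketch under stated assumptions, not the code from daemoncommand.cpp: SenderIsAcceptable, DemoHandler and DemoSelfSignal are hypothetical names, and the process simply signals itself instead of using an umbrella/worker pair. It accepts a signal when si_pid is either 0 (not filled in by the kernel) or the expected sender.

```cpp
#include <sys/types.h>
#include <signal.h>
#include <unistd.h>
#include <atomic>

// Hypothetical helper mirroring the patch: accept the expected PID,
// or 0 for kernels that do not fill in si_pid.
bool SenderIsAcceptable(pid_t reported, pid_t expected)
{
	return reported == 0 || reported == expected;
}

static std::atomic<bool> g_accepted (false);
static pid_t g_expectedSender = 0; // set before the handler is installed

// Hypothetical handler: flips the flag only for an acceptable sender.
static void DemoHandler(int, siginfo_t *info, void*)
{
	if (SenderIsAcceptable(info->si_pid, g_expectedSender)) {
		g_accepted.store(true);
	}
}

// Installs the handler, sends SIGUSR2 to the current process and
// reports whether the handler accepted it.
bool DemoSelfSignal()
{
	g_expectedSender = getpid();
	g_accepted.store(false);

	struct sigaction sa = {};
	sa.sa_sigaction = DemoHandler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGUSR2, &sa, nullptr) != 0)
		return false;

	// Assumes a single-threaded caller with SIGUSR2 unblocked, so the
	// signal is delivered before kill() returns.
	if (kill(getpid(), SIGUSR2) != 0)
		return false;

	return g_accepted.load();
}
```

The trade-off is the one raised in the question above: with 0 permitted, a signal whose sender is unknown is treated like one from the expected process, so the check is weaker on systems that do not report si_pid.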
Hello @sthen and thank you for the great news!
Does that patch actually fix the problem? If yes, please could the author open a PR?
Best, AK
Thanks @Al2Klimov - it does fix the problem, 2.11.2 with that patch has been running successfully for ~2 weeks. I wrote it, so I have submitted it as PR #7739.
Describe the bug
Following the merge of 844e821, startup doesn't complete on OpenBSD. It starts to initialize but doesn't enter normal main processing or respond normally to signals (no response to HUP or a first ^C, a second ^C exits uncleanly). f3fbac2 worked ok.
Discussed a bit in #7320 with dnsmichi ("Might be related to #3517, I remember that OpenBSD treats things differently with threads."), then I was waiting for boost-context parts to get committed and make it into packages and I forgot to open a new ticket, sorry..
OpenBSD environment setup
I can try things and report back, but it's likely that it will be easier if someone who knows the code can set up a development environment on OpenBSD - hopefully this is a useful quickstart:
Install OpenBSD from snapshots; the 6.5 release didn't have boost context. e.g. https://ftp.fr.openbsd.org/pub/OpenBSD/snapshots/amd64/install66.iso (this is usually very straightforward).
Install some packages; you'll want some/all of "pkg_add git bison cmake boost-md gdb mariadb-client monitoring-plugins" (there is a version of gdb in base, but it is old and crappy and won't cope with much written in C++; the version in packages will work better - the binary is installed as "egdb").
Unfortunately there are no debug symbols available in pre-built binary packages. If you need them (e.g. for boost), you either need to fetch the ports tree and build from there ("make DEBUG=-g package; pkg_delete boost boost-md; make install" should work for that), or I should be able to build alternative packages and upload them somewhere.
The initial user added during install is set up in class "staff" - the default datasize soft limit for this user is 1.5G, so you may need to raise that to build - there's no hard datasize limit for "staff" by default. There are maxproc limits; see /etc/login.conf if you need to change them.
I typically build icinga via the ports tree, but there are no major patches, so I don't think you should have any problems building it yourself directly from the icinga2 source tree, which is probably more convenient for you. Use -DICINGA2_PLUGINDIR=/usr/local/libexec/nagios for the monitoring-plugins package.
If you need other tools and can't find the names of packages containing them, "pkg_add pkglocatedb" and "pkglocate bin/whatever" is often a good way to find them.
Expected behavior
icinga initializes and starts main processing.
Screenshots
Your Environment
icinga2 --version: OpenBSD/amd64 -current, boost 1.66.0
icinga2 feature list: not relevant I think, but 2.7.1 - doc, monitoring, test
icinga2 daemon -C:
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes: n/a
Backtraces
3 processes running: