Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

2.11.0-rc1 startup hangs on OpenBSD #7505

Closed sthen closed 4 years ago

sthen commented 5 years ago

Describe the bug

Following the merge of 844e821, startup doesn't complete on OpenBSD. It starts to initialize but doesn't enter normal main processing or respond normally to signals (no response to HUP or a first ^C, a second ^C exits uncleanly). f3fbac2 worked ok.

Discussed a bit in #7320 with dnsmichi ("Might be related to #3517, I remember that OpenBSD treats things differently with threads."), then I was waiting for boost-context parts to get committed and make it into packages and I forgot to open a new ticket, sorry..

OpenBSD environment setup

I can try things and report back but it's likely that it will be easier if someone knowing the code can setup a development environment on OpenBSD - hopefully this is a useful quickstart:

Expected behavior

icinga initializes and starts main processing.

Screenshots

# icinga2 daemon -x debug
[2019-09-17 20:52:22 +0100] information/cli: Icinga application loader (version: r2.11.0-rc1-1)
[2019-09-17 20:52:22 +0100] notice/cli: Spawning seemless worker process doing the actual work
[2019-09-17 20:52:22 +0100] notice/cli: Spawned worker process (PID 62778), waiting for it to load its config
[2019-09-17 20:52:22 +0100] information/cli: Loading configuration file(s).
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/icinga2.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/constants.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/zones.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/itl
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/command-icinga.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/plugins
<<snip>>
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/mainlog.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/notification.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/syslog.conf
[2019-09-17 20:52:22 +0100] notice/config: Ignoring explicit load request for library "db_ido_mysql".
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/api-users.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/app.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/commands.conf
<<snip>>
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/users.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /var/lib/icinga2/api/packages/_api/include.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /var/lib/icinga2/api/packages/_api/XXX-1506019874-1/include.conf
[2019-09-17 20:52:22 +0100] notice/ConfigCompiler: Compiling config file: /var/lib/icinga2/api/packages/_api/XXX-1506019874-1/../active.conf
[2019-09-17 20:52:22 +0100] information/ConfigItem: Committing config item(s).
[2019-09-17 20:52:22 +0100] debug/Timer: TimerThreadProc started.
[2019-09-17 20:52:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'DaemonUtility::LoadConfigFiles'
[2019-09-17 20:52:22 +0100] notice/Library: Loading library 'libmysql_shim.so'
[2019-09-17 20:52:22 +0100] notice/ApiListener: Updating cache: Config package '_api' has active stage 'XXX-1506019874-1'.
[2019-09-17 20:52:22 +0100] information/ApiListener: My API identity: XXX
[2019-09-17 20:52:22 +0100] notice/TlsUtility: Available TLS cipher list: ECDHE-ECDSA-AES256-GCM-SHA384 ECDHE-RSA-AES256-GCM-SHA384 ECDHE-ECDSA-CHACHA20-POLY1305 ECDHE-RSA-CHACHA20-POLY1305 ECDHE-ECDSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-GCM-SHA256 ECDHE-ECDSA-AES256-SHA384 ECDHE-RSA-AES256-SHA384 ECDHE-ECDSA-AES128-SHA256 ECDHE-RSA-AES128-SHA256
[2019-09-17 20:52:22 +0100] debug/HostGroup: Assigning membership for group 'MYH' to host 'MYH-cam-122'
[2019-09-17 20:52:22 +0100] debug/HostGroup: Assigning membership for group 'MYH' to host 'MYH-cam-116'
[2019-09-17 20:52:22 +0100] debug/HostGroup: Assigning membership for group 'MYH-cams' to host 'MYH-cam-122'
<<snip>>
[2019-09-17 20:52:22 +0100] debug/ServiceGroup: Assigning membership for group 'ping' to service 'MYH-cam-90!ping4'
[2019-09-17 20:52:22 +0100] debug/ServiceGroup: Assigning membership for group 'ping' to service 'MYH-cam-89!ping4'
<<snip>>
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 SyslogLogger.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 3 UserGroups.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 148 Services.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 LivestatusListener.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 221 Notifications.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 ApiUser.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 3 ServiceGroups.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 235 CheckCommands.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 98 Dependencies.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 4 Users.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 3 Zones.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 2 NotificationCommands.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 Endpoint.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 73 Hosts.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-09-17 20:52:22 +0100] information/ConfigItem: Instantiated 13 HostGroups.
[2019-09-17 20:52:22 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-09-17 20:52:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'DaemonUtility::LoadConfigFiles'
[2019-09-17 20:52:22 +0100] notice/cli: Notifying umbrella process (PID 26642) about the config loading success
[2019-09-17 20:52:22 +0100] notice/cli: Waiting for the umbrella process to let us doing the actual work
[2019-09-17 20:52:32 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:52:32 +0100] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-09-17 20:52:32 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-09-17 20:52:32 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-09-17 20:52:42 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:52:52 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:53:02 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:53:12 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:53:22 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:53:32 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:53:42 +0100] notice/DbConnection: Updating programstatus table.
[2019-09-17 20:53:52 +0100] notice/DbConnection: Updating programstatus table.
^C[2019-09-17 20:54:02 +0100] notice/DbConnection: Updating programstatus table.
^C

$ pgrep -lf icinga2
62778 /usr/local/lib/icinga2/sbin/icinga2 daemon -x debug
39222 /usr/local/lib/icinga2/sbin/icinga2 daemon -x debug

Your Environment

icinga2 - The Icinga 2 network monitoring daemon (version: r2.11.0-rc1-1)

Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Unknown
  Platform version: Unknown
  Kernel: OpenBSD
  Kernel version: 6.6
  Architecture: amd64

Build information:
  Compiler: Clang 8.0.1
  Build host: symphytum.spacehopper.org

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /var/run/icinga2

Old paths (deprecated):
  Installation root: /usr/local
  Sysconf directory: /usr/local/share/examples
  Run directory (base): /var/run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/local/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid

OpenBSD/amd64 -current, boost 1.66.0

Disabled features: compatlog debuglog elasticsearch gelf graphite ido-pgsql influxdb opentsdb perfdata statusdata
Enabled features: api checker command ido-mysql livestatus mainlog notification syslog

not relevant I think, but 2.7.1 - doc, monitoring, test

[2019-09-17 21:32:24 +0100] information/cli: Icinga application loader (version: r2.11.0-rc1-1)
[2019-09-17 21:32:24 +0100] information/cli: Loading configuration file(s).
[2019-09-17 21:32:24 +0100] information/ConfigItem: Committing config item(s).
[2019-09-17 21:32:24 +0100] information/ApiListener: My API identity: sym.spacehopper.org
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 3 ServiceGroups.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 LivestatusListener.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 ApiUser.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 SyslogLogger.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 235 CheckCommands.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 Endpoint.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 3 Zones.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 3 UserGroups.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 4 Users.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 98 Dependencies.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 73 Hosts.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 221 Notifications.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 13 HostGroups.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 2 NotificationCommands.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2019-09-17 21:32:24 +0100] information/ConfigItem: Instantiated 148 Services.
[2019-09-17 21:32:24 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-09-17 21:32:24 +0100] information/cli: Finished validating the configuration file(s).

n/a

Backtraces

3 processes running:

(gdb) bt
#0  _thread_sys_recvmsg () at -:3
#1  0x000006157d8d817e in _libc_recvmsg_cancel (fd=11, msg=0x7f7ffffc6ab0, flags=0)
    at /usr/src/lib/libc/sys/w_recvmsg.c:27
#2  0x00000612f8d26961 in ProcessHandler() ()
#3  0x00000612f8ce9b7a in StartSpawnProcessHelper() ()
#4  0x00000612f8c75d22 in Main() ()
#5  0x00000612f8c7240b in main ()
(gdb) bt
#0  _thread_sys_nanosleep () at -:3
#1  0x000006157d8f328e in _libc_nanosleep_cancel (timeout=0x7f7ffffc6028, remainder=0x0)
    at /usr/src/lib/libc/sys/w_nanosleep.c:27
#2  0x000006157d91afd7 in usleep (useconds=<optimized out>) at /usr/src/lib/libc/gen/usleep.c:46
#3  0x00000612f8fd259d in StartUnixWorker(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) ()
#4  0x00000612f8fd0d13 in icinga::DaemonCommand::Run(boost::program_options::variables_map const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) const ()
#5  0x00000612f8c760bc in Main() ()
#6  0x00000612f8c7240b in main ()
(gdb) thread apply all bt full

Thread 10 (thread 165635):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=72, clockid=0, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 9, tv_nsec = 998997998}
#2  _rthread_cond_timedwait (cond=0x61537cbe0b0, mutexp=0x612f9475eb8 <l_TimerCV>, abs=0x615dd4e6a78)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x61564ac9e00
        tib = 0x61505879e00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x61564ac9e00
        clock = 0
        seq = 72
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d6fc56 in boost::condition_variable::do_wait_until(boost::unique_lock<boost::mutex>&, timespec const&) ()
No symbol table info available.
#4  0x00000612f8d53cc1 in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000ll> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000ll> const&) ()
No symbol table info available.
#5  0x00000612f8d03fe2 in icinga::Timer::TimerThreadProc() ()
No symbol table info available.
#6  0x00000612f8dfcc29 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

---Type <return> to continue, or q <return> to quit---
Thread 9 (thread 486983):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=75, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6688527876192, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x61505877000
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 75
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 8 (thread 273043):
#0  futex () at -:3
No locals.
---Type <return> to continue, or q <return> to quit---
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=75, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6687356813376, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x6159fcaac00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 75
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 7 (thread 133941):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=76, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6689075298304, tv_nsec = 6689370575397}
---Type <return> to continue, or q <return> to quit---
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x61505878e00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 76
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 6 (thread 311611):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=75, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6687356813376, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
---Type <return> to continue, or q <return> to quit---
        tib = 0x6159fcaaa00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 75
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 5 (thread 130139):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=70, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6687356813376, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x6159fcaa400
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
---Type <return> to continue, or q <return> to quit---
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 70
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 4 (thread 484608):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=69, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6689776795680, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x61505879c00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
---Type <return> to continue, or q <return> to quit---
        seq = 69
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 3 (thread 606767):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=76, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6687356813376, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x61505877e00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 76
        _cantcancel = 0
        error = <optimized out>
---Type <return> to continue, or q <return> to quit---
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (thread 109020):
#0  futex () at -:3
No locals.
#1  0x000006157d87b615 in _twait (p=<optimized out>, val=75, clockid=3, abs=<optimized out>)
    at /usr/src/lib/libc/thread/synch.h:41
        rel = {tv_sec = 6689944895552, tv_nsec = 6689370575397}
#2  _rthread_cond_timedwait (cond=0x6151d308730, mutexp=0x615a5bc1438, abs=0x0)
    at /usr/src/lib/libc/thread/rthread_cond.c:106
        mutex = 0x615398e5ea0
        tib = 0x61505878c00
        mutex_count = 0
        canceled = <error reading variable canceled (Cannot access memory at address 0x0)>
        rv = <error reading variable rv (Cannot access memory at address 0x0)>
        self = 0x615398e5ea0
        clock = 3
        seq = 75
        _cantcancel = 0
        error = <optimized out>
#3  0x00000612f8d64368 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#4  0x00000612f8d63f02 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
No symbol table info available.
#5  0x00000612f8d74f47 in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() ()
No symbol table info available.
#6  0x00000612f8d74e85 in boost_asio_detail_posix_thread_function ()
No symbol table info available.
#7  0x00000615070f8161 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
        thread = <optimized out>
        retval = <optimized out>
#8  0x000006157d8e1af8 in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:77
No locals.
#9  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (thread 168554):
#0  _thread_sys_nanosleep () at -:3
No locals.
#1  0x000006157d8f328e in _libc_nanosleep_cancel (timeout=0x7f7ffffc6028, remainder=0x0)
    at /usr/src/lib/libc/sys/w_nanosleep.c:27
        _tib = 0x6151587c900
#2  0x000006157d91afd7 in usleep (useconds=<optimized out>) at /usr/src/lib/libc/gen/usleep.c:46
        rqt = {tv_sec = 0, tv_nsec = 200000000}
#3  0x00000612f8fd28bc in StartUnixWorker(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) ()
No symbol table info available.
#4  0x00000612f8fd0d13 in icinga::DaemonCommand::Run(boost::program_options::variables_map const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) const ()
No symbol table info available.
#5  0x00000612f8c760bc in Main() ()
No symbol table info available.
#6  0x00000612f8c7240b in main ()
No symbol table info available.
(gdb)
sthen commented 5 years ago

Looking at the earlier problem I've tried disabling various features (ending up with only 'checker' enabled) but with no apparent change.

dnsmichi commented 5 years ago

Thanks for the heads up, I had forgotten about this 👍 Unfortunately we cannot delay tomorrow's 2.11 release once again, so we'll have to look into this specific problem in the coming weeks. Up until then, OpenBSD users need to wait (you can link them here).

I'll try my best with @Al2Klimov looking into it, and will coordinate future steps with @lippserd. Not before next week though.

sthen commented 5 years ago

Thanks, do let me know if you run into any problems getting to a state to attempt debugging, happy to help wherever I can. No hurry on my part and good luck with release!

Al2Klimov commented 5 years ago

I've got it compiling with https://community.icinga.com/t/building-icinga-2-on-openbsd.

Al2Klimov commented 5 years ago

I installed it like this:

git clone https://github.com/Icinga/icinga2.git
cd icinga2
mkdir build
cd build
cmake -DICINGA2_UNITY_BUILD=OFF "-DCMAKE_INSTALL_PREFIX=$(dirname "$(pwd)")/prefix" -DICINGA2_USER=vagrant -DICINGA2_GROUP=vagrant "-DICINGA2_PLUGINDIR=$(dirname "$(pwd)")/prefix/usr/lib/nagios/plugins" ..
make -j2
make -j2 install

... but it seems not to work:

-bash-5.0$ set -x
-bash-5.0$ prefix/sbin/icinga2 daemon -C
+ prefix/sbin/icinga2 daemon -C
prefix/sbin/icinga2[29]: /home/vagrant/icinga2/prefix/lib/icinga2/sbin/icinga2: Cannot allocate memory
-bash-5.0$

@sthen May there be anything openBSD-specific causing this error?

-bash-5.0$ vmstat
 procs    memory       page                    disks    traps          cpu
 r   s   avm     fre  flt  re  pi  po  fr  sr wd0 cd0  int   sys   cs us sy id
 2  50   26M   1371M  384   0   0   0   0   0  28   0   38   500   98  0  1 98
-bash-5.0$
Al2Klimov commented 5 years ago

Error seems to have gone away with a non-Vagrant VM.

Al2Klimov commented 5 years ago

... not.

sthen commented 5 years ago

Sorry for the slow reply .. do you need to raise ulimit -d?

sthen commented 5 years ago

huh, no it's not that... will have a poke, this seems weird

Al2Klimov commented 5 years ago

@sthen Thank you. Please let me know what steps I need to perform on top of my "I installed it like this: ..." once you've figured this out.

sthen commented 5 years ago

@Al2Klimov OK I'm not sure why the ENOMEM is occurring, but "strip lib/icinga2/sbin/icinga2" allows it to run ...

XXX bleh, scratch that, tested the wrong file :(

Al2Klimov commented 5 years ago

By the way: v2.11 uses signals for IPC, is there anything special in OpenBSD on this (which may cause the problem this issue is about)?

sthen commented 5 years ago

Kicking myself for not thinking of it sooner (it builds ok from the ports tree but fails building outside it - despite having the same cmake flags...) but I finally tracked down the ENOMEM - the binary works if you build it with -O2.

OpenBSD and signals - I'm not aware of anything special - I asked and was told "we do all signal functionality except for signal queueing"

dnsmichi commented 5 years ago

Meaning to say, the compiler optimization flags being changed allow Icinga to run like expected? I could imagine that this has something to do with the way Boost context is compiled/linked, we had a similar problem on Windows with optimizations.

https://github.com/Icinga/icinga2/blob/master/CMakeLists.txt#L117

sthen commented 5 years ago

No, sorry that was unclear - this purely relates to the problem @Al2Klimov had when building from git,

+ prefix/sbin/icinga2 daemon -C
prefix/sbin/icinga2[29]: /home/vagrant/icinga2/prefix/lib/icinga2/sbin/icinga2: Cannot allocate memory
Al2Klimov commented 5 years ago

@sthen -O2 works, thank you!

CFLAGS=-O2 CXXFLAGS=-O2 cmake -DICINGA2_UNITY_BUILD=OFF "-DCMAKE_INSTALL_PREFIX=$(dirname "$(pwd)")/prefix" -DICINGA2_USER=vagrant -DICINGA2_GROUP=vagrant "-DICINGA2_PLUGINDIR=$(dirname "$(pwd)")/prefix/usr/lib/nagios/plugins" ..

And I seem to be able to reproduce this issue.

Al2Klimov commented 5 years ago

@sthen I reproduced it:

The daemon's umbrella process hangs here while waiting for this one to be changed here as that function is registered here.

The signal delivery responsible for that mechanism works on both Linux and OSX, but it seems not to work on OpenBSD. Any idea? May OpenBSD be "too secure" at this point?

sthen commented 4 years ago

Big thanks to @jmatthew for tracking it down to pid not being filled in in siginfo_t (an old OpenBSD missing feature, https://marc.info/?l=openbsd-tech&m=120218016412546&w=2). He is looking at the kernel side but it's complicated to implement fully - can you think of any big problems as far as Icinga is concerned from relaxing the validation to permit either 0 or the expected pid?

Index: lib/cli/daemoncommand.cpp
--- lib/cli/daemoncommand.cpp.orig
+++ lib/cli/daemoncommand.cpp
@@ -317,6 +317,6 @@ static Atomic<bool> l_RequestedReopenLogs (false);
  */
 static void UmbrellaSignalHandler(int num, siginfo_t *info, void*)
 {
    switch (num) {
        case SIGUSR1:
            // Someone requested to re-open logs
@@ -324,14 +324,14 @@ static void UmbrellaSignalHandler(int num, siginfo_t *
            break;
        case SIGUSR2:
            if (l_CurrentlyStartingUnixWorkerState.load() == UnixWorkerState::Pending
-               && info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) {
+               && (info->si_pid == 0 || info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) ) {
                // The seemless worker currently being started by StartUnixWorker() successfully loaded its config
                l_CurrentlyStartingUnixWorkerState.store(UnixWorkerState::LoadedConfig);
            }
            break;
        case SIGCHLD:
            if (l_CurrentlyStartingUnixWorkerState.load() == UnixWorkerState::Pending
-               && info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) {
+               && (info->si_pid == 0 || info->si_pid == l_CurrentlyStartingUnixWorkerPid.load()) ) {
                // The seemless worker currently being started by StartUnixWorker() failed
                l_CurrentlyStartingUnixWorkerState.store(UnixWorkerState::Failed);
            }
@@ -366,16 +366,16 @@ static void UmbrellaSignalHandler(int num, siginfo_t *
  */
 static void WorkerSignalHandler(int num, siginfo_t *info, void*)
 {
    switch (num) {
        case SIGUSR2:
-           if (info->si_pid == l_UmbrellaPid) {
+           if (info->si_pid == 0 || info->si_pid == l_UmbrellaPid) {
                // The umbrella process allowed us to continue working beyond config validation
                l_AllowedToWork.store(true);
            }
            break;
        case SIGINT:
        case SIGTERM:
-           if (info->si_pid == l_UmbrellaPid) {
+           if (info->si_pid == 0 || info->si_pid == l_UmbrellaPid) {
                // The umbrella process requested our termination
                Application::RequestShutdown();
            }
Al2Klimov commented 4 years ago

Hello @sthen and thank you for the great news!

Does that patch actually fix the problem? If yes, please could the author open a PR?

Best, AK

sthen commented 4 years ago

Thanks @Al2Klimov - it does fix the problem, 2.11.2 with that patch has been running successfully for ~2 weeks. I wrote it so have submitted it as PR #7739