Updated by mfriedrich on 2012-04-04 09:15:29 +00:00
the program does not react to SIGTERM anymore, SIGKILL is the only way to stop it.
# gdb /home/xxx/icinga/icinga-core/module/idoutils/src/ido2db
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /home/xxx/icinga/icinga-core/module/idoutils/src/ido2db...done.
(gdb) set follow-fork-mode child
(gdb) set args -c /etc/nagios/ido2db.cfg
(gdb) run
Starting program: /home/xxx/icinga/icinga-core/module/idoutils/src/ido2db -c /etc/nagios/ido2db.cfg
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
[Thread debugging using libthread_db enabled]
[New process 9269]
[Thread debugging using libthread_db enabled]
[New process 9270]
[Thread debugging using libthread_db enabled]
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
could be an os problem as well, as this box was recently updated to rhel 5.8 too.
we had that futex problem a while back with livestatus on the core itself. https://dev.icinga.org/issues/786
the problem is the parent waiting for a child to terminate, which hangs somewhere - e.g. in localtime, possibly not being threadsafe???
the only occurrence of localtime is in module/idoutils/src/logging.c, which could possibly lead to a lock on the logfile not being released properly? or is this the wrong direction to go?
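for reference, a minimal sketch of the reentrant variant being considered here. `format_log_timestamp` is a hypothetical helper, not code from idoutils. the only point it illustrates is that `localtime_r()` writes into a `struct tm` supplied by the caller, whereas `localtime()` returns a pointer to shared static storage. it makes no claim about whether this removes the hang.

```c
#define _POSIX_C_SOURCE 200809L /* for localtime_r() */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not taken from idoutils): format a timestamp into a
 * caller-supplied buffer. localtime_r() fills tm_buf, which lives on this
 * function's stack, instead of the static buffer that localtime() shares
 * between all callers.
 * Returns the number of characters written, or 0 on failure.
 */
size_t format_log_timestamp(time_t when, char *buf, size_t len)
{
    struct tm tm_buf;

    assert(buf != NULL && len > 0);

    if (localtime_r(&when, &tm_buf) == NULL) {
        buf[0] = '\0';
        return 0;
    }

    /* strftime() returns 0 if the result does not fit into buf. */
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm_buf);
}
```

a caller would pass `time(NULL)` and a buffer of at least 20 bytes, then write the result in front of the log message.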
Updated by mfriedrich on 2012-04-04 11:11:28 +00:00
more of that, with some kind words as well http://lists.op5.com/pipermail/op5-users/2010-November/001554.html
using localtime_r instead in debug logging does not entirely solve it. reverting #2272 solves the problem; it seems that the vasprintf implementation does not work.
Updated by mfriedrich on 2012-04-04 11:19:04 +00:00
reverting the change of #2271 does work. but to stay on the safe side, we'll re-use the soft locking method of the core in base/logging.c: try to get a lock on the debug file ptr and bail out early if that is not possible.
this will prevent us from further locks in there, and drop the futex on the parent waiting for the first child to die at first stage (grandchild as session leader - that's the key after the 2 forks on startup).
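a rough sketch of what such a soft lock can look like, assuming it is built on a pthread mutex that is tried a few times and then given up on. the names `soft_lock`, `debug_fp_lock` and `debug_log_soft`, the retry count and the sleep interval are illustrative assumptions here, not copied from base/logging.c.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Illustrative values, not taken from the Icinga sources. */
#define SOFT_LOCK_ATTEMPTS 5
#define SOFT_LOCK_SLEEP_NS 30000L

/* Hypothetical lock guarding the debug file pointer. */
static pthread_mutex_t debug_fp_lock = PTHREAD_MUTEX_INITIALIZER;

/* Try a few times to take the lock, then give up instead of blocking. */
static int soft_lock(pthread_mutex_t *lock)
{
    struct timespec pause = { 0, SOFT_LOCK_SLEEP_NS };
    int i;

    for (i = 0; i < SOFT_LOCK_ATTEMPTS; i++) {
        if (pthread_mutex_trylock(lock) == 0)
            return 0;
        nanosleep(&pause, NULL);
    }
    return -1;
}

/*
 * Write one debug message. Returns the number of characters written,
 * or -1 if there is no file or the lock could not be obtained, in which
 * case the message is dropped rather than risking a hang.
 */
int debug_log_soft(FILE *fp, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(fmt != NULL);

    if (fp == NULL || soft_lock(&debug_fp_lock) != 0)
        return -1; /* bail out early */

    va_start(ap, fmt);
    written = vfprintf(fp, fmt, ap);
    va_end(ap);
    fflush(fp);

    pthread_mutex_unlock(&debug_fp_lock);
    return written;
}
```

the trade-off in this sketch is that a debug line may be lost under contention, which is usually preferable to a logging call that never returns.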
Updated by mfriedrich on 2012-04-04 11:46:01 +00:00
vasprintf with a local char* buf is the root cause; changing back to vfprintf makes it work again.
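to make the difference concrete, here is a sketch of the two variants side by side. both function names are hypothetical and neither body is copied from idoutils. the only difference shown is that the first formats into a heap buffer allocated by `vasprintf()` before writing, while the second lets `vfprintf()` format straight into the stream. why the first variant hangs on this box is not established in this issue, and the sketch does not try to explain it.

```c
#define _GNU_SOURCE /* vasprintf() is a GNU/BSD extension */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Variant A (hypothetical): format into a malloc'ed buffer, then write it. */
int log_via_vasprintf(FILE *fp, const char *fmt, ...)
{
    va_list ap;
    char *buf = NULL;
    int len;

    assert(fp != NULL && fmt != NULL);

    va_start(ap, fmt);
    len = vasprintf(&buf, fmt, ap);
    va_end(ap);
    if (len < 0)
        return -1; /* buf is not valid on failure, so do not free it */

    fputs(buf, fp);
    free(buf);
    return len;
}

/* Variant B (hypothetical): no intermediate buffer, no allocation. */
int log_via_vfprintf(FILE *fp, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(fp != NULL && fmt != NULL);

    va_start(ap, fmt);
    len = vfprintf(fp, fmt, ap);
    va_end(ap);
    return len;
}
```

both variants produce the same bytes in the file for the same arguments.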
Updated by mfriedrich on 2012-04-04 12:24:07 +00:00
Updated by Tommi on 2012-04-04 18:36:48 +00:00
i never saw anything similar on my centos x64 box or on solaris, both of which have been running for weeks with vasprintf enabled. it does not look like a general issue. concurrent file access might really be an issue, but the buffer thing... hmm. i hope i can test this new version on solaris before my vacation next week (and that the general replacement of the v/a/snprintf functions on solaris works as expected)
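for readers unfamiliar with such replacements: on platforms without `vasprintf()`, a fallback is commonly built on `vsnprintf()`. the sketch below is one way to write it, under the assumption that the platform's `vsnprintf()` follows C99 and returns the required length when called with a zero-sized buffer. `compat_vasprintf` and `compat_asprintf` are hypothetical names, not the functions used in the project.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical fallback for vasprintf(). Measures the formatted length
 * first, allocates exactly that much, then formats for real.
 * Returns the string length, or -1 on error (*out is left untouched).
 */
int compat_vasprintf(char **out, const char *fmt, va_list ap)
{
    va_list ap_copy;
    int needed;
    char *buf;

    assert(out != NULL && fmt != NULL);

    /* A va_list can only be walked once, so measure with a copy. */
    va_copy(ap_copy, ap);
    needed = vsnprintf(NULL, 0, fmt, ap_copy);
    va_end(ap_copy);
    if (needed < 0)
        return -1;

    buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return -1;

    needed = vsnprintf(buf, (size_t)needed + 1, fmt, ap);
    if (needed < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return needed;
}

/* Variadic convenience wrapper around compat_vasprintf(). */
int compat_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = compat_vasprintf(out, fmt, ap);
    va_end(ap);
    return len;
}
```

the caller owns the returned buffer and has to `free()` it.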
Updated by mfriedrich on 2012-04-04 22:34:17 +00:00
as stated in the other issue, changing this again is a no-go. look into the core logging.c, where the exact same is done with vfprintf. why? because the code is safe, not passing null pointers to the log functions. maybe idoutils will use a shared icinga library with logging scaffolds - it is not really the goal to keep things different in common projects.
Updated by mfriedrich on 2012-04-19 13:07:17 +00:00
for now this works here.
Updated by mfriedrich on 2014-12-08 14:35:58 +00:00
This issue has been migrated from Redmine: https://dev.icinga.com/issues/2500
Created by mfriedrich on 2012-04-04 08:04:52 +00:00
Assignee: mfriedrich Status: Resolved (closed on 2012-04-19 13:07:17 +00:00) Target Version: 1.7 Last Update: 2014-12-08 14:35:58 +00:00 (in Redmine)
rhel 5.8 x64
for some reason, starting ido2db via initscript hangs, but ido2db gets idomod connected and becomes operational. it seems that on fork, it does not cleanly free resources.
this is a new behaviour. did not have that with 1.6.x
a previously fully killed ido2db brings that on startup
possible that this has something to do with the oci pre-initializer to get the version info?
doing a killall -9 ido2db results in this.
that /sbin/service output can be removed by just invoking the init script directly, where the exact same happens.
pinned down to manual start
hangs ...
debug log looks rather normal in that case.
Changesets
2012-04-04 11:21:20 +00:00 by mfriedrich 07d2f27740a277b951acf83252d8f2464a1c2117