NagiosEnterprises / ndoutils

NDOUtils - Database Output for Nagios Core
GNU General Public License v2.0

Solaris 11: cannot get ndo2db service to start when the -f argument is present #34

Closed by box293 7 years ago

box293 commented 7 years ago

On Solaris 11, I cannot get the ndo2db service to come online after enabling it when the -f argument is present on the exec line.

In the steps below I enable the service, and the service log shows the start method timing out.

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# svcs -xv ndo2db
svc:/network/nagios/ndo2db:default (NDO2DB daemon)
 State: disabled since May  8, 2017 05:22:18 PM EST
Reason: Disabled by an administrator.
   See: http://support.oracle.com/msg/SMF-8000-05
   See: http://www.nagios.org
   See: /var/svc/log/network-nagios-ndo2db:default.log
Impact: This service is not running.

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# svcadm enable ndo2db
root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# svcs -xv ndo2db
svc:/network/nagios/ndo2db:default (NDO2DB daemon)
 State: offline* transitioning to online since May  8, 2017 05:23:00 PM EST
Reason: Start method is running.
   See: http://support.oracle.com/msg/SMF-8000-C4
   See: http://www.nagios.org
   See: /var/svc/log/network-nagios-ndo2db:default.log
Impact: This service is not running.

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# tail /var/svc/log/network-nagios-ndo2db:default.log
[ May  8 17:23:00 Enabled. ]
[ May  8 17:23:00 Executing start method ("/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f"). ]
[ May  8 17:23:05 Method or service exit timed out.  Killing contract 236. ]
[ May  8 17:23:05 Method "start" failed due to signal KILL. ]
[ May  8 17:23:05 Executing start method ("/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f"). ]
Could not bind socket: Address already in use
[ May  8 17:23:05 Method "start" exited with status 1. ]
[ May  8 17:23:05 Executing start method ("/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f"). ]
Could not bind socket: Address already in use
[ May  8 17:23:05 Method "start" exited with status 1. ]

Ignore the "Could not bind socket: Address already in use" messages. They are not relevant to this issue and are explained in https://github.com/NagiosEnterprises/ndoutils/issues/33.

Now disable the service and remove the ndo.sock file:
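To make the timeout easier to spot, the unrelated bind messages can be filtered out of the log. The following is a minimal sketch, not something from ndoutils: `relevant_log_lines` is a hypothetical helper name, and the sample input is copied from the log excerpt above.

```shell
# Hypothetical helper: pass log lines through, hiding the unrelated
# "Could not bind socket" errors tracked in issue #33.
relevant_log_lines() {
    grep -v 'Could not bind socket: Address already in use'
}

# Demo on three lines taken from the log excerpt above.
relevant_log_lines <<'EOF'
[ May  8 17:23:05 Method or service exit timed out.  Killing contract 236. ]
Could not bind socket: Address already in use
[ May  8 17:23:05 Method "start" exited with status 1. ]
EOF
```

On a real system the same function could read the service log directly, for example `relevant_log_lines < /var/svc/log/network-nagios-ndo2db:default.log`.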

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# svcadm disable ndo2db
root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# rm -f /usr/local/nagios/var/ndo.sock

Here is the EXEC line in the solaris-init.xml file:

exec='/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f'

Next I will remove the "-f" and re-install the service:

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# perl -p -i -e 's/cfg -f/cfg/g' startup/solaris-init.xml

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# gmake install-init
/usr/bin/ginstall -c -m 775 -g sys -d /lib/svc/manifest/network/nagios
/usr/bin/ginstall -c -m 644 startup/solaris-init.xml /lib/svc/manifest/network/nagios/ndo2db.xml
svccfg import /lib/svc/manifest/network/nagios/ndo2db.xml
svccfg: Restarting svc:/system/manifest-import
 The manifest being imported is from a standard location and should be imported with the  command : svcadm restart svc:/system/manifest-import
*** Run 'svcadm enable ndo2db' to start it

Now to enable the service:

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# svcadm enable ndo2db

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# svcs -xv ndo2db
svc:/network/nagios/ndo2db:default (NDO2DB daemon)
 State: online since May  8, 2017 05:31:01 PM EST
   See: http://www.nagios.org
   See: /var/svc/log/network-nagios-ndo2db:default.log
Impact: None.

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# tail /var/svc/log/network-nagios-ndo2db:default.log
[ May  8 17:30:25 Rereading configuration. ]
[ May  8 17:31:01 Enabled. ]
[ May  8 17:31:01 Executing start method ("/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg"). ]
[ May  8 17:31:01 Method "start" exited with status 0. ]

root@core-041:/var/tmp/ndoutils-ndoutils-2.1.3# tail /usr/local/nagios/var/nagios.log  
[1494228647] ndomod: Still unable to reconnect to data sink.  0 items lost, 663 queued items to flush.
[1494228665] ndomod: Successfully reconnected to data sink!  0 items lost, 689 queued items to flush.
[1494228666] ndomod: Successfully flushed 689 queued items to data sink.

You can see that removing the "-f" allowed the service to start.
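The perl substitution used above rewrites every occurrence of `cfg -f` in the manifest. As a narrower sketch, the same change can be expressed against a single exec string. `strip_foreground_flag` is a hypothetical helper name, not part of ndoutils, and it assumes `-f` is the last argument on the line.

```shell
# Hypothetical helper: remove a trailing " -f" from an exec string.
# Strings without a trailing " -f" are returned unchanged.
strip_foreground_flag() {
    printf '%s\n' "$1" | sed 's/ -f$//'
}

# The exec line from solaris-init.xml as quoted in this report.
exec_line='/usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f'
strip_foreground_flag "$exec_line"
```

Anchoring the pattern to the end of the string keeps it from touching a `-f` that might appear elsewhere, which the broader `s/cfg -f/cfg/g` substitution does not guarantee.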

jfrickson commented 7 years ago

This will be an easy fix. I thought for sure that SMF would put it in the background. Guess not.

jfrickson commented 7 years ago

Fixed in branch maint via commit https://github.com/NagiosEnterprises/ndoutils/commit/06986e0dc68eddb5c90923a8be9ce9b69cd8efde