chef / omnibus-ctl

Provides service control for omnibus packages
Apache License 2.0

requires system-wide `runit` on Debian #53

Open matthiasr opened 7 years ago

matthiasr commented 7 years ago

Break-out from #52.

We use system-wide runit for various other daemons on our nodes. chef-server-ctl reconfigure works correctly, but to use chef-server-ctl restart or chef-server-ctl status we need to explicitly set SVDIR (which #52 will prevent).

$ sudo env -i chef-server-ctl status
fail: bookshelf: unable to change to service directory: file does not exist
fail: nginx: unable to change to service directory: file does not exist
fail: oc_bifrost: unable to change to service directory: file does not exist
fail: oc_id: unable to change to service directory: file does not exist
fail: opscode-erchef: unable to change to service directory: file does not exist
fail: opscode-expander: unable to change to service directory: file does not exist
fail: opscode-solr4: unable to change to service directory: file does not exist
fail: postgresql: unable to change to service directory: file does not exist
fail: rabbitmq: unable to change to service directory: file does not exist
fail: redis_lb: unable to change to service directory: file does not exist
$ sudo env -i SVDIR=/opt/opscode/service chef-server-ctl status
run: bookshelf: (pid 4816) 1724702s; run: log: (pid 4869) 1724702s
run: nginx: (pid 4723) 1724706s; run: log: (pid 4980) 1724698s
run: oc_bifrost: (pid 4656) 1724708s; run: log: (pid 4693) 1724707s
run: oc_id: (pid 4708) 1724706s; run: log: (pid 4713) 1724706s
run: opscode-erchef: (pid 4936) 1724700s; run: log: (pid 4917) 1724701s
run: opscode-expander: (pid 4777) 1724703s; run: log: (pid 4807) 1724703s
run: opscode-solr4: (pid 4737) 1724704s; run: log: (pid 4767) 1724704s
run: postgresql: (pid 4635) 1724708s; run: log: (pid 4647) 1724708s
run: rabbitmq: (pid 4538) 1724709s; run: log: (pid 4530) 1724710s
run: redis_lb: (pid 4473) 1724773s; run: log: (pid 4976) 1724698s

This is on Chef server 12.7 and 12.13 (same effect on both).

Digging into the code, the shell-out happens here. It does not call the embedded/bin/sv binary but wrapper scripts in init/. These contain RUNIT=/usr/bin/sv, which is the global binary and looks for the services in /etc/service.

I don't know why these wrapper scripts are necessary, how they are created, or whether they should be used at all.
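
To illustrate the lookup described above: per sv(8), a bare service name is resolved against a default services directory unless SVDIR is set. The following is a minimal sketch of that rule, not runit's actual code. `resolve_service_dir` is a hypothetical helper, and `/etc/service` is assumed as the default because that is where the system-wide sv on Debian looks according to this report.

```shell
#!/bin/sh
# Sketch only: approximates how sv picks a service directory.
# resolve_service_dir is a hypothetical helper, not part of runit.
resolve_service_dir() {
  name=$1
  case "$name" in
    /*|.*)
      # Names starting with a slash or a dot are treated as paths.
      printf '%s\n' "$name"
      ;;
    *)
      # Bare names are looked up under $SVDIR, falling back to the
      # default directory (assumed here to be /etc/service, as on Debian).
      printf '%s/%s\n' "${SVDIR:-/etc/service}" "$name"
      ;;
  esac
}

# With SVDIR unset, the assumed system default is used.
( unset SVDIR; resolve_service_dir nginx )
# With SVDIR pointing at the omnibus service directory from this thread.
( SVDIR=/opt/opscode/service; resolve_service_dir nginx )
```

The first call prints /etc/service/nginx and the second prints /opt/opscode/service/nginx, which matches the difference between the two chef-server-ctl status runs above.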

matthiasr commented 7 years ago

cc @srenatus

stevendanna commented 7 years ago

@matthiasr Thanks for reporting this! The files in init/ shouldn't be wrapper scripts at all, but rather symlinks to the internal sv:

vagrant@api:~$ ls -al /opt/opscode/init/
total 8
drwxrwxr-x 2 root root 4096 Mar 27 15:08 .
drwxrwxr-x 8 root root 4096 Mar 27 15:05 ..
lrwxrwxrwx 1 root root   28 Mar 27 15:07 bookshelf -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:08 nginx -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 oc_bifrost -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 oc_id -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:08 opscode-chef-mover -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 opscode-erchef -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 opscode-expander -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 opscode-solr4 -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:07 postgresql -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:06 rabbitmq -> /opt/opscode/embedded/bin/sv
lrwxrwxrwx 1 root root   28 Mar 27 15:08 redis_lb -> /opt/opscode/embedded/bin/sv

If you are seeing those files as scripts, would you mind sharing:

  1. The content of the script
  2. The content of your chef-server.rb
  3. The version of chef-server you are running

There must be another piece to this puzzle.
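
A quick way to see which of the two layouts a given host has is to list each entry in init/ and report whether it is a symlink. This is a minimal sketch; `describe_init_entries` is a hypothetical helper name, and /opt/opscode/init is the directory from the listing above.

```shell
#!/bin/sh
# Sketch only: report whether each entry in an init directory is a symlink
# (and where it points) or something else, such as a generated init script.
# describe_init_entries is a hypothetical helper name.
describe_init_entries() {
  dir=$1
  for entry in "$dir"/*; do
    # Skip the unexpanded glob pattern when the directory is empty,
    # but keep dangling symlinks.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    if [ -L "$entry" ]; then
      printf '%s: symlink -> %s\n' "${entry##*/}" "$(readlink "$entry")"
    else
      printf '%s: not a symlink\n' "${entry##*/}"
    fi
  done
}

# /opt/opscode/init is the directory shown in this thread.
if [ -d /opt/opscode/init ]; then
  describe_init_entries /opt/opscode/init
fi
```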

matthiasr commented 7 years ago

The nginx one is

#!/bin/sh
### BEGIN INIT INFO
# Provides:          nginx
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop:
# Short-Description: initscript for runit-managed nginx service
### END INIT INFO

# Author: Chef Software, Inc. <cookbooks@chef.io>

PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="runit-managed nginx"
NAME=nginx
RUNIT=/usr/bin/sv
SCRIPTNAME=/etc/init.d/$NAME

# Exit if runit is not installed
[ -x $RUNIT ] || exit 0

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

case "$1" in
  start)
        [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC " "$NAME"
        $RUNIT start $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  stop)
        [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
        $RUNIT stop $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  status)
        $RUNIT status $NAME && exit 0 || exit $?
        ;;
  reload)
        [ "$VERBOSE" != no ] && log_daemon_msg "Reloading $DESC" "$NAME"
        $RUNIT reload $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  force-reload)
        [ "$VERBOSE" != no ] && log_daemon_msg "Force reloading $DESC" "$NAME"
        $RUNIT force-reload $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  restart)
        [ "$VERBOSE" != no ] && log_daemon_msg "Restarting $DESC" "$NAME"
        $RUNIT restart $NAME
        [ "$VERBOSE" != no ] && log_end_msg $?
        ;;
  *)
        echo "Usage: $SCRIPTNAME {start|stop|status|reload|force-reload|restart}" >&2
        exit 3
        ;;
esac

:

It's possible we did / are doing something wrong while upgrading between Chef Server versions …

matthiasr commented 7 years ago

The test server I'm investigating on is on package version 12.7.0-1, on Debian Jessie. I'm upgrading it to 12.13 now. I'll also try with a fresh install.

matthiasr commented 7 years ago

Upgrading from 12.7 to 12.13 using the proper procedure (stop, upgrade, start, cleanup) doesn't change anything about these files.

matthiasr commented 7 years ago

… and if I move aside one of the init scripts, then a chef-server-ctl reconfigure faithfully recreates it.

matthiasr commented 7 years ago

This seems to come from here. I understand why that is: we use an old hacked-up version of the runit cookbook ourselves, and Debian is very unhappy about symlinks to binaries in /etc/init.d. It's just problematic that omnibus-ctl assumes otherwise …

So this is only a problem on Debian, but it is a problem on Debian.

matthiasr commented 7 years ago

Is there a reason omnibus-ctl needs to execute init/<service> <command> rather than embedded/bin/sv <service> <command>?

stevendanna commented 7 years ago

We are currently on this version of the runit cookbook in chef-server, where the path to runit is hardcoded:

https://github.com/chef-cookbooks/runit/blob/v1.6.0/templates/debian/init.d.erb#L16

However, on newer versions, this appears to be configurable:

https://github.com/chef-cookbooks/runit/blob/master/templates/debian/init.d.erb#L16

So one option might be to upgrade the version of runit we are using.

Is there a reason omnibus-ctl needs to execute init/<service> <command> rather than embedded/bin/sv <service> <command>?

As long as we provide compatible output (so as not to break any monitoring), I don't see a reason we couldn't use sv directly.
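
As a rough sketch of what calling sv directly could look like (this is not the omnibus-ctl implementation, which is Ruby): the embedded binary is invoked with SVDIR scoped to that one call. `OMNIBUS_BASE` and `run_embedded_sv` are hypothetical names, and /opt/opscode is the install path from this thread. Note that sv(8) takes the command before the service name.

```shell
#!/bin/sh
# Sketch only. OMNIBUS_BASE and run_embedded_sv are hypothetical names;
# /opt/opscode is the install path seen in this thread.
OMNIBUS_BASE=${OMNIBUS_BASE:-/opt/opscode}

run_embedded_sv() {
  command=$1
  service=$2
  # Use the sv binary shipped inside the package and point it at the
  # package's own service directory. sv expects: sv <command> <service>.
  SVDIR="$OMNIBUS_BASE/service" "$OMNIBUS_BASE/embedded/bin/sv" "$command" "$service"
}

# Example (only meaningful on a host with the package installed):
#   run_embedded_sv status nginx
```

Because SVDIR is set only for the single command, nothing from the caller's environment or from a system-wide runit is involved.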

stevendanna commented 7 years ago

In the short term, maybe the easiest thing is to have omnibus-ctl set SVDIR to the "correct" value.
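
A minimal sketch of that short-term idea, assuming the init/ entries stay as they are and only the environment changes. How omnibus-ctl would actually set the variable is not shown in this thread. `OMNIBUS_BASE` and `run_init_with_svdir` are hypothetical names.

```shell
#!/bin/sh
# Sketch only. OMNIBUS_BASE and run_init_with_svdir are hypothetical names;
# /opt/opscode is the install path seen in this thread.
OMNIBUS_BASE=${OMNIBUS_BASE:-/opt/opscode}

run_init_with_svdir() {
  service=$1
  command=$2
  # Keep calling init/<service> <command>, but set SVDIR for that call so
  # whichever sv the init entry ends up running looks in the package's
  # service directory instead of the system default.
  SVDIR="$OMNIBUS_BASE/service" "$OMNIBUS_BASE/init/$service" "$command"
}

# Example (only meaningful on a host with the package installed):
#   run_init_with_svdir nginx status
```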

matthiasr commented 7 years ago

I'm pretty sure with the symlink sv itself behaves the same way, and produces the same output.

Yes, setting SVDIR "correctly" would work, but it still relies on the system runit actually being sane then. It probably will be, but it's still leaking in.

matthiasr commented 7 years ago

As expected from the runit_service provider, the same happens on a clean install, so it's not related to any upgrade issues.

matthiasr commented 7 years ago

I'm wondering, but have no quick way to check, whether this works at all on Debian unless a system-wide runit package is installed?

stevendanna commented 7 years ago

@matthiasr My assumption from the code is that it does not currently work on Debian without runit installed on the system (outside of the chef-server package).
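
One detail from the generated init script quoted earlier is relevant here: its guard, `[ -x $RUNIT ] || exit 0`, makes the script stop quietly with a success status when /usr/bin/sv is missing. The sketch below reproduces only that guard. `init_status` is a hypothetical name, and how omnibus-ctl reacts to such a result is not shown in this thread.

```shell
#!/bin/sh
# Sketch only: reproduces the "[ -x $RUNIT ] || exit 0" guard from the
# generated init script quoted in this thread.
# init_status is a hypothetical name.
init_status() {
  runit=$1
  name=$2
  # Same guard as the init script: if the sv binary is missing,
  # stop quietly and report success.
  [ -x "$runit" ] || return 0
  "$runit" status "$name"
}

# With a path that does not exist, nothing is printed by the guard
# and the status is 0.
init_status /nonexistent/bin/sv nginx
echo "exit status: $?"
```

This prints "exit status: 0" and no service information, so without a system-wide sv the script never contacts the service at all.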

matthiasr commented 7 years ago

Quickly testing this in a minimal VM supports this. I think changing this to use the internal sv binary directly is easy enough; I'll try to make a PR for that. It's not the cleanest solution: the broken init files still exist, but they would no longer actually be used.

matthiasr commented 7 years ago

Done now: #54

stevendanna commented 7 years ago

We've merged https://github.com/chef/omnibus-ctl/pull/55, which will hopefully keep Debian working provided you have runit available on the system. I'm going to leave this open, however, since we still depend on system-installed runit.

matthiasr commented 7 years ago

I updated the title to reflect that the leaking in is what makes it work in the first place.