SeattleTestbed / nodemanager

Remote control server for SeattleTestbed nodes
MIT License

`daemon.py` hangs if `init` process has PID other than 1 #115

Closed: aaaaalbert closed this issue 9 years ago.

aaaaalbert commented 9 years ago

Seattle has a daemon library that helps daemonize the nodemanager process on Unix-like systems. It does this using a variant of the usual "double-fork" approach, which can be summarized as follows (with some terminology borrowed from here):

Our implementation of the last step is what gets us in trouble: after Child 1 has exited, the code waits for the init process to adopt Child 2, and tries to detect this by waiting for Child 2's parent process ID (PPID) to become 1, init's usual PID.

However, Upstart, Ubuntu Linux's current init replacement, has a User Session Mode in which multiple init processes run on the system: the "classical" one with PID 1, and an init --user process with a different PID that spawns the processes started from the GUI (such as a terminal window). Processes orphaned within such a session are adopted by init --user, so their PPID never becomes 1, and our code loops indefinitely.

What we should do instead can be derived from the table below, showing an instrumented version of daemon.py going through the different phases of forking:

| Ref | What? | PID | PPID | PGRP | SID | TTY |
|---|---|---|---|---|---|---|
| 1 | Parent process | 26105 | 3643 | 26105 | 3643 | /dev/pts/13 |
| 2 | Child 1, forked, but before setsid() | 26106 | 26105 | 26105 | 3643 | /dev/pts/13 |
| 3 | Child 1, before closing the standard streams | 26106 | 26105 | 26106 | 26106 | /dev/pts/13 |
| 4 | Child 1, after closing the standard streams | 26106 | 26105 | 26106 | 26106 | EBADF |
| 5 | Child 2, forked but waiting for Child 1 to exit | 26107 | 26106 | 26106 | 26106 | EBADF |
| 6 | Child 2, ready | 26107 | 2957 | 26106 | 26106 | EBADF |
  1. shows the parent process as started by bash, which is the session leader (the session ID, 3643, is bash's PID and thus Parent's PPID). Parent leads its own process group (PGRP equals its PID).
  2. shows Child 1, with Parent as its PPID, still within the same session and process group. Child 1 now calls setsid(), resulting in...
  3. Child 1 is now the process group leader and also the leader of the new session.
  4. Child 1 has closed the standard file descriptors it inherited from Parent. Accessing them results in a "Bad file descriptor" (EBADF) error.
  5. shows Child 2 forked off. Its parent, session leader, and process group leader is Child 1.
  6. Child 1 has exited, leaving Child 2 to be adopted by init --user (PID 2957 here), not by PID 1.
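The table's columns can be collected with standard os calls. Below is a minimal Python sketch of such instrumentation; the original instrumented daemon.py is not shown in this issue, so the helper names (process_ids, stdin_tty, report) are hypothetical, and we assume the TTY column was read from stdin (fd 0).

```python
import errno
import os


def process_ids():
    """Collect the PID, PPID, PGRP and SID columns for the calling process."""
    return {
        "PID": os.getpid(),
        "PPID": os.getppid(),
        "PGRP": os.getpgrp(),
        "SID": os.getsid(0),  # 0 means "the calling process"
    }


def stdin_tty():
    """Return the terminal behind stdin, or the errno name if there is none.

    Assumption: the TTY column reflects fd 0. A closed fd yields "EBADF",
    a pipe or regular file yields "ENOTTY".
    """
    try:
        return os.ttyname(0)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))


def report(label):
    """Print one table row for the calling process."""
    ids = process_ids()
    print(label, ids["PID"], ids["PPID"], ids["PGRP"], ids["SID"], stdin_tty())


if __name__ == "__main__":
    report("Parent process")
```

Calling report() at each of the six points (after fork(), after setsid(), after closing the standard streams, and so on) should produce rows shaped like the ones in the table.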

Fix for daemon.py: In order to wait for Child 1 to exit, Child 2 can check whether its PGRP or SID equals the PPID it sees. If not, Child 1 has exited, and Child 2 can continue.

```
46,47c46
<       ppid = os.getppid()
<       while ppid != 1:
---
>       while os.getppid() == os.getpgrp():
49d47
<         ppid = os.getppid()
```

I'm currently testing this patch on Mac OS X 10.6.8 with Python 2.7.8, and Ubuntu 14.04.1 with Python 2.7.6 running inside VirtualBox 4.3.20.
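To exercise the patched loop without an Upstart desktop session, one can make a process other than PID 1 adopt the orphaned Child 2. On Linux 3.4 and later, prctl(PR_SET_CHILD_SUBREAPER) marks a process as a "child subreaper" that receives its orphaned descendants; we assume this is also the mechanism behind init --user, but that is not confirmed in this issue. The sketch below is Linux-only, the helper names are hypothetical, and the caller plays the role of init --user while reporting which PID Child 2 sees as its parent after Child 1 exits.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>; Linux >= 3.4 only


def become_subreaper():
    """Ask the kernel to hand orphaned descendants to us instead of PID 1."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1, 0, 0, 0)]
    if libc.prctl(PR_SET_CHILD_SUBREAPER, *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def wait_for_child1_exit(poll_interval=0.01):
    """The patched loop. After setsid(), Child 1 leads the process group that
    Child 2 inherited, so PPID == PGRP exactly as long as Child 1 is alive."""
    while os.getppid() == os.getpgrp():
        time.sleep(poll_interval)
    return os.getppid()  # PID of whoever adopted Child 2


def simulate_user_session():
    """Act like init --user: fork Child 1, which calls setsid() and forks
    Child 2. Return (PID that adopted Child 2, our own PID)."""
    become_subreaper()
    r, w = os.pipe()
    child1 = os.fork()
    if child1 == 0:  # Child 1
        try:
            os.setsid()
            if os.fork() == 0:  # Child 2
                try:
                    adopter = wait_for_child1_exit()
                    os.write(w, ("%d %d" % (adopter, os.getpid())).encode())
                finally:
                    os._exit(0)
            time.sleep(0.1)  # only so that Child 2's loop has something to wait for
        finally:
            os._exit(0)
    os.close(w)
    os.waitpid(child1, 0)
    data = b""
    while True:
        chunk = os.read(r, 64)
        if not chunk:  # EOF: Child 2 has written its report and exited
            break
        data += chunk
    os.close(r)
    adopter, child2 = (int(x) for x in data.split())
    os.waitpid(child2, 0)  # Child 2 was reparented to us; reap it
    return adopter, os.getpid()
```

The two returned PIDs match: Child 2 is adopted by the caller, so with the old `while ppid != 1` loop it would have spun forever (unless the caller happened to be PID 1). Note that the subreaper attribute stays set for the rest of the calling process's life.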

vladimir-v-diaz commented 9 years ago

Excellent issue description. I vote to transfer some of this informative text to daemon.py's docstring header; it's concise and easy to understand. :D

All that's missing is a comment explaining why a module might need to be daemonized (could it be to return a terminal to the user?). We should add a comment of this sort here: https://github.com/SeattleTestbed/nodemanager/blob/master/nmmain.py#L482-L483

aaaaalbert commented 9 years ago

Thanks for the kind words :-)

Tests run fine on Mac and Linux! I'll get hold of a Windows box and test there, too.

aaaaalbert commented 9 years ago

Tested on Windows 7 -- works correctly.

I'll summarize my findings and send a pull request.

aaaaalbert commented 9 years ago

(I decided to write up what I learned over the last few days so I have a single reference in the future. Let me know what parts of it you think should go into the docstring, if any :-) )


Creating a daemon process

or: Why are we doing this convoluted thing again?

Goals of a daemon process:

Notes:


  1. We start with the parent process that wants to create a daemonized copy of itself. Let's assume we started Parent in an interactive shell.
    • Parent process: PID=parentID, PPID=shellID, PGRP=parentID, SID=shellID
  2. Parent calls fork() to fork off Child 1. The parent process wait()s for Child 1 to exit.
    • Child 1: PID=child1ID, PPID=parentID, PGRP=parentID, SID=shellID
    • (Child 1 has the parent process as its parent, shares its process group, and is in the shell's session.)
  3. We are not done yet, because Parent is wait()ing for Child 1 to terminate.
  4. Child 1 now calls setsid(), creating a new session and a new process group, and becoming the leader of both. (This leadership becomes important only after the next fork(); see below.)
    • Child 1: PID=child1ID, PPID=parentID, PGRP=child1ID, SID=child1ID
  5. (Depending on the requirements of the implementation, Child 1 should also close all of the file descriptors it inherited, chdir into /, and set its umask to 0. Alternatively, this can be done in Child 2.)
  6. We are not done yet because
    • Parent is still wait()ing for Child 1, so if Child 1 kept running, Parent would stay alive too.
    • Child 1 is the leader of the new session, and can thus reacquire the controlling terminal even if it closed the file descriptors that it inherited from Parent. We specifically wanted to make this impossible for the daemon process.
  7. Thus, Child 1 calls fork() itself, creating Child 2, which is neither the process group leader nor the session leader, and therefore cannot reacquire the controlling terminal. Note that Parent does not wait() for Child 2, since it is a grandchild.
    • Child 2: PID=child2ID, PPID=child1ID, PGRP=child1ID, SID=child1ID
  8. Following the fork, Child 1 exits. This leaves Child 2 without a parent for a moment, but an init process will soon adopt it. Child 1's exit also ends Parent's wait(), so Parent can exit now, too. Eventually, we are left with only Child 2, which is now a daemon:
    • Child 2: PID=child2ID, PPID=initID, PGRP=child1ID, SID=child1ID
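Steps 1 to 8 can be condensed into a minimal Python sketch. It is not Seattle's daemon.py: the function name daemonize, the poll_interval parameter, and the choice to return False in Parent instead of exiting are assumptions for illustration. Step 5 is done in Child 2, and, as a common variant, it redirects the standard streams to /dev/null rather than closing every inherited descriptor.

```python
import os
import sys
import time


def daemonize(poll_interval=0.01):
    """Double-fork as in steps 1-8 (hypothetical helper).

    Returns True in the daemon (Child 2) and False in Parent once Child 1
    has been reaped; the caller decides whether Parent should exit.
    """
    sys.stdout.flush()  # avoid duplicating buffered output in the children
    sys.stderr.flush()

    child1 = os.fork()                   # step 2: fork off Child 1
    if child1 > 0:                       # Parent
        os.waitpid(child1, 0)            # steps 2-3: wait for Child 1 to exit
        return False

    # Child 1
    try:
        os.setsid()                      # step 4: new session and process group
        if os.fork() > 0:                # step 7: fork off Child 2
            os._exit(0)                  # step 8: Child 1 exits right away
    except BaseException:
        os._exit(1)                      # never fall back into the caller's code

    # Child 2: not a session leader, so it cannot reacquire the terminal.
    # Wait until Child 1 is gone, without assuming the adopter is PID 1.
    while os.getppid() == os.getpgrp():
        time.sleep(poll_interval)

    # Step 5, done in Child 2 here: detach from the inherited environment.
    os.chdir("/")
    os.umask(0)
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
    return True
```

A caller would typically write `if not daemonize(): sys.exit(0)`, so that Parent exits once Child 1 is gone while Child 2 carries on as the daemon.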

Note that, contrary to traditional lore, the process ID of the init process (initID above) is not necessarily 1. Upstart (and possibly other init replacements) runs init --user processes with different PIDs for graphical sessions, a feature called "User Session Mode".

Further reading:

Code samples:

vladimir-v-diaz commented 9 years ago

I think the following sections should be included in daemon.py's docstring header:

  1. Goals of a Daemon Process.
  2. The text beginning with "We start with the parent process" and ending immediately before the Further Reading section.

Include a link to this GitHub issue and/or to a separate wiki page that contains the rest of the information about daemon processes.

aaaaalbert commented 9 years ago

Implemented as suggested by @vladimir-v-diaz in #116.