canonical / pebble

Take control of your internal daemons!
GNU General Public License v3.0
137 stars 52 forks source link

pebble hangs when it is killed while executing a command #371

Closed weiiwang01 closed 4 months ago

weiiwang01 commented 4 months ago

When the pebble daemon is in the process of executing a command and receives a SIGTERM signal, it hangs rather than shutting down gracefully.

To reproduce this problem:

./pebble run

# Execute a command in pebble, then terminate the pebble daemon during the command's execution
./pebble exec sleep 60
kill <pebble-daemon-pid>

# pebble daemon hangs, even after the command has finished executing

I think this happens because the reaper.Stop function stops the reaper loop without terminating and waiting for all child processes to finish and be reaped. This causes tasks like exec relying on reaper.WaitCommand to wait forever, as they depend on the reaper loop to send the process's exit code to proceed.

I created a patch to kill all child process managed by the reaper and wait for them to exit, which seems to fix this issue.

diff --git a/internals/reaper/reaper.go b/internals/reaper/reaper.go
index fb1d7b6..f211ffd 100644
--- a/internals/reaper/reaper.go
+++ b/internals/reaper/reaper.go
@@ -21,6 +21,7 @@ import (
    "os/exec"
    "os/signal"
    "sync"
+   "syscall"

    "golang.org/x/sys/unix"
    "gopkg.in/tomb.v2"
@@ -67,6 +68,23 @@ func Stop() error {
    }
    mutex.Unlock()

+   // kill all child process and wait them to die
+   mutex.Lock()
+   for len(pids) > 0 {
+       for p, ch := range pids {
+           syscall.Kill(p, syscall.SIGKILL)
+           mutex.Unlock()
+           // waiting for reaper finishing its job on this process
+           code, ok := <-ch
+           if ok {
+               ch <- code
+           }
+           mutex.Lock()
+           break
+       }
+   }
+   mutex.Unlock()
+
    reaperTomb.Kill(nil)
    reaperTomb.Wait()
    reaperTomb = tomb.Tomb{}
@@ -201,6 +219,7 @@ func WaitCommand(cmd *exec.Cmd) (int, error) {

    // Wait for reaper to reap this PID and send us the exit code.
    exitCode := <-ch
+   close(ch)

    // Remove PID from waits map once we've received exit code from reaper.
    mutex.Lock()
benhoyt commented 4 months ago

Thanks @weiiwang01, especially for the code change suggestion. I believe this is a duplicate of #163. Can you please confirm, and if so, close this and copy your comments/suggestion there?

weiiwang01 commented 4 months ago

Yes, sorry for the duplicate issue.