canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
146 stars 54 forks source link

feat(daemon): add Err() to retrieve the tomb death reason #453

Closed thp-canonical closed 2 months ago

thp-canonical commented 2 months ago

Daemon has a .Dying() method that mirrors the Tomb.Dying() API, but not .Err() for the Tomb.Err() API. Add a .Err() pass-through function and print out the server exit reason for easier debugging should one of the daemon tomb goroutines die.

benhoyt commented 2 months ago

Looks reasonable to me. However, I'm wondering how this can happen. Currently it looks like the daemon mostly calls d.tomb.Kill(nil) with a nil death reason. Do you have a repro?

thp-canonical commented 2 months ago

Looks reasonable to me. However, I'm wondering how this can happen. Currently it looks like the daemon mostly calls d.tomb.Kill(nil) with a nil death reason. Do you have a repro?

(Edit:) There's one kill occurrence here, but granted at that point we already decided to shut down:

https://github.com/canonical/pebble/blob/02e5d2d72e53413d0ac7ed78c41240d5a6f1ed44/internals/daemon/daemon.go#L567

I think it also happens when one of the tomb goroutines exits with an error, right? From the Tomb.Go() docs:

If f returns a non-nil error, t.Kill is called with that error as the death reason parameter.

So any errors returned from here would (I think) also be reported, not just .Kill() calls:

https://github.com/canonical/pebble/blob/02e5d2d72e53413d0ac7ed78c41240d5a6f1ed44/internals/daemon/daemon.go#L469-L474

Same for this related place:

https://github.com/canonical/pebble/blob/02e5d2d72e53413d0ac7ed78c41240d5a6f1ed44/internals/daemon/daemon.go#L479-L485

With the documentation for Tomb.Kill() saying:

Althoguh (sic) Kill may be called multiple times, only the first non-nil error is recorded as the death reason.

So the && d.tomb.Err() == tomb.ErrStillAlive checks above are probably not needed (if their intention is only to avoid overwriting an existing error).

I haven't followed if there's a way to bubble up errors from some handlers all the way to the caller of the .Serve() function, but it seems like it at least any error return value from any of the .Serve() calls that's != http.ErrServerClosed would be good to have reported instead of silently ignored.