canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
146 stars 54 forks source link

[DO NOT MERGE] fix(checkstate): avoid holding check mutex when running action func #374

Closed benhoyt closed 3 months ago

benhoyt commented 7 months ago

DO NOT MERGE: I'm rewriting the health check manager to use changes and tasks, and those changes will likely make this issue go away (and conflict with this). Keeping this open until that's done, but this probably won't be merged.

To minimise lock contention and the possibility of deadlock (such as in function, which could essentially do anything, depending on the caller. It's not a problem now, but it's definitely the right thing to do.

To do this, I refactored the check state-modifying blocks into methods so we can use "Lock(); defer Unlock()". I also reversed the error of the error checking to the typical pattern of "if err != nil" instead of "if err == nil".

In addition, I've improved the tests in two ways:

1) Used mutexes when setting failureName, as the race detector is now picking this up (though I think in theory it was always a problem). 2) Added a test that we're actually setting actionRan, so we don't introduce a regression where we accidentally run the action on every failure.

benhoyt commented 3 months ago

Per PR description, the rewrite of checkstate has made this problem go away. Locks are no longer held when the failure handlers are being called: https://github.com/canonical/pebble/blob/22cdf7a75e97d506bc45fdf199d721f15b90d1e6/internals/overlord/checkstate/handlers.go#L80