Closed benhoyt closed 3 months ago
Per PR description, the rewrite of checkstate has made this problem go away. Locks are no longer held when the failure handlers are being called: https://github.com/canonical/pebble/blob/22cdf7a75e97d506bc45fdf199d721f15b90d1e6/internals/overlord/checkstate/handlers.go#L80
DO NOT MERGE: I'm rewriting the health check manager to use changes and tasks, and those changes will likely make this issue go away (and conflict with this). Keeping this open until that's done, but this probably won't be merged.
To minimise lock contention and the possibility of deadlock (such as in function, which could essentially do anything, depending on the caller. It's not a problem now, but it's definitely the right thing to do.
To do this, I refactored the check state-modifying blocks into methods so we can use "Lock(); defer Unlock()". I also reversed the error of the error checking to the typical pattern of "if err != nil" instead of "if err == nil".
In addition, I've improved the tests in two ways:
1) Used mutexes when setting failureName, as the race detector is now picking this up (though I think in theory it was always a problem). 2) Added a test that we're actually setting actionRan, so we don't introduce a regression where we accidentally run the action on every failure.