canonical / pebble

Take control of your internal daemons!
GNU General Public License v3.0

fix(checkstate): ignore and abort carryover changes #415

Closed. flotter closed this 2 months ago

flotter commented 2 months ago

Addresses https://github.com/canonical/pebble/issues/414.

Each running check is backed by a change that the state-engine persists. After a reboot, any change or task that is not yet complete (not ready) is therefore resumed. For some managers, such as checkstate, these carryover changes and tasks are an unwanted side effect.

Currently, the plan manager performs the first plan load very early during startup, before the state-engine is ready and before the StartUp hooks have been called. As a result, a PlanChanged propagation takes place at plan load and forces checkstate to inspect the running checks prematurely.

Checkstate discovers a running change from a previous boot context, and tries to load its data from cached state, which does not exist.

This PR gracefully ignores changes for which no cached state exists, and uses the missing cached state to identify the changes that should be aborted on the first ensure pass.
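
The approach can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual checkstate code: the `Change`, `Manager`, `RunningChecks` and `AbortCarryover` names are hypothetical, the cache is modelled as a plain map from change ID to check name, and an abort is simplified to setting the status directly to Hold.

```go
package main

import "fmt"

// Status is a simplified stand-in for a change status.
type Status string

const (
	Doing Status = "Doing"
	Hold  Status = "Hold"
)

// Change is a hypothetical, simplified view of a persisted change.
type Change struct {
	ID     string
	Status Status
}

// Ready reports whether the change has finished (anything but Doing here).
func (c *Change) Ready() bool { return c.Status != Doing }

// Manager is a hypothetical stand-in for a check manager.
type Manager struct {
	changes []*Change         // persisted, so these survive a reboot
	cache   map[string]string // in-memory only: change ID -> check name
}

// RunningChecks returns the check names of running changes, silently
// skipping any change that has no cached state (a carryover change).
func (m *Manager) RunningChecks() []string {
	var names []string
	for _, c := range m.changes {
		if c.Ready() {
			continue
		}
		name, ok := m.cache[c.ID]
		if !ok {
			continue // carryover from a previous boot: ignore, do not fail
		}
		names = append(names, name)
	}
	return names
}

// AbortCarryover is meant to run on the first ensure pass. It aborts every
// running change without cached state and returns the aborted change IDs.
func (m *Manager) AbortCarryover() []string {
	var aborted []string
	for _, c := range m.changes {
		if c.Ready() {
			continue
		}
		if _, ok := m.cache[c.ID]; ok {
			continue // created in this boot: leave it running
		}
		c.Status = Hold // simplification of aborting the change
		aborted = append(aborted, c.ID)
	}
	return aborted
}

func main() {
	m := &Manager{
		changes: []*Change{{ID: "66", Status: Doing}, {ID: "69", Status: Doing}},
		cache:   map[string]string{"69": "internet-online"},
	}
	fmt.Println(m.RunningChecks())   // only the check with cached state
	fmt.Println(m.AbortCarryover())  // IDs of the carryover changes
	fmt.Println(m.changes[0].Status) // status of change 66 after the abort
}
```

In this sketch the same "no cached state" condition serves both purposes described above: an early inspection skips the carryover change instead of failing, and the first ensure pass uses it to decide which changes to abort.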

Test:

<reboot>
59   Hold    today at 23:29 SAST  today at 23:37 SAST  Recover exec check "internet-online"
60   Error   today at 23:37 SAST  today at 23:37 SAST  Perform exec check "internet-online"
:
66   Doing   today at 23:37 SAST  -                    Recover exec check "internet-online"
<reboot>
:
66   Hold    today at 23:37 SAST  today at 23:41 SAST  Recover exec check "internet-online"
:
69   Error   today at 23:41 SAST  today at 23:42 SAST  Perform exec check "internet-online"
:
73   Doing   today at 23:42 SAST  -                    Recover exec check "internet-online"

As the test demonstrates, following a reboot the carryover change (66) is aborted, so its status is now Hold. A new Perform change (69) is then created, fails after a while (as expected), and is followed by a new Recover change (73).