canonical / operator

Pure Python framework for writing Juju charms
Apache License 2.0
245 stars 119 forks source link

Clarity around `add_status`, `collect_app_status` and `collect_unit_status` regarding `ErrorStatus` and `UnknownStatus` #1356

Closed james-garner-canonical closed 1 month ago

james-garner-canonical commented 1 month ago

While improving the type checking and documentation of the status types that can be set by charms (#1354), we noticed an incongruity in how error and unknown status are treated. While they will both raise errors if used to set the unit or app status (and ErrorStatus and UnknownStatus document this), the documentation around add_status doesn't clearly reflect this.

... the add_status method accepts both error and unknown, and the doc lists those. It reads very much like you can add those statuses, but especially for error if you include that then it'll definitely fail because it will "win" the evaluation and then can't be set.

@tonyandrewmeyer in https://github.com/canonical/operator/pull/1354#pullrequestreview-2278753728

We could just add a bit more documentation here, but it seems peculiar that calling add_status(ErrorStatus('message') doesn't raise an immediate error -- instead it's guaranteed to raise an error at status collection time, as the status is internally set to the highest priority status (and error is the highest priority). Of course the semantics are different, and perhaps you could use this as a sort of deferring of the exception till after other handlers have had a chance to do something, but it feels off.

It seems a bit more legitimate to call add_status(UnknownStatus('message')), since another event callback could add a higher priority status (unknown is the lowest priority), and no error would be raised, but still seems like a strange pattern.

Looking at the 150+ charms, I can find add_status being called a reasonable amount with the settable status types (ActiveStatus, BlockedStatus, MaintenanceStatus, WaitingStatus), but never with ErrorStatus or UnknownStatus.

Perhaps this is a good opportunity to make it an error to call add_status with ErrorStatus or UnknownStatus?

Having said that, the inability to set ErrorStatus makes me wonder what the current best practice is for a charm to give up and require manual intervention, and if documentation could be improved there?

tonyandrewmeyer commented 1 month ago

Perhaps this is a good opportunity to make it an error to call add_status with ErrorStatus or UnknownStatus?

I think so. My guess is that this was simply missed when collect-status was added, so it simply included all statuses and how they would be ranked, without considering that it's not actually possible to set error or unknown.

Having said that, the inability to set ErrorStatus makes me wonder what the current best practice is for a charm to give up and require manual intervention, and if documentation could be improved there?

If manual intervention is required, then blocked or maintenance statuses make sense, depending on where the issue is. The admin should be notice that, and take the appropriate action.

Issues that may solve themselves (like a service outside of Juju being down) are less clear at the moment. Charmers don't like using error status (ie. raise an exception, which will result in an error status implicitly set) but it does seem appropriate to me. This ties in a bit with defer and a Juju-native defer, because another option is to defer and try again when the next Juju event arrives (which is somewhat unpredictable).

james-garner-canonical commented 1 month ago

I think so. My guess is that this was simply missed when collect-status was added, so it simply included all statuses and how they would be ranked, without considering that it's not actually possible to set error or unknown.

Makes sense. I'll aim to open a PR for this at some point.

Issues that may solve themselves (like a service outside of Juju being down) are less clear at the moment.

I'm wondering why WaitingStatus wouldn't be a good choice here? I guess it's supposed to be when waiting for things inside Juju -- other charms that this charm is integrated with.

There's a certain irony in the fact that you can't set your own status to error, so if you try to do so peacefully, you'll raise and exception and end up in an error state the old fashioned way. That is, setting your own state to error is forbidden, but if you try to do so, you will end up in that state.

another option is to defer and try again when the next Juju event arrives (which is somewhat unpredictable).

I guess just waiting/sleeping wouldn't be advisable here, but it makes me wonder if ops could provide some sort of way of doing this safely, like "rerun this event after some time, or on the next Juju event, whichever happens first"

tonyandrewmeyer commented 1 month ago

I'm wondering why WaitingStatus wouldn't be a good choice here? I guess it's supposed to be when waiting for things inside Juju -- other charms that this charm is integrated with.

Yes. I don't think it's the most intuitive name, but that's the intention - there was a discussion at the Madrid sprint about this, because there was some confusion previously, but we should be consistent now.

There's a certain irony in the fact that you can't set your own status to error, so if you try to do so peacefully, you'll raise and exception and end up in an error state the old fashioned way. That is, setting your own state to error is forbidden, but if you try to do so, you will end up in that state.

Yes :).

another option is to defer and try again when the next Juju event arrives (which is somewhat unpredictable).

I guess just waiting/sleeping wouldn't be advisable here, but it makes me wonder if ops could provide some sort of way of doing this safely, like "rerun this event after some time, or on the next Juju event, whichever happens first"

That's the intention of the Juju-native/first-class defer. However, it's not yet clear whether there really is a use-case for that or not.