Behavior of `reject` and `accept` in multi-component updates

athoelke commented 2 years ago

Multi-component updates are discussed in #12, in particular the problem with the initial API proposal when an update has a cyclic version dependency between two (or more) components.

Another area that requires clear specification is the behavior of accept and reject when multiple components are being updated. For accept, this applies only to the TRIAL state; for reject, it applies to both the STAGED and TRIAL states.

Current v0.7 specification

psa_fwu_accept()

Indicates to the implementation that the upgrade was successful. This changes the image state of a firmware image, and its dependencies, from PSA_IMAGE_PENDING_INSTALL to PSA_IMAGE_INSTALLED.

psa_fwu_request_rollback()

Requests the platform to roll back the firmware belonging to the caller and any other image that is dependent on that firmware.

The interaction with version dependencies is essential when multiple components are updated, to ensure that the system is not left in a state where a dependent component is accepted but the component it depends on is not. In such a state, the system may no longer operate correctly, and a firmware update implementation with rollback prevention would not be able to recover to the previously installed, operational system.

Analysis

The current specification requires that implementations can process component version dependencies. This might not be possible for a simple HAL-type implementation (see #16).

A HAL-type implementation depends on the Client providing correct image updates and ensuring that version dependencies are met. It would be reasonable to also require the Client to correctly reject (or accept) a group of inter-dependent updates. Although there is a risk of device failure if the Client does this incorrectly, this risk is not new to the accept and reject operations: install on a multi-component system already relies on the Client to correctly manage version dependencies.

For implementations that can process version dependencies, e.g. for verification purposes, providing the specified behavior adds complexity, because it requires iterating through chains of dependent components.
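To make that extra work concrete, here is a minimal C sketch of the v0.7-style rollback, in which rolling back one component also rolls back every component that depends on it, directly or transitively. It assumes a hypothetical in-memory dependency matrix; `depends_on`, `rolled_back` and `rollback_with_dependents` are illustrative names, not part of any PSA API. Because #12 raises cyclic dependencies, the walk tracks visited components so that it terminates on a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (not a PSA API): depends_on[a][b] is true when
 * component a has a version dependency on component b. */
#define NUM_COMPONENTS 4

static bool depends_on[NUM_COMPONENTS][NUM_COMPONENTS];
static bool rolled_back[NUM_COMPONENTS];

/* Roll back component `id` and, transitively, every component that
 * depends on it. Each component is marked before it is pushed, so the
 * stack never holds more than NUM_COMPONENTS entries and the walk
 * terminates even when the dependencies form a cycle. */
static void rollback_with_dependents(size_t id)
{
    size_t stack[NUM_COMPONENTS];
    size_t top = 0;

    assert(id < NUM_COMPONENTS);
    if (rolled_back[id])
        return;
    rolled_back[id] = true;
    stack[top++] = id;

    while (top > 0) {
        size_t target = stack[--top];
        /* Find every component that depends directly on `target`. */
        for (size_t c = 0; c < NUM_COMPONENTS; c++) {
            if (depends_on[c][target] && !rolled_back[c]) {
                rolled_back[c] = true;
                stack[top++] = c;
            }
        }
    }
}
```

For example, with dependencies 1 on 0, 2 on 1 and 0 on 2 (a cycle), `rollback_with_dependents(1)` marks components 0, 1 and 2 and leaves component 3 untouched. Even in this small model the implementation needs a dependency table and a graph walk, which is the complexity the proposal below tries to avoid.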

Proposal

The proposed change to the API in #12, which makes multi-component updates explicit in the API using finish and install, results in component updates being grouped, either in a single set of STAGED components or in a collection of such sets, where each set must be installed (or failed) atomically.

To reduce the implementation complexity for reject and accept, I propose that these operations act on an entire set of updates.

At the moment, reject and accept are parameterized by a component id. One approach would be to have these operations act on all of the components that were installed together (that is, install was called while they were in CANDIDATE state). Calling reject on a component in WRITING or CANDIDATE state only affects that single component.

For simpler implementations that do not maintain multiple installation sets (see this comment on #12), reject or accept on a component in STAGED or TRIAL state will act on all components in the same state.
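The single-set behavior in this proposal can be sketched as a small C model. This is a hypothetical illustration, not the PSA API: the state names mirror those used in this discussion, and it assumes that a discarded WRITING/CANDIDATE image returns to a READY state and that a rejected STAGED/TRIAL set moves to a FAILED state. `model_reject` and `model_accept` still take a component id, but for STAGED or TRIAL components they act on every component in the same state, which in a single-set implementation is exactly the installation set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component states modelled on this discussion (not PSA API). */
typedef enum {
    ST_READY, ST_WRITING, ST_CANDIDATE, ST_STAGED, ST_TRIAL,
    ST_INSTALLED, ST_FAILED
} comp_state;

#define NUM_COMPONENTS 4
static comp_state state[NUM_COMPONENTS]; /* zero-initialized: ST_READY */

/* Move every component currently in `from` to `to`. With only one
 * installation set, "the set" is all components in that state. */
static void transition_all(comp_state from, comp_state to)
{
    for (size_t c = 0; c < NUM_COMPONENTS; c++)
        if (state[c] == from)
            state[c] = to;
}

/* Returns 0 on success, -1 if the component's state does not permit reject. */
static int model_reject(size_t id)
{
    assert(id < NUM_COMPONENTS);
    switch (state[id]) {
    case ST_WRITING:
    case ST_CANDIDATE:
        state[id] = ST_READY;                  /* only this prepared image */
        return 0;
    case ST_STAGED:
    case ST_TRIAL:
        transition_all(state[id], ST_FAILED);  /* whole set fails together */
        return 0;
    default:
        return -1;
    }
}

/* Returns 0 on success, -1 if the component is not in TRIAL state. */
static int model_accept(size_t id)
{
    assert(id < NUM_COMPONENTS);
    if (state[id] != ST_TRIAL)
        return -1;
    transition_all(ST_TRIAL, ST_INSTALLED);    /* accept the whole set */
    return 0;
}
```

Passing any one member's id is enough to accept or reject the whole set, so neither the Client nor the implementation has to reason about dependency chains.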

Simpler alternative

We could make the API simpler, and have reject and accept take no parameter and operate on all TRIAL/STAGED components.

If this is preferred, we should introduce a cancel operation, which still takes a component id, for discarding updates that have been prepared (in WRITING or CANDIDATE state).
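A minimal C sketch of this simpler alternative, using the same kind of hypothetical state model (not the PSA API): `model_reject_all` and `model_accept_all` take no parameter and act on every STAGED/TRIAL component, while the hypothetical `model_cancel` keeps a component id for images still being prepared. Returning -1 when there is nothing to act on, and the READY/FAILED/INSTALLED destination states, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component states modelled on this discussion (not PSA API). */
typedef enum {
    ST_READY, ST_WRITING, ST_CANDIDATE, ST_STAGED, ST_TRIAL,
    ST_INSTALLED, ST_FAILED
} comp_state;

#define NUM_COMPONENTS 4
static comp_state state[NUM_COMPONENTS]; /* zero-initialized: ST_READY */

/* Reject every STAGED or TRIAL component; -1 if there was none. */
static int model_reject_all(void)
{
    int changed = 0;
    for (size_t c = 0; c < NUM_COMPONENTS; c++) {
        if (state[c] == ST_STAGED || state[c] == ST_TRIAL) {
            state[c] = ST_FAILED;
            changed = 1;
        }
    }
    return changed ? 0 : -1;
}

/* Accept every TRIAL component; -1 if there was none. */
static int model_accept_all(void)
{
    int changed = 0;
    for (size_t c = 0; c < NUM_COMPONENTS; c++) {
        if (state[c] == ST_TRIAL) {
            state[c] = ST_INSTALLED;
            changed = 1;
        }
    }
    return changed ? 0 : -1;
}

/* Discard a single prepared image; only valid in WRITING or CANDIDATE. */
static int model_cancel(size_t id)
{
    assert(id < NUM_COMPONENTS);
    if (state[id] != ST_WRITING && state[id] != ST_CANDIDATE)
        return -1;
    state[id] = ST_READY;
    return 0;
}
```

The split keeps per-component control where it is still meaningful (preparing images) and removes it where the components must succeed or fail as one set.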

athoelke commented 2 years ago

If we permit an implementation to maintain more than one set of updates to install atomically, then I think we must require that these sets are fully independent of each other. This removes a requirement (in the spec or the implementation) for accept or reject to affect more than a single such set.

athoelke commented 2 years ago

Working through the implications of the alternatives, to keep the implementation and usage of this capability as simple as possible, the best design is to only permit a single set of updates to be in the process of atomic installation. This is reflected in the latest comments to #12, and an updated version of the state model RFC in #1.

athoelke commented 2 years ago

Proposal

The resolution of this issue is now included in #12.

ARM-software / psa-firmware-update-spec