Behavior on error

athoelke commented 2 years ago

§7.1 Behavior on error states the following:

All function calls must be implemented atomically:

When a function returns a type other than psa_status_t, the requested action has been carried out.

When a function returns the status PSA_SUCCESS or PSA_SUCCESS_xxx, the requested action has been carried out.

When a function returns another status of type psa_status_t, no action has been carried out.
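As an illustration of what this all-or-nothing contract asks of an implementation, here is a minimal sketch using a RAM buffer in place of a firmware store. `demo_store_t`, `demo_write()` and `DEMO_ERROR_INVALID_ARGUMENT` are hypothetical names made up for this example and are not part of the PSA API. Only `psa_status_t` and `PSA_SUCCESS` mirror real PSA definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t psa_status_t;                 /* status type used by PSA APIs */
#define PSA_SUCCESS ((psa_status_t)0)
#define DEMO_ERROR_INVALID_ARGUMENT ((psa_status_t)-1) /* hypothetical value */

#define DEMO_STORE_SIZE 16u

/* Hypothetical staging area standing in for a firmware store. */
typedef struct {
    uint8_t data[DEMO_STORE_SIZE];
} demo_store_t;

/* Validate every argument first, then mutate. Any error return therefore
 * leaves the store exactly as it was: "no action has been carried out". */
psa_status_t demo_write(demo_store_t *store, size_t offset,
                        const uint8_t *block, size_t block_size)
{
    if (store == NULL || block == NULL) {
        return DEMO_ERROR_INVALID_ARGUMENT;
    }
    if (offset > DEMO_STORE_SIZE || block_size > DEMO_STORE_SIZE - offset) {
        return DEMO_ERROR_INVALID_ARGUMENT;
    }
    memcpy(store->data + offset, block, block_size);
    return PSA_SUCCESS;
}
```

In RAM this is easy because the copy itself cannot fail. The difficulty raised in this issue is that a flash write can fail or be interrupted partway through, after some bytes have already been programmed.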

These requirements seem too strict to apply to all API calls. Atomic behavior is not always possible to implement efficiently for operations that can modify the flash, such as psa_fwu_write(), on the flash file systems used in constrained devices for storing firmware.

These requirements also contradict the description of the installer role in §3.3 Installer, which states:

When an image fails installation, it is referred to as a rejected image. If the severity of the failure is recoverable, the implementation may choose to turn the rejected image into a candidate image again. A rejected image might be marked as invalid and should be erased or overwritten.

Questions

Should the specification relax the requirements for Behavior on error entirely? - or permit individual APIs to state when atomicity is not required?
Should the specification require that psa_fwu_install() leaves the component in a WRITING (previously CANDIDATE) state on any error? - or are some errors significant enough that the Service should mark the image as FAILED (previously REJECTED), to prevent any further attempt to install?
If psa_fwu_install() can change the component state to FAILED, should the specification require that this must only be for unrecoverable errors, such as PSA_ERROR_WRONG_DEVICE?
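To make the last two questions concrete, one possible policy can be sketched as a mapping from the install error to the resulting component state. This is only an illustration of the option being asked about, not behavior defined by the specification. All `demo_` and `DEMO_` names are hypothetical, and the two "recoverable" error kinds are invented for the example; only the idea of treating PSA_ERROR_WRONG_DEVICE as unrecoverable comes from the question above.

```c
/* Hypothetical component states, named after the states in the questions. */
typedef enum {
    DEMO_STATE_WRITING,   /* image may be retried or rewritten */
    DEMO_STATE_FAILED     /* no further install attempt permitted */
} demo_state_t;

/* Hypothetical error kinds for an install attempt. */
typedef enum {
    DEMO_ERR_INSUFFICIENT_POWER,  /* invented example of a recoverable error */
    DEMO_ERR_STORAGE_BUSY,        /* invented example of a recoverable error */
    DEMO_ERR_WRONG_DEVICE         /* stands in for PSA_ERROR_WRONG_DEVICE */
} demo_install_error_t;

/* One possible policy: only unrecoverable errors move the component to
 * FAILED; every other error leaves it in WRITING so the client can retry. */
demo_state_t demo_state_after_install_error(demo_install_error_t err)
{
    switch (err) {
    case DEMO_ERR_WRONG_DEVICE:
        return DEMO_STATE_FAILED;
    default:
        return DEMO_STATE_WRITING;
    }
}
```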

d3zd3z commented 2 years ago

I haven't really thought much about the atomicity requirements for writing the image. I have thought extensively about them for the bootloader's process for installing the images, especially with regard to swap.

We have the advantage here that it is always possible to erase everything and start over again (this is generally not possible for the bootloader's swap). There may be tradeoffs involved that might have different answers depending on the configuration.

For example, what if power is lost during a write? On some flash devices, it will be possible to simply repeat the write when we start up again, but we will have to determine whether that is possible. On other devices, the sector will have to be erased, which would entail backing up in the upgrade.
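A minimal sketch of how an implementation might decide whether the write can simply be repeated, assuming NOR-like flash where programming can only change bits from 1 to 0 and only an erase sets them back to 1. The function name is hypothetical. Devices that use ECC or that restrict how many times a word may be programmed would need a different test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if `desired` can be programmed over `current` without an
 * erase, under the assumption that programming only clears bits (1 -> 0).
 * Every bit that must be 1 in `desired` has to still be 1 in `current`. */
bool demo_can_program_without_erase(const uint8_t *current,
                                    const uint8_t *desired, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((current[i] & desired[i]) != desired[i]) {
            return false;   /* a needed 1 bit is already 0: erase required */
        }
    }
    return true;
}
```

Under this assumption, a region that was interrupted while being programmed with the same data passes the check, because the bits already cleared are bits that the data clears anyway.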

As mentioned today, during a slow download, the writes are a fairly small part of the total time, so it may be reasonable to just back up to the last erase. But again, in these environments, the downloads are also quite slow.

An extreme example of this would be a device, such as a water meter or a Mars rover, where power is limited, and so is bandwidth. Sending parts of an image multiple times may exceed the available power budget.

bulislaw commented 2 years ago

Does it make sense for us to require the update client to keep track of successful writes if they care? If the client issues a "write" at a given offset and for some reason the API returns an error or doesn't return (e.g. reboot), it should be possible to continue from the last successful write. This doesn't require the write to be atomic, as in most cases we don't care about the content of the memory we are writing to. This may be a more mundane problem, as flash may need to be erased before writing, but the update service backend should probably do that for each block anyway by default while writing.
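The client-side tracking described here can be sketched as follows. This is a minimal illustration, not the PSA API: `demo_service_t`, `demo_write()`, `demo_send_image()` and `DEMO_ERROR_STORAGE` are hypothetical, the store is a RAM buffer that accepts overwrites, and the failing call deliberately leaves a partial write behind to model a non-atomic failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t psa_status_t;
#define PSA_SUCCESS ((psa_status_t)0)
#define DEMO_ERROR_STORAGE ((psa_status_t)-1)   /* hypothetical value */

#define DEMO_IMAGE_SIZE 10u
#define DEMO_CHUNK 4u

/* Hypothetical update service with a fault-injection knob. */
typedef struct {
    uint8_t store[DEMO_IMAGE_SIZE];
    int fail_on_call;   /* 1-based index of the call that fails; 0 = never */
    int calls;
} demo_service_t;

psa_status_t demo_write(demo_service_t *svc, size_t offset,
                        const uint8_t *block, size_t len)
{
    svc->calls++;
    if (svc->calls == svc->fail_on_call) {
        /* Non-atomic failure: half of the block reaches storage. */
        memcpy(svc->store + offset, block, len / 2);
        return DEMO_ERROR_STORAGE;
    }
    memcpy(svc->store + offset, block, len);
    return PSA_SUCCESS;
}

/* Sends the image starting at *progress. *progress advances only after a
 * write succeeds, so it always marks the end of the last successful write. */
psa_status_t demo_send_image(demo_service_t *svc, const uint8_t *image,
                             size_t image_size, size_t *progress)
{
    while (*progress < image_size) {
        size_t len = image_size - *progress;
        if (len > DEMO_CHUNK) {
            len = DEMO_CHUNK;
        }
        psa_status_t st = demo_write(svc, *progress, image + *progress, len);
        if (st != PSA_SUCCESS) {
            return st;          /* caller may retry later from *progress */
        }
        *progress += len;
    }
    return PSA_SUCCESS;
}
```

Calling `demo_send_image()` again after an error resends the failed chunk from the recorded offset. To survive a reboot, a real client would need to keep `*progress` in persistent storage, and the firmware store would need to tolerate the overwrite of the partially written region.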

athoelke commented 2 years ago

Does it make sense for us to require the update client to keep track of successful writes if they care?

I think so. This is my preferred approach as well. I don't know whether we need to expose, via a component info flag, whether the firmware store can handle an overwrite like this.

This may be a more mundane problem, as flash may need to be erased before writing, but the update service backend should probably do that for each block anyway by default while writing.

This presumes that the Client transmits the data in blocks that are perfectly aligned to the erase blocks. I don't think this is a valid assumption, because the specification does not require this to be true. The API design at present permits the client to transmit the data in arbitrarily sized buffers in an arbitrary order, including overlapping regions. Although the PSA_FWU_MAX_BLOCK_SIZE value suggests (as demonstrated in the sample code) that the data should be sent in blocks of exactly this size, aligned on boundaries of that size, this is not required in the API definition.
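The alignment concern can be made concrete with a small sketch that computes which erase blocks an arbitrary write touches. The 4096-byte erase-block size and all `demo_` names are assumptions chosen for illustration; real devices vary.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ERASE_BLOCK_SIZE 4096u   /* hypothetical erase-block size */

typedef struct {
    size_t first;   /* index of the first erase block touched */
    size_t last;    /* index of the last erase block touched */
} demo_block_span_t;

/* Erase blocks covered by the byte range [offset, offset + len).
 * The caller must pass len > 0. */
demo_block_span_t demo_blocks_touched(size_t offset, size_t len)
{
    demo_block_span_t span;
    span.first = offset / DEMO_ERASE_BLOCK_SIZE;
    span.last = (offset + len - 1u) / DEMO_ERASE_BLOCK_SIZE;
    return span;
}

/* True only when the write starts on an erase-block boundary and covers a
 * whole number of erase blocks. */
bool demo_is_block_aligned(size_t offset, size_t len)
{
    return (offset % DEMO_ERASE_BLOCK_SIZE == 0u) &&
           (len % DEMO_ERASE_BLOCK_SIZE == 0u);
}
```

With these assumptions, a 200-byte write at offset 4000 is unaligned and straddles erase blocks 0 and 1, so an "erase each block before writing" backend would destroy data that an earlier write had already placed in those blocks.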

For some applications, the energy and bandwidth constraints might require that the implementation accept small data blocks in arbitrary order. So I am not sure that we should specify stricter requirements on this API.

athoelke commented 2 years ago

Proposal for v1.0

The Behavior on Error section will be reworked. In general the need for atomic behavior is accepted, but the wording will allow specific APIs to indicate when other behavior can occur in error conditions.

The specification would benefit from a longer discussion on the ability (or even necessity) for some implementations to allow writes to be reattempted after certain error conditions or interruptions.

ARM-software / psa-firmware-update-spec

Behavior on error #14
