espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.84k stars 7.32k forks

Invalidate specific OTA slot. (IDFGH-13982) #14808

Open ddomnik opened 3 weeks ago

ddomnik commented 3 weeks ago

Is your feature request related to a problem?

Let's assume we have two valid OTA partitions (ota0 and ota1), and the ota_state of both is set to ESP_OTA_IMG_VALID. Now the ota1 partition gets erased and we perform this incomplete update sequence:

This would assume we have a valid image in ota1, even though the update sequence has not been completed.

The same issue would appear if the app in a factory partition is used to reflash an ota0 partition that was previously marked as valid.

Describe the solution you'd like.

esp_ota_begin() should mark the partition that is about to be flashed as ESP_OTA_IMG_INVALID. To be more flexible, and to make it possible to mark specific partitions as invalid, generic functions like these would be better:

esp_ota_mark_app_invalid(const esp_partition_t *partition);

esp_ota_set_state(const esp_partition_t *partition, esp_ota_img_states_t ota_state);

Describe alternatives you've considered.

  1. Rewriting the ota_data partition manually, but this seems hacky.
  2. Erasing the partition content, but then the ota_state flag is still valid and the bootloader tries to load it.

Additional context.

The logic of the OTA data is quite complex to me, and I am not even sure whether ota_data actually behaves like this, since I think not every OTA partition has a dedicated slot where its ota_state is stored. If so, it may make sense to make a function available like esp_ota_mark_app_invalid_rollback_and_reboot, but without the rollback and reboot part.

mahavirj commented 6 hours ago

@ddomnik

The OTA data partition contains 2 identical flash sectors, each storing a copy of the esp_ota_select_entry_t data structure (otadata[2]). The sequence number (ota_seq) and the state (ota_state) are the 2 important fields in this data structure that help to define the active partition. The following code should help to clarify the logic behind the usage of the sequence number:

https://github.com/espressif/esp-idf/blob/f420609c332fbd2d2f7f188c6579d046c9560e42/components/app_update/esp_ota_ops.c#L490-L504
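As a rough illustration of the sequence-number idea, here is a minimal host-side sketch in plain C. It is not the ESP-IDF implementation: the helper name boot_slot_from_seq and the constant OTA_SEQ_UNSET are made up for this example, and it assumes that the entry with the higher ota_seq wins and that the slot index is (ota_seq - 1) modulo the number of OTA app partitions. The CRC and ota_state checks that the real bootloader performs are left out.

```c
#include <stdint.h>

/* Assumption for this sketch: an erased entry reads back as all ones. */
#define OTA_SEQ_UNSET 0xFFFFFFFFu

/*
 * Hypothetical helper (not an ESP-IDF API).
 * Takes the ota_seq values of the two otadata entries and the number of
 * OTA app partitions. Returns the index of the OTA slot that would be
 * selected, or -1 when no entry carries a usable sequence number
 * (for example on a freshly erased otadata partition).
 */
int boot_slot_from_seq(uint32_t seq0, uint32_t seq1, uint32_t slot_count)
{
    uint32_t best = OTA_SEQ_UNSET;

    if (seq0 != OTA_SEQ_UNSET) {
        best = seq0;
    }
    if (seq1 != OTA_SEQ_UNSET && (best == OTA_SEQ_UNSET || seq1 > best)) {
        best = seq1;
    }
    if (best == OTA_SEQ_UNSET || best == 0u || slot_count == 0u) {
        return -1;
    }
    /* The highest sequence number maps onto a slot index. */
    return (int)((best - 1u) % slot_count);
}
```

With two OTA partitions, sequence numbers 1 and 2 would select ota1 in this model, and 3 and 2 would select ota0. Note that the choice depends only on the two otadata entries, not on one record per app partition.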

The esp_ota_set_boot_partition API is responsible for correctly updating both fields, ota_seq and ota_state, in the OTA data partition. If this API is never called, or the call fails, the new partition is not activated, and hence its state (valid or invalid) does not really matter.
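To make that concrete, here is a simplified host-side model in plain C of what "updating both fields" could look like. Everything in it is hypothetical (model_entry_t, model_state_t, next_seq_for_slot and model_set_boot_slot are not ESP-IDF names). It assumes the inactive otadata entry is rewritten with the smallest sequence number above the current one that maps to the target slot, and that the state is reset at the same time. The real implementation may differ in edge cases, and the state it writes depends on the rollback configuration.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real otadata entry and states. */
typedef enum { MODEL_IMG_NEW, MODEL_IMG_VALID, MODEL_IMG_INVALID } model_state_t;

typedef struct {
    uint32_t seq;        /* models ota_seq */
    model_state_t state; /* models ota_state */
} model_entry_t;

/*
 * Smallest sequence number strictly above current_seq for which
 * (seq - 1) % slot_count == target_slot.
 * Assumes slot_count > 0 and target_slot < slot_count.
 */
uint32_t next_seq_for_slot(uint32_t current_seq, uint32_t target_slot,
                           uint32_t slot_count)
{
    uint32_t seq = (target_slot + 1u) % slot_count; /* may start at 0 */

    while (seq <= current_seq) {
        seq += slot_count;
    }
    return seq;
}

/*
 * Rewrites the entry that is NOT currently active so that target_slot is
 * selected on the next boot. Returns the index of the rewritten entry.
 */
int model_set_boot_slot(model_entry_t entries[2], int active,
                        uint32_t target_slot, uint32_t slot_count)
{
    int next = active ^ 1; /* 0 -> 1, 1 -> 0 */

    entries[next].seq = next_seq_for_slot(entries[active].seq,
                                          target_slot, slot_count);
    entries[next].state = MODEL_IMG_NEW; /* state is reset together with seq */
    return next;
}
```

In this model, an update that stops before the set-boot step leaves both entries untouched, so the previously active slot keeps being selected. That matches the point above that the state of the not-yet-activated partition does not come into play.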

May I ask if you are observing any problem due to incorrect state of the image in the OTA data partition? If yes, could you please supply the detailed console logs here?

PS: you may use idf.py read-otadata to read out the OTA data partition on the device