espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.49k stars 7.26k forks source link

`esp_ota_get_state_partition` returns `ESP_ERR_NOT_FOUND` after 2OTA updates in a row for the running partition (IDFGH-13839) #14688

Open ermacv opened 2 days ago

ermacv commented 2 days ago

Answers checklist.

IDF version.

v5.3.1

Espressif SoC revision.

ESP32-S3

Operating System used.

Linux

How did you build your project?

Command line with CMake

If you are using Windows, please specify command line type.

None

Development Kit.

Custom Board

Power Supply used.

USB

What is the expected behavior?

esp_ota_get_state_partition should always return correct partition state for running partition.

What is the actual behavior?

esp_ota_get_state_partition returns ESP_ERR_NOT_FOUND after 2OTA updates in a row for the running partition

Steps to reproduce.

Use default IDF OTA update functions to update the FW: esp_ota_begin esp_ota_write esp_ota_end esp_ota_set_boot_partition

You need to write the FW two times without device restart. On third time esp_ota_get_state_partition starts to return ESP_ERR_NOT_FOUND on every call when running partition is provided.

There are 2 OTA partitions on my device. I added some logs to esp_ota_get_state_partition function to check what happens and it seems that OTA Data is fully rewritten and on third time the function cannot find provided running partition in the OTA Data. This is where I added esp_log:

    esp_ota_select_entry_t otadata[2];
    int ota_app_count = esp_ota_get_app_partition_count();
    ESP_LOGW(TAG, "read_otadata(otadata) = %p", read_otadata(otadata));
    ESP_LOGW(TAG, "ota_app_count = %d", ota_app_count);
    if (read_otadata(otadata) == NULL || ota_app_count == 0) {
        return ESP_ERR_NOT_FOUND;
    }

    int req_ota_slot = partition->subtype - ESP_PARTITION_SUBTYPE_APP_OTA_MIN;
        ESP_LOGW(TAG, "req_ota_slot = %d", req_ota_slot);
    bool not_found = true;
    for (int i = 0; i < 2; ++i) {
        ESP_LOGW(TAG, "otadata[%d].ota_seq = %lu", i, otadata[i].ota_seq);
        int ota_slot = (otadata[i].ota_seq - 1) % ota_app_count;
        ESP_LOGW(TAG, "ota_slot = %d", ota_slot);
        ESP_LOGW(TAG, "otadata[%d].crc = 0x%lx", i, otadata[i].crc);
        ESP_LOGW(TAG, "bootloader_common_ota_select_crc(&otadata[%d]) = 0x%lx", i, bootloader_common_ota_select_crc(&otadata[i]));
        if (ota_slot == req_ota_slot && otadata[i].crc == bootloader_common_ota_select_crc(&otadata[i])) {
            *ota_state = otadata[i].ota_state;
            not_found = false;
            break;
        }
    }

    if (not_found) {
        ESP_LOGW(TAG, "not_found = %s", not_found ? "true" : "false");
        return ESP_ERR_NOT_FOUND;
    }

In log please check the esp_ota_ops prefix.

Debug Logs.

I (16798) app:acs:fw_updater: Found 2 partitions. Version: 'none'
I (16798) app:acs:fw_updater: Found Firmware Updater partition type: 'Application'. Size: 732497 bytes
W (16798) plat_update: running_partition = 0x3c104000
W (16798) esp_ota_ops: read_otadata(otadata) = 0x3c103fc8
W (16798) esp_ota_ops: ota_app_count = 2
W (16798) esp_ota_ops: req_ota_slot = 0
W (16808) esp_ota_ops: otadata[0].ota_seq = 1
W (16808) esp_ota_ops: ota_slot = 0
W (16808) esp_ota_ops: otadata[0].crc = 0x4743989a
W (16808) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[0]) = 0x4743989a
W (16808) esp_ota_ops: read_otadata(otadata) = 0x3c103fc8
W (16808) esp_ota_ops: ota_app_count = 2
W (16808) esp_ota_ops: req_ota_slot = 0
W (16808) esp_ota_ops: otadata[0].ota_seq = 1
W (16808) esp_ota_ops: ota_slot = 0
W (16808) esp_ota_ops: otadata[0].crc = 0x4743989a
W (16808) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[0]) = 0x4743989a
I (19488) plat_update: Found Update Header (version: 1). Encryption type: 0, compression type: 1
I (19488) plat_update: Compressed image data_sz: 732369, uncompressed: 1103968
I (29028) esp_image: segment 0: paddr=00400020 vaddr=3c0d0020 size=2e2c4h (189124) map
I (29038) esp_image: segment 1: paddr=0042e2ec vaddr=3fc9ab00 size=01d2ch (  7468) 
I (29048) esp_image: segment 2: paddr=00430020 vaddr=42000020 size=c4264h (803428) map
I (29118) esp_image: segment 3: paddr=004f428c vaddr=3fc9c82c size=02ac8h ( 10952) 
I (29118) esp_image: segment 4: paddr=004f6d5c vaddr=40374000 size=16ae0h ( 92896) 
I (29128) esp_image: segment 0: paddr=00400020 vaddr=3c0d0020 size=2e2c4h (189124) map
I (29148) esp_image: segment 1: paddr=0042e2ec vaddr=3fc9ab00 size=01d2ch (  7468) 
I (29148) esp_image: segment 2: paddr=00430020 vaddr=42000020 size=c4264h (803428) map
I (29218) esp_image: segment 3: paddr=004f428c vaddr=3fc9c82c size=02ac8h ( 10952) 
I (29218) esp_image: segment 4: paddr=004f6d5c vaddr=40374000 size=16ae0h ( 92896) 
I (29268) app:acs:fw_updater: Found Firmware Updater partition type: 'FS Web'. Size: 487430 bytes
I (30998) plat_update: Found Update Header (version: 1). Encryption type: 0, compression type: 1
I (30998) plat_update: Compressed image data_sz: 487302, uncompressed: 499932
I (35398) plat_update: FS Web update finished successfully
I (41788) app:acs:fw_updater: Found 2 partitions. Version: 'none'
I (41788) app:acs:fw_updater: Found Firmware Updater partition type: 'Application'. Size: 732497 bytes
W (41788) plat_update: running_partition = 0x3c104000
W (41788) esp_ota_ops: read_otadata(otadata) = 0x3c103fc8
W (41788) esp_ota_ops: ota_app_count = 2
W (41788) esp_ota_ops: req_ota_slot = 0
W (41788) esp_ota_ops: otadata[0].ota_seq = 1
W (41788) esp_ota_ops: ota_slot = 0
W (41788) esp_ota_ops: otadata[0].crc = 0x4743989a
W (41788) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[0]) = 0x4743989a
W (41788) esp_ota_ops: read_otadata(otadata) = 0x3c103fc8
W (41788) esp_ota_ops: ota_app_count = 2
W (41798) esp_ota_ops: req_ota_slot = 0
W (41798) esp_ota_ops: otadata[0].ota_seq = 1
W (41798) esp_ota_ops: ota_slot = 0
W (41798) esp_ota_ops: otadata[0].crc = 0x4743989a
W (41798) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[0]) = 0x4743989a
I (44468) plat_update: Found Update Header (version: 1). Encryption type: 0, compression type: 1
I (44468) plat_update: Compressed image data_sz: 732369, uncompressed: 1103968
I (53828) esp_image: segment 0: paddr=00400020 vaddr=3c0d0020 size=2e2c4h (189124) map
I (53848) esp_image: segment 1: paddr=0042e2ec vaddr=3fc9ab00 size=01d2ch (  7468) 
I (53848) esp_image: segment 2: paddr=00430020 vaddr=42000020 size=c4264h (803428) map
I (53918) esp_image: segment 3: paddr=004f428c vaddr=3fc9c82c size=02ac8h ( 10952) 
I (53918) esp_image: segment 4: paddr=004f6d5c vaddr=40374000 size=16ae0h ( 92896) 
I (53928) esp_image: segment 0: paddr=00400020 vaddr=3c0d0020 size=2e2c4h (189124) map
I (53948) esp_image: segment 1: paddr=0042e2ec vaddr=3fc9ab00 size=01d2ch (  7468) 
I (53948) esp_image: segment 2: paddr=00430020 vaddr=42000020 size=c4264h (803428) map
I (54018) esp_image: segment 3: paddr=004f428c vaddr=3fc9c82c size=02ac8h ( 10952) 
I (54018) esp_image: segment 4: paddr=004f6d5c vaddr=40374000 size=16ae0h ( 92896) 
I (54078) app:acs:fw_updater: Found Firmware Updater partition type: 'FS Web'. Size: 487430 bytes
I (55818) plat_update: Found Update Header (version: 1). Encryption type: 0, compression type: 1
I (55818) plat_update: Compressed image data_sz: 487302, uncompressed: 499932
I (60158) plat_update: FS Web update finished successfully
I (61528) app:acs:fw_updater: Found 2 partitions. Version: 'none'
I (61528) app:acs:fw_updater: Found Firmware Updater partition type: 'Application'. Size: 732497 bytes
W (61528) plat_update: running_partition = 0x3c104000
W (61528) esp_ota_ops: read_otadata(otadata) = 0x3c103fc8
W (61528) esp_ota_ops: ota_app_count = 2
W (61528) esp_ota_ops: req_ota_slot = 0
W (61528) esp_ota_ops: otadata[0].ota_seq = 2
W (61528) esp_ota_ops: ota_slot = 1
W (61528) esp_ota_ops: otadata[0].crc = 0x55f63774
W (61528) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[0]) = 0x55f63774
W (61528) esp_ota_ops: otadata[1].ota_seq = 2
W (61528) esp_ota_ops: ota_slot = 1
W (61528) esp_ota_ops: otadata[1].crc = 0x55f63774
W (61528) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[1]) = 0x55f63774
W (61528) esp_ota_ops: not_found = true
W (61528) plat_update: esp_ota_get_state_partition = ESP_ERR_NOT_FOUND
W (61528) esp_ota_ops: read_otadata(otadata) = 0x3c103fc8
W (61528) esp_ota_ops: ota_app_count = 2
W (61528) esp_ota_ops: req_ota_slot = 0
W (61528) esp_ota_ops: otadata[0].ota_seq = 2
W (61528) esp_ota_ops: ota_slot = 1
W (61528) esp_ota_ops: otadata[0].crc = 0x55f63774
W (61528) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[0]) = 0x55f63774
W (61528) esp_ota_ops: otadata[1].ota_seq = 2
W (61528) esp_ota_ops: ota_slot = 1
W (61528) esp_ota_ops: otadata[1].crc = 0x55f63774
W (61528) esp_ota_ops: bootloader_common_ota_select_crc(&otadata[1]) = 0x55f63774
W (61528) esp_ota_ops: not_found = true
I (64228) plat_update: Found Update Header (version: 1). Encryption type: 0, compression type: 1
I (64228) plat_update: Compressed image data_sz: 732369, uncompressed: 1103968

More Information.

No response

ermacv commented 2 days ago

I forgot to mention that I perform tests after this command:

idf.py otadata-flash app-flash monitor

So OTA Data in default state.

ermacv commented 2 days ago

I also added log to function esp_rewrite_ota_data:

    int active_otadata = bootloader_common_get_active_otadata(otadata);
    ESP_LOGW(TAG, "bootloader_common_get_active_otadata = %d", active_otadata);

and to rewrite_ota_seq:

static esp_err_t rewrite_ota_seq(esp_ota_select_entry_t *two_otadata, uint32_t seq, uint8_t sec_id, const esp_partition_t *ota_data_partition)
{
    if (two_otadata == NULL || sec_id > 1) {
        return ESP_ERR_INVALID_ARG;
    }

    ESP_LOGW(TAG, "sec_id = %u, seq = %lu, two_otadata[%u].crc = 0x%08lx, two_otadata[%u].ota_seq = 0x%08lx", sec_id, seq, sec_id, two_otadata[sec_id].crc, sec_id, two_otadata[sec_id].ota_seq);

And I see the following after performing 4 OTA updates:

W (85487) esp_ota_ops: bootloader_common_get_active_otadata = 0
W (85487) esp_ota_ops: sec_id = 1, seq = 2, two_otadata[1].crc = 0xffffffff, two_otadata[1].ota_seq = 0xffffffff
// OTA 1 is done

W (107297) esp_ota_ops: bootloader_common_get_active_otadata = 1
W (107297) esp_ota_ops: sec_id = 0, seq = 2, two_otadata[0].crc = 0x4743989a, two_otadata[0].ota_seq = 0x00000001
// OTA 2 is done

W (138287) esp_ota_ops: bootloader_common_get_active_otadata = 0
W (138287) esp_ota_ops: sec_id = 1, seq = 2, two_otadata[1].crc = 0x55f63774, two_otadata[1].ota_seq = 0x00000002
// OTA 3 is done

W (165457) esp_ota_ops: bootloader_common_get_active_otadata = 0
W (165457) esp_ota_ops: sec_id = 1, seq = 2, two_otadata[1].crc = 0x55f63774, two_otadata[1].ota_seq = 0x00000002
// OTA 4 is done

I guess the problem here that OTA Data for running OTA partition is rewritten with updated application. So the problem with second OTA update because sec_id is equal to running OTA Data. I guess it should not be rewritten.

KonstantinKondrashov commented 2 days ago

Hi @ermacv! I will try to check this case. but I can say that this workflow was not considered. We probably have to document it or catch it somehow to block it. Once esp_ota_set_boot_partition() is called you can not do one more ota update. A simple workaround for your case, I think is to do esp_ota_set_boot_partition() once, and not update the ota_data partition the following times:

esp_ota_begin
esp_ota_write
esp_ota_end
esp_ota_set_boot_partition
...
esp_ota_begin
esp_ota_write
esp_ota_end
...
esp_ota_begin
esp_ota_write
esp_ota_end

I guess that you use the same partition all the time.

ermacv commented 2 days ago

Hi @KonstantinKondrashov ! Thank you for the reply. Yes, the same partition is used every update procedure during the test.

Is it possible to erase the OTA data for not running partition?

KonstantinKondrashov commented 1 day ago

@ermacv

Yes, the same partition is used every update procedure during the test.

If it is the same partition then you have to perform esp_ota_set_boot_partition() only once. Because it increments a value in the ota_data to point to the updated (new) partition, which will be selected by the bootloader in the following boot. When you call it the second time, ota_data is updated and points to the current running app. After reboot, you will not get a new app running.

Yes, you can erase the ota_data partition (but only the whole partition), otherwise, if you erase it partially it will break the logic. If you erase the ota_data partition then the bootloader will select ota_0 if the image is valid, if not ota_1.

Have a look the doc https://github.com/espressif/esp-idf/blob/master/components/app_update/include/esp_ota_ops.h#L175-L189, it says that Configure OTA data for a new boot partition. It means that you have to call it when you are sure that the new app has to be used during the next boot.

ermacv commented 1 day ago

It means that you have to call it when you are sure that the new app has to be used during the next boot.

When I call esp_ota_set_boot_partition I'm sure that partition is updated and valid to be run on the next boot. But at the same time the board restart is not mandatory in my application logic (and it is no mandatory for IDF). So it is possible that user may try to update the board the second, third, ... time. So when esp_ota_set_boot_partition is called for second, third, ... time I'm also sure that the received image is valid and flashed correctly.

I mean I understand the problem root cause. But for me there are no actual HW problem to write the image multiple times to the same partition (or to a least updated partition if there are more than 2 OTA partitions in total). I think it is only a software issue that could be easily solved. And of course the logic must not touch running partition data.

Why can't the OTA logic set OTA Data of the updated partition to default state?

KonstantinKondrashov commented 1 day ago

Yes, I am going to update the logic in esp_ota_set_boot_partition() to have the correct behavior for this case, for sure. We do not need to set "OTA Data" to a default state, because bootloader will not be able to detect which parititon is new.

ermacv commented 1 day ago

I added the following check to my OTA Update start function:

  const esp_partition_t *running_partition = esp_ota_get_running_partition();
  assert(running_partition);
  const esp_partition_t *boot_partition = esp_ota_get_boot_partition();

  //! if running_partition != boot_partition then it means that partition was already written and OTA Data was updated
  //! so first we need to delete active ota data (it means last updated partition) and then start the OTA
  if ((boot_partition != NULL) && (running_partition != boot_partition)) {
    esp_err_t err = otadata_set_default_active();
    if (err != ESP_OK) {
      ESP_LOGW(TAG, "Can't set active partition to default state (%s)", esp_err_to_name(err));
    }
  }

Where otadata_set_default_active():

static esp_err_t otadata_set_default_active(void) {
  esp_ota_select_entry_t otadata[2];
  const esp_partition_t *otadata_partition = read_otadata(otadata);
  if (otadata_partition == NULL) {
    return ESP_ERR_NOT_FOUND;
  }

  int active_otadata = bootloader_common_get_active_otadata(otadata);
  if ((active_otadata != 0) && (active_otadata != 1)) {
    return ESP_ERR_INVALID_ARG;
  }

  return esp_partition_erase_range(otadata_partition, active_otadata * SPI_FLASH_SEC_SIZE, SPI_FLASH_SEC_SIZE);
}

And it seems to work fine. (Of course it's an application workaround and still IDF logic should updated)

ermacv commented 1 day ago

We do not need to set "OTA Data" to a default state, because bootloader will not be able to detect which parititon is new.

But in case if I want to write that partition once again it still won't recognized but bootloader if it was interrupted in the middle. I mean if user wants to write OTA partition once again the logic:

I think if user wants to write OTA image again without device reboot it should admit the fact that after starting the new OTA procedure the previously written image (and ota data) should become invalid even the update was interrupted or incomplete.