Closed: MikyZ72 closed this issue 2 years ago
Hello Espressif guys, is no one going to respond here? We have a few thousand modules waiting to be sold, and we should not have to debug Espressif code! The problem is that esp_partition_write does not write to flash correctly when flash encryption is enabled: it corrupts the bootloader, the partition table, and the factory and OTA partitions in flash.
Please help us...
What is the problem:
@MikyZ72
From eFuse summary, both secure boot and flash encryption schemes have been correctly configured. Following bootloader log confirms this.
I (347) secure_boot_v2: secure boot v2 is already enabled, continuing..
I (354) boot: Checking flash encryption...
I (359) flash_encrypt: flash encryption is enabled (0 plaintext flashes left)
Moreover, bootloader is also successful in verifying firmware (its signature and hence integrity) and then it successfully hands over control to it.
Because the OTA image is invalid, my loader does not enter deep sleep; it stays running and waits for an OTA update via Bluetooth with our protocol. At every power-on I see the loader formatting the FAT again, which is not correct.
I have a few questions here:
fatfs example, do you still run into the same issue?
CC @igrr
Good morning @mahavirj, many thanks for your reply...
After a hundred tests we found that the problem appears with PSRAM enabled together with encryption; specifically, esp_partition_write stops working. In our project, if we disable PSRAM everything works fine.
We have also tried the Espressif flash_encryption example: with PSRAM disabled it works fine, but if we enable PSRAM the example stops working too. Please look at the wrong output from the example: the bytes read back are not equal to what was just written, and the raw encrypted bytes read back are all 0xFF:
This is esp32s3 chip with 2 CPU core(s), WiFi/BLE, silicon revision 0, 16MB external flash
FLASH_CRYPT_CNT eFuse value is 1
Flash encryption feature is enabled in DEVELOPMENT mode
Erasing partition "storage" (0x1000 bytes)
Writing data with esp_partition_write:
I (483) example: 0x3fcf40e0 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
I (493) example: 0x3fcf40f0 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f |................|
Reading with esp_partition_read:
I (503) example: 0x3fcf40c0 c7 85 a3 a7 01 03 02 c7 33 74 7b 1f d3 c6 1a f8 |........3t{.....|
I (513) example: 0x3fcf40d0 97 0d 13 fa 8f 5e 0b 66 92 ae 0c a4 e9 fa bc bb |.....^.f........|
Reading with spi_flash_read:
I (523) example: 0x3fcf40c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
I (533) example: 0x3fcf40d0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
I (553) nvs: NVS partition "nvs" is encrypted.
@MikyZ72
We have also tried the Espressif flash_encryption example: with PSRAM disabled it works fine, but if we enable PSRAM the example stops working too.
Thanks for confirming! Can you please share your sdkconfig in case you made any modifications to the default configuration? Also, you are trying this on ESP32-S3-WROOM-1-N16R8 (just to double confirm)?
Hi @mahavirj, here is the sdkconfig of the flash encryption example with PSRAM enabled.
Yes, I confirm that the module is ESP32-S3-WROOM-1-N16R8.
I can also confirm this issue, running on ESP32-S3-WROOM-1-N16R8.
Flash encryption is enabled (BLOCK_KEY0 has been set, KEY_PURPOSE_0 = XTS_AES_128_KEY, RD_DIS = 1, WR_DIS = 0x00800100, SPI_BOOT_CRYPT_CNT = 1). Secure boot has not been enabled.
Without PSRAM enabled in menuconfig, spi_flash_write_encrypted works fine. After erase, write and read, the read value matches the value written.
After enabling PSRAM in menuconfig, setting it to octal mode and leaving all other PSRAM settings as-is, the spi_flash_write_encrypted method now misbehaves by writing garbage to other locations in the flash.
For example, I tried to write 65536 bytes of data filled with 0x5a to flash address 0x110000 using spi_flash_write_encrypted, where the corresponding block had been erased. If I read back the first bytes immediately afterwards, I get 44 63 F7 FF instead. After a reboot, the bootloader fails to boot due to "invalid header" (i.e. the first bytes of the flash are now corrupt).
I dumped the raw contents using esptool.py read_flash to see what had happened. It appears it has written 64-byte garbage chunks to the start of every 16 384-byte block, i.e. bytes 0x0000 - 0x003f, 0x4000 - 0x403f, 0x8000 - 0x803f and so on until the end of the flash are all filled with random garbage. The rest of the flash is untouched. Even if I try to decrypt the contents using espsecure.py, the result still looks like garbage.
I did some more debugging.
First I tested the latest release/v5.0 branch to see if that would help. I had to migrate to the new esp_flash_ functions, but that did not change the end result; it is still broken.
I then found out that the driver writes the correct data, but instead of writing to addr, it seems it writes to (addr << 8) instead (modulo 16 MB). After copying 64 bytes of data from address 0x00C000 in my flash dump (see my previous post) to address 0x1100C0 in the flash dump, and then decrypting the whole flash dump using espsecure.py, the data at address 0x1100C0 is then correctly decrypted (i.e. 5A 5A 5A ...). This confirms that the AES-XTS hardware encrypts for the correct address, but the SPI1 peripheral writes the result to the wrong address (addr << 8).
The reason the flash dump contains 64 bytes of "garbage" at the start of every 16 384-byte block is that the spi_flash_chip_generic_write_encrypted function splits the data into 64-byte blocks and then writes them individually. 64 << 8 is 16 384, so that's where that offset comes from.
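To make the arithmetic above easier to check, here is a small host-side C sketch of the observed misdirection. It only models the behaviour described in this thread (address shifted left by 8, wrapped at the 16 MB flash size, data written in 64-byte chunks); the function and macro names are made up for illustration and are not part of ESP-IDF.

```c
#include <stdint.h>

#define FLASH_SIZE_BYTES (16u * 1024u * 1024u) /* 16 MB flash, as on the N16R8 module */
#define ENC_CHUNK_BYTES  64u                   /* chunk size of the encrypted write path, per the post above */

/* Hypothetical model of the fault: the write lands at (addr << 8) modulo the flash size. */
uint32_t misdirected_addr(uint32_t addr)
{
    return (uint32_t)(((uint64_t)addr << 8) % FLASH_SIZE_BYTES);
}

/* Flash address actually hit by the n-th 64-byte chunk of a write that starts at base. */
uint32_t misdirected_chunk_addr(uint32_t base, uint32_t chunk_index)
{
    return misdirected_addr(base + chunk_index * ENC_CHUNK_BYTES);
}
```

With base 0x110000, consecutive chunks map to 0x000000, 0x004000, 0x008000 and so on, which matches the 16 384-byte stride seen in the dump, and 0x1100C0 maps to 0x00C000, the address the 64 bytes were copied back from.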
By using a debugger and setting a breakpoint at the beginning of spimem_flash_ll_user_start, I could see that at this point dev->user1.usr_addr_bitlen contains the correct value 24 - 1. Also dev->addr contains the correct value 0x00110000 (when writing to that address). I however noticed that dev->cache_fctrl.usr_cmd_4byte contained 1, which has the header file documentation "Set this bit to enable SPI1 transfer with 32 bits address. The value of SPI_MEM_USR_ADDR_BITLEN should be 31.". This is the cause of the issue which causes SPI1 to write to the incorrect flash address.
I saw that the octal PSRAM uses 32-bit addresses. After some investigation I found that the esp_rom_opiflash_exec_cmd function sets this value to 1 without reverting it. That function is called for SPI1 during PSRAM initialization. It seems PSRAM only uses SPI0 later on, so it should be safe to revert it after PSRAM initialization is complete. The Flash Encryption write operation uses SPI1.
So for now I added this line:
SET_PERI_REG_BITS(SPI_MEM_CACHE_FCTRL_REG(1), SPI_MEM_CACHE_USR_CMD_4BYTE_V, 0, SPI_MEM_CACHE_USR_CMD_4BYTE_S);
to the function spi_flash_set_rom_required_regs (outside the #if block). This seems to solve the issue for the N16R8 module. I'm not sure if it's a correct solution for modules with larger flashes though.
Another alternative might be to set dev->cache_fctrl.usr_cmd_4byte = bitlen == 32 ? 1 : 0; in spimem_flash_ll_set_usr_address. But then it's maybe done unnecessarily often.
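As a rough illustration of that second alternative, the sketch below models the two fields on a plain struct so the logic can be checked on a host machine. The struct and function are hypothetical stand-ins, not the real ESP-IDF register layout or the real spimem_flash_ll_set_usr_address; they only show the idea of deriving the 4-byte flag from the address length on every call, so a stale value left behind by PSRAM initialization cannot survive.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two SPI1 fields discussed above; NOT the real register layout. */
typedef struct {
    uint32_t usr_addr_bitlen; /* address length in bits, minus one */
    uint32_t usr_cmd_4byte;   /* 1 = 32-bit address transfer, 0 = 24-bit */
} spi1_addr_cfg_t;

/* Model of the proposed change: keep the 4-byte flag in sync with the address length. */
void model_set_usr_address_bitlen(spi1_addr_cfg_t *cfg, uint32_t bitlen)
{
    cfg->usr_addr_bitlen = bitlen - 1;
    cfg->usr_cmd_4byte = (bitlen == 32) ? 1 : 0;
}
```

Starting from a configuration where usr_cmd_4byte is still 1, a call with bitlen 24 clears the flag, and a call with bitlen 32 sets it again. Whether doing this on every transfer is acceptable on real hardware is exactly the open question raised above.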
Hello Emill, thank you for the thorough and complete information, but we are in production and have already changed our code to deactivate PSRAM and stop using this feature. We cannot wait for another fix or spend more time; we must sell the product (thousands of units). We are now selling a product with code that does not meet our original specifications, and we have paid for modules with PSRAM for nothing. On top of that, we do not know whether in the future (once there is a stable solution) we will be able to re-enable PSRAM and update the product via OTA without problems. I think a bug like this should not happen in Espressif's code, especially in the encryption parts. I am now afraid that the code is not well tested and that we may run into many problems with the product at our customers.
@MikyZ72
We have a few thousand modules waiting to be sold, and we should not have to debug Espressif code!
I think a bug like this should not happen in Espressif's code, especially in the encryption parts
I personally share your frustration and have often wondered if Espressif intends to sell only to hobbyists. But on the other hand, there is a huge price difference between ESP products and those of other companies. No other company offers an MCU with integrated WiFi/BT/USB/PSRAM for such a low price (which I find attractive, as you probably do too). I try to remind myself of that every time I grind my teeth wondering why something isn't working as I hoped. Moreover, the price of PSRAM is merely a few cents, depending on where and how one purchases.
More importantly, how often does a Microsoft update end up doing more damage than good? Or a Linux kernel upgrade cause some driver to stop working? I used to rely heavily on PIC controllers in the past, specifically the Microchip Code Configurator. All of a sudden they bumped the version, and it stopped calculating PWM frequencies correctly. Every SDK has bugs at any given moment.
@Emill Hi, thanks for your fix; your suggestion is great.
Taking stability into consideration, two parts need to be modified. You can pick this commit https://github.com/espressif/esp-idf/commit/8538153616af1b651af990c69ebc1b4cb2835c53 and things should be OK.
Environment
Problem Description
Hello guys. If I flash all the production .bin files (ota, ota_data, factory, partition table, bootloader, etc.) with virtual eFuses enabled in the emul_efuse partition, everything works fine. If I disable virtual eFuses in order to burn the real eFuses, the problems start. I think I see 2 problems:
- esp_partition_get_sha256 on the OTA partition returns code 0x2002
- esp_vfs_fat_spiflash_mount formats the FAT as if the previous format had been lost at power off
I have run many tests with different sdkconfig settings and flash procedures and have bricked 10 modules; so far I have not found a solution!
Project
I have 2 projects in Espressif IDE (Eclipse):
Both projects have the same sdkconfig and the same custom partition.csv file (they are identical and I keep them identical).
Both have Flash encryption AES-256 enabled, and Secure Boot V2 with binaries signed during the build using my own generated key (openssl). ROM download mode is enabled (insecure, for now). The signing key is the same for the loader project and the firmware project. I do not encrypt NVS data.
SDKconfig image encryption
The bootloader of the loader project is customized to always run the factory partition at power-on and to always run the OTA partition on wake-up from deep sleep. The bootloader of the firmware project is the original one from Espressif, not customized, but I do not use it. When the bootloader runs my loader app from the factory partition, after some checks I enter deep sleep for a hundred ms (100), and at wake-up the bootloader runs the OTA partition. I use only 2 app partitions for updates (1 factory and 1 OTA) so as not to lose the flash space of 3 partitions (1 factory and 2 OTA) as in the Espressif default. The partition table offset is set to "0xb000" to enlarge the bootloader space (needed because of encryption and info logs), as the documentation explains.
Partitions
Write to flash
After the build, I flash all the binaries manually from the command prompt:
esptool.py -p COM7 --chip esp32s3 erase_flash
esptool.py -p COM7 --chip esp32s3 --before=default_reset --after=no_reset --no-stub write_flash --flash_mode dio --flash_freq 80m --flash_size 16MB 0x20000 loader.bin 0xb000 partition-table.bin 0x10000 ota_data_initial.bin 0x120000 firmware.bin
esptool.py -p COM7 --chip esp32s3 --before=default_reset --after=no_reset --no-stub write_flash --flash_mode dio --flash_freq 80m --flash_size 16MB 0x0 bootloader.bin
None of the commands resets the chip after flashing, and the bootloader is flashed last, as the documentation recommends, because on the first restart the bootloader starts burning the eFuses if the encryption process goes well. The chip resets when I start the monitor:
idf.py -p COM7 monitor
Expected Behavior
The firmware.bin file is signed at build time and the OTA partition is encrypted at the first start of the bootloader, so I expect esp_partition_get_sha256 on the OTA partition to be valid, but it is not. The storage partition is not flashed and I think it is empty (maybe 0xFF); I expect that at first start it is encrypted and then formatted as FAT when it is mounted automatically by the API function. If I upload a new OTA firmware.bin over Bluetooth with my protocol, the file is saved in the FAT and I call esp_reset(); the factory app then checks for firmware.bin in the FAT and reflashes the OTA partition with it.
Actual Behavior
Because the OTA image is invalid, my loader does not enter deep sleep; it stays running and waits for an OTA update via Bluetooth with our protocol. At every power-on I see the loader formatting the FAT again, which is not correct. If I try to upload a new OTA firmware.bin over Bluetooth with my protocol, f_getfree on the FAT returns an error as if the FAT were corrupted or not present.
Debug Logs
Flash Log
Monitor Log
EFUSES LOG
Other items if possible