Using an IDE?: Yes - Visual Studio Code with official Espressif IDF extension
Power Supply: USB
Problem Description
When CONFIG_SPI_FLASH_SHARE_SPI1_BUS is enabled - whether on a WROOM module with only the main flash or on a WROVER module with main flash + PSRAM sharing SPI1 - I've observed a race condition where one of the threads waiting to access the main flash (for example, to read from NVS or to perform an OTA operation) can end up sleeping forever waiting to acquire the SPI1 bus lock. This persists even if no other threads are holding the SPI1 bus lock.
I've prepared a minimal test case that can be used to reproduce the issue on WROOM module on a DevKitC, though I originally encountered this issue on a custom PCBA that was using a WROVER module.
The following config options are changed from the defaults:
CONFIG_SPI_FLASH_SHARE_SPI1_BUS=y - obviously needed to reproduce this issue
CONFIG_FREERTOS_USE_TRACE_FACILITY=y and CONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=y - to allow for interactive task debugging from the serial port.
In the linked test case project, it spawns two threads that loop forever doing the following:
Read a random 1k chunk from the nvs partition (chosen because it exists by default)
Every 100 reads, output a log message to indicate the thread is still active.
Update a global table with the last time the thread completed a read.
Delay for a short, random amount of time
As an additional question, from reading the documentation on CONFIG_SPI_FLASH_SHARE_SPI1_BUS, I'm still uncertain if I even need to enable it for my application that is using a WROVER module with PSRAM. The PSRAM is a device that is shared on SPI1, but it's not an extra device that I'm accessing through the esp_flash APIs - do I need to enable CONFIG_SPI_FLASH_SHARE_SPI1_BUS for correctness even if I don't add an external flash device outside of the module?
Expected Behavior
The two threads should ideally both get chances to run and read from the main flash. It is possible for one thread to monopolize the CPU since they have equal priorities, but if that thread is terminated, then the other thread should have a chance to run and we should see that the other thread is able to read from flash successfully.
Actual Behavior
One thread is able to read from flash, while the other sleeps indefinitely. Using a shell command to signal the thread that is successfully reading to exit shows that the thread has exited, but the other thread continues to sleep in the blocked state, even though there is no-one to contest access to the SPI1 bus lock.
Halting the CPU with a debugger over JTAG shows that the blocked thread's stack trace is the following:
Thread #14 1073456156 (Name: test1) (Suspended : Container)
esp_crosscore_int_send() at crosscore_int.c:109 0x400814fa
esp_crosscore_int_send_yield() at crosscore_int.c:115 0x40081554
xQueueGenericReceive() at cpu_ll.h:39 0x40088480
dev_wait() at spi_bus_lock.c:681 0x40083ca2
spi_bus_lock_acquire_start() at spi_bus_lock.c:681 0x40083ca2
spi_start() at spi_flash_os_func_app.c:63 0x40083873
spi1_start() at spi_flash_os_func_app.c:80 0x40083888
spiflash_start() at esp_flash_api.c:77 0x40083e90
esp_flash_read() at esp_flash_api.c:544 0x4008376a
esp_partition_read() at partition.c:347 0x400d61e5
<...more frames...>
The implementation in spi_bus_lock.c looks quite complicated, but it seems that it is getting stuck sleeping on waiting for the semaphore to be given, even after the semaphore has already been given and is available.
Steps to reproduce
Build and flash the linked minimal test case to reproduce the issue
Over the serial port, observe that only one of either test1 or test2 is printing "Still alive" messages after the first ten seconds or so.
Input the status command via the serial port to confirm that the silent thread has not updated in a while
Input the spiflash_test stop <n> command to stop the active thread - either 1 for test1 or 2 for `test2.
Input the tasks command to confirm that the thread has exited
Wait a little while then input the status command to confirm that the other thread still has not updated.
Input the tasks command and note that the sleeping thread has still never run.
Disable CONFIG_SPI_FLASH_SHARE_SPI1_BUS in the sdkconfig
Environment
Problem Description
When
CONFIG_SPI_FLASH_SHARE_SPI1_BUS
is enabled - whether on a WROOM module with only the main flash or on a WROVER module with main flash + PSRAM sharing SPI1 - I've observed a race condition where one of the threads waiting to access the main flash (for example, to read from NVS or to perform an OTA operation) can end up sleeping forever waiting to acquire the SPI1 bus lock. This persists even if no other threads are holding the SPI1 bus lock.I've prepared a minimal test case that can be used to reproduce the issue on WROOM module on a DevKitC, though I originally encountered this issue on a custom PCBA that was using a WROVER module.
The following config options are changed from the defaults:
CONFIG_SPI_FLASH_SHARE_SPI1_BUS=y
- obviously needed to reproduce this issueCONFIG_FREERTOS_USE_TRACE_FACILITY=y
andCONFIG_FREERTOS_USE_STATS_FORMATTING_FUNCTIONS=y
- to allow for interactive task debugging from the serial port.In the linked test case project, it spawns two threads that loop forever doing the following:
As an additional question, from reading the documentation on
CONFIG_SPI_FLASH_SHARE_SPI1_BUS
, I'm still uncertain if I even need to enable it for my application that is using a WROVER module with PSRAM. The PSRAM is a device that is shared on SPI1, but it's not an extra device that I'm accessing through theesp_flash
APIs - do I need to enableCONFIG_SPI_FLASH_SHARE_SPI1_BUS
for correctness even if I don't add an external flash device outside of the module?Expected Behavior
The two threads should ideally both get chances to run and read from the main flash. It is possible for one thread to monopolize the CPU since they have equal priorities, but if that thread is terminated, then the other thread should have a chance to run and we should see that the other thread is able to read from flash successfully.
Actual Behavior
One thread is able to read from flash, while the other sleeps indefinitely. Using a shell command to signal the thread that is successfully reading to exit shows that the thread has exited, but the other thread continues to sleep in the blocked state, even though there is no-one to contest access to the SPI1 bus lock.
Halting the CPU with a debugger over JTAG shows that the blocked thread's stack trace is the following:
The implementation in
spi_bus_lock.c
looks quite complicated, but it seems that it is getting stuck sleeping on waiting for the semaphore to be given, even after the semaphore has already been given and is available.Steps to reproduce
status
command via the serial port to confirm that the silent thread has not updated in a whilespiflash_test stop <n>
command to stop the active thread - either1
fortest1
or2
for `test2.tasks
command to confirm that the thread has exitedstatus
command to confirm that the other thread still has not updated.tasks
command and note that the sleeping thread has still never run.CONFIG_SPI_FLASH_SHARE_SPI1_BUS
in the sdkconfigCode to reproduce this issue
See the gist up here: https://gist.github.com/devanlai/4429ca670538c91df570e7bae951e6ac
The important code is in
app_main.c
, but all of the support files are also included.Debug Logs
Sample serial log output (when manually entering commands to test the behavior):