gransee opened this issue 2 years ago
Hi @gransee! Thanks for the detailed description of this issue and the test example. I confirm that we see this issue on master as well with ESP32 ECO0. It looks like it is related to errata 3.15 here: https://www.espressif.com/sites/default/files/documentation/eco_and_workarounds_for_bugs_in_esp32_en.pdf. We need some time to verify this. I will get back to you as soon as we do. Thanks.
Any news?
We are getting ready to go to production with this part and this issue is continuing to cause core lock ups.
@gransee I've been able to recreate the issue more concisely using the example below. The bug is reproducible both on v4.4.1 & master and on ESP32 ECO1 & ECO3 chips.
Run the example with the following configuration so that the INT WDT prints a backtrace when the CPUs hang. The code snippet basically has core 0 continuously disabling/enabling the cache, while core 1 continuously reads the UART FIFO registers (though from the backtrace, it doesn't seem like the CPU is getting stuck on the UART read operation).

```
git checkout tags/v4.4.1
```

```
CONFIG_ESP_TASK_WDT=n
CONFIG_ESP_INT_WDT_TIMEOUT_MS=10000
```

The second option sets a 10 s INT WDT timeout, which should be sufficient time for the example to get stuck. When the CPUs hang, the INT WDT kicks in 10 seconds later and generates the following backtrace.
After trying multiple iterations (with different WDT timeouts), it seems like:

- Core 1 is stuck in `spi_flash_op_block_func` waiting for core 0 to reenable the cache (i.e., waiting for `s_flash_op_complete`). Thus its crash PC varies a bit depending on where it was in the loop when the INT WDT timeout occurs.
- Core 0 is stuck in `spi_flash_disable_cache` when attempting to read `DPORT_PRO_CACHE_CTRL1_REG`. More specifically, it is consistently getting stuck on the `rsync` instruction of `esp_dport_access_reg_read`, shortly after the `l32i.n` instruction that reads `DPORT_PRO_CACHE_CTRL1_REG`.

From my tests, this issue seems to be specifically related to DPORT peripheral FIFO registers, as repeatedly reading any one of the following DPORT FIFO registers can reproduce the issue:
```c
READ_PERI_REG(UART_FIFO_REG(0));
READ_PERI_REG(UART_FIFO_REG(1));
READ_PERI_REG(UART_FIFO_REG(2));
READ_PERI_REG(0x3FF4F004); // I2S0_FIFO_RD_REG
READ_PERI_REG(0x3FF6D004); // I2S1_FIFO_RD_REG
```
My guess is that this issue might be related to errata 3.3 (i.e., repeated DPORT access to the same peripheral FIFO registers). Unfortunately, I haven't been able to recreate the issue with a simpler case (i.e., an exact series of reads/writes to particular buses that causes the issue).
The strange thing is that when the CPUs hang, core 1 is not actually calling `READ_PERI_REG(UART_FIFO_REG(1));` at that time, as the `spi_flash_disable_interrupts_caches_and_other_cpu` call from core 0 will cause core 1 to loop in the IPC task. Though it might be possible that the repeated reads of a DPORT FIFO register leave the DPORT bus in a stuck state, so the next CPU to access the same bus (i.e., core 0) ends up stuck.
> We are getting ready to go to production with this part and this issue is continuing to cause core lock ups.
For now, I suggest using `DPORT_READ_PERI_REG()` to read any DPORT FIFO registers. `DPORT_READ_PERI_REG()` basically inserts a filler DPORT read access (it reads the UART date register) so that we don't get consecutive reads of the same register.

I'll check further with the hardware team (maybe the HW fix for errata 3.3 introduced in ECO1 led to another HW regression?).
> @gransee I've been able to recreate the issue more concisely using the example below. The bug is reproducible both on v4.4.1 & master and on ESP32 ECO1 & ECO3 chips.
@Dazza0 Any update?
We don't think the problem is caused by repeated reads of the FIFO as suggested, because our malfunctioning test programs have loops that always read the FIFO count before each read of the FIFO itself. (So this provides the alternate DPORT read before each FIFO read, eliminating the need for an extra bogus DPORT read.) They also use more than the minimum instructions/cycles between successive FIFO reads given for errata 3.16.
We wondered if the problem was related to 3.10 instead of 3.3, because the UARTs are in the "address space B" described, and the solution we coincidentally found (that disables interrupts) seems similar to the second workaround suggested. However there are differences and factors that make 3.10 not seem to fit either. Perhaps it is related to 3.15 as you originally suggested?
We did further testing and found that if we read the UART FIFO via DPORT_READ_PERI_REG instead of READ_PERI_REG as suggested, the lockup malfunction is prevented. However we had previously found that simply disabling interrupts for READ_PERI_REG seemed to work just as well, though we don't know why. (Perhaps this fact can help trace the cause?) We measured the time required to use each method in this code snippet running at 240MHz:
```c
// Macros for efficient UART operation (pass constants, not variables):
    & (UART_ST_UTX_OUT_M | UART_TXFIFO_CNT_M))
    ? (UART2.mem_rx_status.wr_addr != UART2.mem_rx_status.rd_addr) \
    : (UART1.mem_rx_status.wr_addr != UART1.mem_rx_status.rd_addr)) \
    : (UART0.mem_rx_status.wr_addr != UART0.mem_rx_status.rd_addr))

uint t1 = REG_READ (FRC_TIMER_COUNT_REG (1));
dcode = (uindex < 4 ? 0x89 : (uindex >> 2) + 0x8B) | (uindex & 3) << 5;
// Read previous results buffered in UARTs
uint r [4] = {0, 0, 0, 0}, c [3] = {0, 0, 0};
// uint volatile register itr = XTOS_DISABLE_ALL_INTERRUPTS; // prevent lockup on flash access
for (; UART_ready (1); ++c [0]) r [0] = r [0] << 8 | UART_read (1);
for (; UART_ready (2); ++c [1]) r [1] = r [1] << 8 | UART_read (2);
for (; UART_ready (0); ++c [2]) {
    if (c [2] == 4) // (can store 5-8 bytes)
        r [3] = r [2];
    r [2] = r [2] << 8 | UART_read (0);
}
// XTOS_RESTORE_INTLEVEL (itr); // prevent lockup on flash access
UART_write (1, dcode);
UART_write (2, dcode);
if (c [2] > 4) // fix extra shift from odd read
    r [3] >>= (8 - c [2]) << 3;
uint stop = 0;
uint t2 = REG_READ (FRC_TIMER_COUNT_REG (1));
if (uindex == 8) lprintf ("%d %d\n", t1 / 80000000, t2 - t1);
```
The DPORT_READ_PERI_REG method took 5.53 usec per cycle whereas the interrupt disable method took only 3.86 usec per cycle (worst case). We compared the compiled assembly of the code snippet using the two methods and marked up the differences in the attached spreadsheet. (Yellow instructions are differences between the two methods, and green instructions match but are just in a different order.)
The relative slowness of the recommended solution is a problem for us because we are running a time-critical loop (on core 1). The reasons for the slowness may be the redundant enabling/disabling of interrupts (since we read all 3 UARTs at once), the call to a separate function to read the FIFO, and an extra, needless DPORT read.
Other tests we tried:

- Using `_DPORT_REG_READ` instead of `READ_PERI_REG` gives the exact same results and compiled assembly.
- Using `READ_PERI_REG` without disabling interrupts takes 3.79 usec per cycle and malfunctions.
- Using `READ_PERI_REG` with `DPORT_INTERRUPT_DISABLE()`/`DPORT_INTERRUPT_RESTORE()` (instead of disabling all interrupts) takes 3.93 usec per cycle.
- Using inline coding of `DPORT_READ_PERI_REG` (instead of using the IRAM function call) takes 4.93 usec per cycle.
- Replacing the `UART_ready` comparison of indices with just `UART#.status.rxfifo_cnt` decreased the cycle time from 3.86 usec to only 3.13 usec.

Therefore disabling/enabling all interrupts adds only 70 nsec per cycle, whereas using `DPORT_INTERRUPT_DISABLE`/`RESTORE` adds 140 nsec per cycle. (Not sure why disabling and restoring all interrupts is faster than the selective method, although maybe the one that is faster on core 1 slows down core 0 more.) Because the true cause of this malfunction remains elusive, disabling all interrupts seems safer than the slower, selective disabling. It may help prevent the problems described in errata 3.17, for example.
In conclusion, it seems our best bet is to use the fastest method that works in testing, unless we have some reason to believe it may be prone to problems we haven't seen yet. This means disabling all interrupts before reading all UARTs together, and reading each FIFO count before reading each FIFO. However, we are not clear on whether we can just read the FIFO count instead of reading the read/write indices and comparing them to get the count. The former method is faster, but we found it to malfunction as described here: https://www.esp32.com/viewtopic.php?t=15476, and errata 3.17 seems to say not to rely on the counter at all. However, disabling interrupts before FIFO reads seems to prevent both 3.17 and the malfunction that we had in practice (which we can't seem to recreate).
Can UART#.status.rxfifo_cnt be trusted at any time, only when interrupts are disabled, or not at all? It seems like errata 3.17 should be expanded/clarified to explain exactly what workaround is needed and when, because it is pretty vague now.
@KonstantinKondrashov @Dazza0 Any comment about https://github.com/espressif/esp-idf/issues/9171#issuecomment-1605055136 ?
This simple test program demonstrates a malfunction on ESP32-WROOM modules when there are UART reads on one core and SPI flash reads on the other core. Within seconds of running, both cores hang forever (with the WDT disabled). This can be seen when numbers stop printing out, and we also have LEDs on IO pins that stop flashing due to both cores hanging.
This malfunction has been verified to happen on both Rev 1 and 3 silicon as well as either IDF version 3.3.6 or 4.4.1. The project was built from the hello_world sample with the task watchdog setting disabled.
You can change how frequently the malfunction happens by changing task priorities, the CPU frequency, and the vTaskDelay counters or commenting them out (do this if you have trouble seeing the malfunction). The malfunction is very sensitive to timing and so a subtle timing change can make you think it started working okay. This problem was very hard to trace when it first happened infrequently in a much more complex program. So if you see "task starvation," understand that the test program was set up this way to make the malfunction happen more frequently for troubleshooting purposes.
Why does this malfunction happen, and what should be done to prevent it? Through trial and error we found that disabling interrupts during READ_PERI_REG seems to stop it, as does using DPORT_READ_PERI_REG [esp_dport_access_reg_read()] instead. Just adding a "MEMW" instruction before each READ_PERI_REG does not help. We found that reading RTC_CNTL_TIME0_REG (also in the DPORT memory space) instead of the UART FIFO does not malfunction. There seems to be a mysterious underlying problem that we haven't resolved, only hidden?
Research suggests that READ_PERI_REG of the UART FIFO may be the culprit. The TRM warns that DPORT reads may cause programs to crash if the bug workarounds 3.3, 3.10, 3.16, and 3.17 are not carefully followed: https://www.espressif.com/sites/default/files/documentation/esp32_technical_reference_manual_en.pdf https://www.espressif.com/sites/default/files/documentation/eco_and_workarounds_for_bugs_in_esp32_en.pdf
In addition we have studied discussion of problems with UART FIFO reads like this one: https://github.com/espressif/esp-idf/issues/5101
A careful study of each of the above issues indicates that they cannot explain this malfunction; it must be caused by something not documented? Tracing through the driver code shows that spi_flash_read() calls spi_flash_guard_start() to halt both CPUs, but this gets stuck in esp_ipc_call_and_wait(). The other CPU gets stuck in ipc_task(), which calls spi_flash_op_block_func(). This roughly agrees with the debug output.
Perhaps there is some memory corruption from a collision between READ_PERI_REG and the ipc interrupt? If a fix is required for this, what is it, and is it needed for reading something like RTC_CNTL_TIME0_REG also?

debug output and main.c program.zip
The following references "shared registers not protected from sharing": https://github.com/espressif/esp-idf/commit/8bba3485289c7d4053e742f9808cc1a6359e546a
Is some step required to protect the UART FIFO DPORT read from sharing also? None of the UART driver code seems to have anything like this.
debug output.txt
main.c