gransee opened this issue 2 years ago
Hi @gransee! Thanks for the detailed description of this issue and the test example. I confirm that we see this issue on master as well with ESP32 ECO0. It looks like it is related to errata 3.15 here: https://www.espressif.com/sites/default/files/documentation/eco_and_workarounds_for_bugs_in_esp32_en.pdf. We need some time to verify this. I will get back to you as soon as we do. Thanks.
Any news?
We are getting ready to go to production with this part and this issue is continuing to cause core lock ups.
@gransee I've been able to recreate the issue more concisely using the example below. The bug is reproducible both on v4.4.1 & master and on ESP32 ECO1 & ECO3 chips.
Run the example with the following configuration so that the INT WDT prints a backtrace when the CPUs hang. The code snippet basically has core 0 continuously disabling/enabling the cache, while core 1 continuously reads the UART FIFO registers (though from the backtrace, it doesn't seem like the CPU is getting stuck on the UART read operation).

```
git checkout tags/v4.4.1
```

```
CONFIG_ESP_TASK_WDT=n
CONFIG_ESP_INT_WDT_TIMEOUT_MS=10000
```

The second option sets a 10 s INT WDT timeout, which should be sufficient time for the example to get stuck. When the CPUs hang, the INT WDT kicks in 10 seconds later and generates the following backtrace.
After trying multiple iterations (with different WDT timeouts), it seems like:

- Core 1 is stuck in `spi_flash_op_block_func` waiting for core 0 to reenable the cache (i.e., waiting for `s_flash_op_complete`). Thus its crash PC varies a bit depending on where it was in the loop when the INT WDT timeout occurs.
- Core 0 is stuck in `spi_flash_disable_cache` when attempting to read `DPORT_PRO_CACHE_CTRL1_REG`. More specifically, it is consistently getting stuck on the `rsync` instruction of `esp_dport_access_reg_read`, shortly after the `l32i.n` instruction that reads `DPORT_PRO_CACHE_CTRL1_REG`.

From my tests, this issue seems to be specifically related to DPORT peripheral FIFO registers, as repeatedly reading any one of the following DPORT FIFO registers can reproduce the issue:
```c
READ_PERI_REG(UART_FIFO_REG(0));
READ_PERI_REG(UART_FIFO_REG(1));
READ_PERI_REG(UART_FIFO_REG(2));
READ_PERI_REG(0x3FF4F004); // I2S0_FIFO_RD_REG
READ_PERI_REG(0x3FF6D004); // I2S1_FIFO_RD_REG
```
My guess is that this issue might be related to errata 3.3 (i.e., repeated DPORT access to the same peripheral FIFO registers). Unfortunately, I haven't been able to recreate the issue with a simpler case (i.e., an exact series of reads/writes to particular buses that causes the issue).
The strange thing is that when the CPUs hang, core 1 is not actually calling `READ_PERI_REG(UART_FIFO_REG(1));` at that time, as the `spi_flash_disable_interrupts_caches_and_other_cpu` call from core 0 will cause core 1 to loop in the IPC task. Though it might be possible that the repeated reads of a DPORT FIFO register leave the DPORT bus in a stuck state, so the next CPU to access the same bus (i.e., core 0) ends up stuck.
> We are getting ready to go to production with this part and this issue is continuing to cause core lock ups.
For now, I suggest using `DPORT_READ_PERI_REG()` to read any DPORT FIFO registers. `DPORT_READ_PERI_REG()` basically inserts a filler DPORT read access (it reads the UART date register) so that we don't get consecutive reads of the same register.

I'll check further with the hardware team (maybe the HW fix for errata 3.3 introduced in ECO1 led to another HW regression?).
> @gransee I've been able to recreate the issue more concisely using the example below. The bug is reproducible both on v4.4.1 & master and on ESP32 ECO1 & ECO3 chips.
@Dazza0 Any update?
We don't think the problem is caused by repeated reads of the FIFO as suggested, because our malfunctioning test programs have loops that always read the FIFO count before each read of the FIFO itself. (So this provides the alternate DPORT read before each FIFO read, eliminating the need for an extra bogus DPORT read.) They also use more than the minimum instructions/cycles between successive FIFO reads given for errata 3.16.
We wondered if the problem was related to 3.10 instead of 3.3, because the UARTs are in the "address space B" described, and the solution we coincidentally found (that disables interrupts) seems similar to the second workaround suggested. However there are differences and factors that make 3.10 not seem to fit either. Perhaps it is related to 3.15 as you originally suggested?
We did further testing and found that if we read the UART FIFO via DPORT_READ_PERI_REG instead of READ_PERI_REG as suggested, the lockup malfunction is prevented. However we had previously found that simply disabling interrupts for READ_PERI_REG seemed to work just as well, though we don't know why. (Perhaps this fact can help trace the cause?) We measured the time required to use each method in this code snippet running at 240MHz:
```c
// Macros for efficient UART operation (pass constants, not variables):
    & (UART_ST_UTX_OUT_M | UART_TXFIFO_CNT_M))
    ? (UART2.mem_rx_status.wr_addr != UART2.mem_rx_status.rd_addr) \
    : (UART1.mem_rx_status.wr_addr != UART1.mem_rx_status.rd_addr)) \
    : (UART0.mem_rx_status.wr_addr != UART0.mem_rx_status.rd_addr))

uint t1 = REG_READ (FRC_TIMER_COUNT_REG (1));
dcode = (uindex < 4 ? 0x89 : (uindex >> 2) + 0x8B) | (uindex & 3) << 5;
// Read previous results buffered in UARTs
uint r [4] = {0, 0, 0, 0}, c [3] = {0, 0, 0};
// uint volatile register itr = XTOS_DISABLE_ALL_INTERRUPTS; // prevent lockup on flash access
for (; UART_ready (1); ++c [0]) r [0] = r [0] << 8 | UART_read (1);
for (; UART_ready (2); ++c [1]) r [1] = r [1] << 8 | UART_read (2);
for (; UART_ready (0); ++c [2]) {
    if (c [2] == 4) // (can store 5-8 bytes)
        r [3] = r [2];
    r [2] = r [2] << 8 | UART_read (0);
}
// XTOS_RESTORE_INTLEVEL (itr); // prevent lockup on flash access
UART_write (1, dcode);
UART_write (2, dcode);
if (c [2] > 4) // fix extra shift from odd read
    r [3] >>= (8 - c [2]) << 3;
uint stop = 0;
uint t2 = REG_READ (FRC_TIMER_COUNT_REG (1));
if (uindex == 8) lprintf ("%d %d\n", t1 / 80000000, t2 - t1);
```
The DPORT_READ_PERI_REG method took 5.53 usec per cycle whereas the interrupt disable method took only 3.86 usec per cycle (worst case). We compared the compiled assembly of the code snippet using the two methods and marked up the differences in the attached spreadsheet. (Yellow instructions are differences between the two methods, and green instructions match but are just in a different order.)
The relative slowness of the recommended solution is a problem for us because we are running a time-critical loop (on core 1). The reasons for the slowness may be the redundant enabling/disabling of interrupts (since we read all 3 UARTs at once), the call to a separate function to read the FIFO, and an extra, needless DPORT read.
Other tests we tried:

- Using `_DPORT_REG_READ` instead of `READ_PERI_REG` gives the exact same results and compiled assembly.
- Using `READ_PERI_REG` without disabling interrupts takes 3.79 usec per cycle and malfunctions.
- Using `READ_PERI_REG` with `DPORT_INTERRUPT_DISABLE()`/`DPORT_INTERRUPT_RESTORE()` (instead of disabling all interrupts) takes 3.93 usec per cycle.
- Using inline coding of `DPORT_READ_PERI_REG` (instead of using the IRAM function call) takes 4.93 usec per cycle.
- Replacing the `UART_ready` comparison of indices with just `UART#.status.rxfifo_cnt` decreased the cycle time from 3.86 usec to only 3.13 usec.

Therefore disabling/enabling all interrupts adds only 70 nsec per cycle, whereas using `DPORT_INTERRUPT_DISABLE`/`RESTORE` adds 140 nsec per cycle. (Not sure why disabling and restoring all interrupts is faster than the selective method, although maybe the one that is faster on core 1 slows down core 0 more.) Because the true cause of this malfunction remains elusive, disabling all interrupts seems safer than the slower, selective disabling. It may help prevent the problems described in errata 3.17, for example.
In conclusion, it seems our best bet is to use the fastest method that works in testing, unless we have some reason to believe it may be prone to problems we haven't seen yet. This means disabling all interrupts before reading all UARTs together, and reading each FIFO count before reading each FIFO. However, we are not clear on whether we can just read the FIFO count instead of reading the read/write indices and comparing them to get the count. The former method is faster, but we found it to malfunction as described here: https://www.esp32.com/viewtopic.php?t=15476, and errata 3.17 seems to say not to rely on the counter at all. However, disabling interrupts before FIFO reads seems to prevent both 3.17 and the malfunction that we had in practice (which we can't seem to recreate).
Can UART#.status.rxfifo_cnt be trusted at any time, only when interrupts are disabled, or not at all? It seems like errata 3.17 should be expanded/clarified to explain exactly what workaround is needed and when, because it is pretty vague now.
@KonstantinKondrashov @Dazza0 Any comment about https://github.com/espressif/esp-idf/issues/9171#issuecomment-1605055136 ?
This simple test program demonstrates a malfunction on ESP32-WROOM modules when there are UART reads on one core and SPI flash reads on the other core. Within seconds of running, both cores hang forever (with the WDT disabled). This can be seen when numbers stop printing out, and we also have LEDs on IO pins that stop flashing due to both cores hanging.
This malfunction has been verified to happen on both Rev 1 and 3 silicon as well as either IDF version 3.3.6 or 4.4.1. The project was built from the hello_world sample with the task watchdog setting disabled.
You can change how frequently the malfunction happens by changing task priorities, the CPU frequency, and the vTaskDelay counters or commenting them out (do this if you have trouble seeing the malfunction). The malfunction is very sensitive to timing and so a subtle timing change can make you think it started working okay. This problem was very hard to trace when it first happened infrequently in a much more complex program. So if you see "task starvation," understand that the test program was set up this way to make the malfunction happen more frequently for troubleshooting purposes.
Why does this malfunction happen, and what should be done to prevent it? Through trial and error we found that disabling interrupts during READ_PERI_REG seems to stop it, as does using DPORT_READ_PERI_REG [esp_dport_access_reg_read()] instead. Just adding a "MEMW" instruction before each READ_PERI_REG does not help. We found that reading RTC_CNTL_TIME0_REG (also in the DPORT memory space) instead of the UART FIFO does not malfunction. There seems to be a mysterious underlying problem that we haven't resolved, only hidden?
Research suggests that READ_PERI_REG of the UART FIFO may be the culprit. The TRM warns that DPORT reads may cause programs to crash if the bug workarounds 3.3, 3.10, 3.16, and 3.17 are not carefully followed: https://www.espressif.com/sites/default/files/documentation/esp32_technical_reference_manual_en.pdf https://www.espressif.com/sites/default/files/documentation/eco_and_workarounds_for_bugs_in_esp32_en.pdf
In addition we have studied discussion of problems with UART FIFO reads like this one: https://github.com/espressif/esp-idf/issues/5101
A careful study of each of the above issues indicates that they cannot explain this malfunction; it must be caused by something not documented? Tracing through the driver code shows that spi_flash_read() calls spi_flash_guard_start() to halt both CPUs, but this gets stuck in esp_ipc_call_and_wait(). The other CPU gets stuck in ipc_task(), which calls spi_flash_op_block_func(). This roughly agrees with the debug output.
Perhaps there is some memory corruption from a collision between READ_PERI_REG and the ipc interrupt? If a fix is required for this, what is it, and is it needed for reading something like RTC_CNTL_TIME0_REG also?

debug output and main.c program.zip
The following references "shared registers not protected from sharing": https://github.com/espressif/esp-idf/commit/8bba3485289c7d4053e742f9808cc1a6359e546a
Is some step required to protect the UART FIFO DPORT read from sharing also? None of the UART driver code seems to have anything like this.
debug output.txt
main.c