espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Interrupt wdt timeout on CPU0 (IDF5, dual core, IRAM, uart_hal_iram, btdm_controller_task) (IDFGH-11181) #12348

Open VNovytskyi opened 11 months ago

VNovytskyi commented 11 months ago

IDF version.

ESP-IDF 5.0.4

Espressif SoC revision.

ESP32 rev3

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

CMD

Development Kit.

ESP32-WROVER-E

Power Supply used.

USB

What is the expected behavior?

I expected normal behaviour without the interrupt watchdog triggering.

What is the actual behavior?

The interrupt watchdog timeout occurs at random times. It usually takes a few minutes of running to catch the bug.

Steps to reproduce.

Use the BLE, Wi-Fi, I2C and UART modules in the same application.

Debug Logs.

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0).

Core  0 register dump:
PC      : 0x4009b6f4  PS      : 0x00060935  A0      : 0x8008da43  A1      : 0x3ffc25e0
0x4009b6f4: i2c_hal_get_intsts_mask at C:/Espressif/frameworks/esp-idf-v5.0.4/components/hal/i2c_hal_iram.c:67

A2      : 0x3ffc0c50  A3      : 0x3ffc2614  A4      : 0x00000000  A5      : 0x00060b23
A6      : 0x3ffeaeb0  A7      : 0x3ffeaefc  A8      : 0x00000008  A9      : 0x3ffc2520
A10     : 0x00000004  A11     : 0x00060b24  A12     : 0x00060b23  A13     : 0x00060b23
A14     : 0x00000000  A15     : 0x0000cdcd  SAR     : 0x0000001c  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x40094cd8  LEND    : 0x40094cf4  LCOUNT  : 0xffffffff
0x40094cd8: memcpy at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/machine/xtensa/memcpy.S:175

0x40094cf4: memcpy at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/machine/xtensa/memcpy.S:197

Core  0 was running in ISR context:
EPC1    : 0x4009ca4f  EPC2    : 0x4000bff0  EPC3    : 0x4009b6f2  EPC4    : 0x4009b6f2
0x4009ca4f: uart_hal_write_txfifo at C:/Espressif/frameworks/esp-idf-v5.0.4/components/hal/uart_hal_iram.c:35

0x4009b6f2: i2c_hal_get_intsts_mask at C:/Espressif/frameworks/esp-idf-v5.0.4/components/hal/i2c_hal_iram.c:66

0x4009b6f2: i2c_hal_get_intsts_mask at C:/Espressif/frameworks/esp-idf-v5.0.4/components/hal/i2c_hal_iram.c:66

Backtrace: 0x4009b6f1:0x3ffc25e0 0x4008da40:0x3ffc2600 0x400833a5:0x3ffc2640 0x4008dadb:0x3ffebd30 0x4009a5f5:0x3ffebd40 0x401e61eb:0x3ffebd60 0x401436d1:0x3ffebda0 0x40152d0e:0x3ffebdc0
0x4009b6f1: i2c_ll_get_intsts_mask at C:/Espressif/frameworks/esp-idf-v5.0.4/components/hal/esp32/include/hal/i2c_ll.h:243
 (inlined by) i2c_hal_get_intsts_mask at C:/Espressif/frameworks/esp-idf-v5.0.4/components/hal/i2c_hal_iram.c:66

0x4008da40: i2c_isr_handler_default at C:/Espressif/frameworks/esp-idf-v5.0.4/components/driver/i2c.c:515

0x400833a5: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v5.0.4/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:1118

0x4008dadb: i2c_isr_handler_default at C:/Espressif/frameworks/esp-idf-v5.0.4/components/driver/i2c.c:578

0x4009a5f5: vPortClearInterruptMaskFromISR at C:/Espressif/frameworks/esp-idf-v5.0.4/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:566
 (inlined by) vPortExitCritical at C:/Espressif/frameworks/esp-idf-v5.0.4/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:337

0x401e61eb: xQueueReceive at C:/Espressif/frameworks/esp-idf-v5.0.4/components/freertos/FreeRTOS-Kernel/queue.c:1489

0x401436d1: queue_recv_hlevel_wrapper at C:/Espressif/frameworks/esp-idf-v5.0.4/components/bt/controller/esp32/bt.c:819

0x40152d0e: btdm_controller_task at ??:?

esp-zhp commented 11 months ago

@VNovytskyi Thank you for reporting the issue. In commit https://github.com/espressif/esp-idf/commit/9a6a28734b88a2d29d2a0459010eff83e876e1da, we may have fixed the problem. Could you please check if the version you are using includes this commit? If it doesn't, would the issue still occur after applying this commit?

VNovytskyi commented 11 months ago

@zhp0406 Thank you for your reply! I found a possible workaround: I moved the BLE controller from CPU0 to CPU1, and the problem no longer occurs. I am using ESP-IDF 5.0.4 and could not find the commit https://github.com/espressif/esp-idf/commit/9a6a28734b88a2d29d2a0459010eff83e876e1da on that branch, but I did find it in ESP-IDF 4.4.6:

Merge: 2c41b01771 9a6a28734b
commit 9a6a28734b88a2d29d2a0459010eff83e876e1da

I will try to test my version of ESP-IDF with this commit and report back. If you have any other ideas, please let me know.

esp-zhp commented 11 months ago

@VNovytskyi Hi, has the issue been resolved?

timoxd7 commented 10 months ago

Thanks @VNovytskyi for getting me on the right track. I had the same issue and was able to fix it after reading your bug report. Maybe what I found out will help you too.

You wrote "If you turn on the single core mode this issue doesn't arise.". With this in mind, I remembered that initializing something on one core and handling its output later on another core can lead to issues. So I moved all my initialization logic onto the same core that handles its outcome. My I2C code runs on core 1, but its initialization was somewhere inside app_main(), which runs on core 0, and that gave me exactly the issue you have. After moving the I2C initialization to core 1 as well, it worked fine. The problem only appeared when Bluetooth, which also runs on core 0, was turned on. Now that the init and handling code for each component live on one core, everything works, even though the different components (BLE, UART, I2C...) are on different cores.

In short: put the init code for a component on the same core as its handling/looping code.

In detail, I think this happens because initializing something that needs special handlers or interrupts also registers those interrupts on the core you are currently running on. If your handling code then runs on a different core while the interrupt fires on the original core, this might lead to problems.
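
A minimal sketch of this pattern, assuming the legacy I2C master driver from ESP-IDF 5.0 (`driver/i2c.h`). The port number, GPIO pins, clock speed, stack size and priority are illustrative assumptions, not values from this report. The driver is installed from inside a task pinned to core 1, so the driver's interrupt is allocated on the same core that later services the bus (`esp_intr_alloc()` allocates an interrupt on the core that calls it).

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/gpio.h"
#include "driver/i2c.h"
#include "esp_err.h"
#include "esp_log.h"

/* All values below are assumptions for illustration only. */
#define I2C_PORT       I2C_NUM_0
#define I2C_SDA_GPIO   21
#define I2C_SCL_GPIO   22
#define I2C_FREQ_HZ    100000
#define I2C_TASK_CORE  1

static const char *TAG = "i2c_task";

static void i2c_task(void *arg)
{
    /* Initialize the driver here, inside the pinned task, instead of in
     * app_main(): the interrupt is then allocated on I2C_TASK_CORE. */
    const i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = I2C_SDA_GPIO,
        .scl_io_num = I2C_SCL_GPIO,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = I2C_FREQ_HZ,
    };
    ESP_ERROR_CHECK(i2c_param_config(I2C_PORT, &conf));
    ESP_ERROR_CHECK(i2c_driver_install(I2C_PORT, conf.mode, 0, 0, 0));
    ESP_LOGI(TAG, "I2C driver installed on core %d", (int)xPortGetCoreID());

    for (;;) {
        /* I2C transactions for this component go here, on the same core. */
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

void app_main(void)
{
    /* app_main() only creates the task; it does not touch the I2C driver. */
    xTaskCreatePinnedToCore(i2c_task, "i2c_task", 4096, NULL, 5, NULL,
                            I2C_TASK_CORE);
}
```

The same layout can be applied to UART or any other driver that registers an interrupt during installation. Whether it resolves the watchdog timeout in a given project is not guaranteed; it only mirrors the workaround described in this comment.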

VNovytskyi commented 10 months ago

Thanks @timoxd7! I will use your observations in my future designs! In addition, I would advise against creating unpinned tasks: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#creation In my opinion, using tskNO_AFFINITY when creating a task is very dangerous. It means that the created task can be executed on either CPU at different times. Maybe this behaviour causes self-blocking or something else. If you have additional information, or if I am wrong, please share what you know!
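
For reference, the difference between a pinned and an unpinned task can be sketched as follows. This is an illustrative example only: the task names, stack size and priority are assumptions, and the code simply logs which core each task is running on.

```c
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"

static const char *TAG = "affinity";

/* Periodically logs the core this task is currently running on. */
static void report_core_task(void *arg)
{
    const char *name = (const char *)arg;
    for (;;) {
        ESP_LOGI(TAG, "%s is running on core %d", name, (int)xPortGetCoreID());
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

void app_main(void)
{
    /* Pinned: the scheduler only ever runs this task on core 1. */
    xTaskCreatePinnedToCore(report_core_task, "pinned", 3072,
                            (void *)"pinned task", 5, NULL, 1);

    /* Unpinned: the scheduler may run this task on either core. */
    xTaskCreatePinnedToCore(report_core_task, "unpinned", 3072,
                            (void *)"unpinned task", 5, NULL, tskNO_AFFINITY);
}
```

Note that in ESP-IDF a plain `xTaskCreate()` call also creates the task without core affinity, so pinning has to be requested explicitly through `xTaskCreatePinnedToCore()`.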