espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0) (IDFGH-398) #2530

Closed sjoteppagol closed 11 months ago

sjoteppagol commented 5 years ago

Environment

Problem Description

The device reboots with error message.

Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC      : 0x400d2132  PS      : 0x00060b34  A0      : 0x8008a298  A1      : 0x3ffd9f60
A2      : 0x00000008  A3      : 0x00000000  A4      : 0x00000001  A5      : 0x3ffda578
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x3ffc566c  A9      : 0x3ffc5650
A10     : 0x00000000  A11     : 0x00000001  A12     : 0x43422c0c  A13     : 0x00000001
A14     : 0x00000000  A15     : 0x3ffc05c0  SAR     : 0x00000000  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000

Backtrace: 0x400d2132:0x3ffd9f60 0x4008a295:0x3ffd9f80

This type of Guru Meditation error occurs repeatedly, and the backtrace does not seem to point to any application code. Please find the decoded backtrace below.

$ xtensa-esp32-elf-addr2line -pfiaC -e build/ecowater.elf 0x400d2132:0x3ffd9f60 0x4008a295:0x3ffd9f80
0x400d2132: esp_vApplicationIdleHook at C:/ESP_V3.0/msys32/home/sjoteppagol/esp-idf-v3.0/components/esp32/freertos_hooks.c:85
0x4008a295: prvIdleTask at C:/ESP_V3.0/msys32/home/sjoteppagol/esp-idf-v3.0/components/freertos/tasks.c:3529
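
For readers matching the register dump to the backtrace: the second backtrace address can be derived from the A0 register. The following is a minimal sketch of that conversion, modeled on how the ESP-IDF panic handler treats Xtensa windowed-ABI return addresses. The helper name `return_addr_to_call_pc` is hypothetical and not an ESP-IDF API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an ESP-IDF API): turn a saved Xtensa return
 * address into the code address that appears in a backtrace line. */
static uint32_t return_addr_to_call_pc(uint32_t return_addr)
{
    /* With the windowed ABI, the top two bits of a return address hold the
     * register-window increment, not address bits. Map them back into the
     * 0x4xxxxxxx instruction address range. */
    if (return_addr & 0x80000000u) {
        return_addr = (return_addr & 0x3fffffffu) | 0x40000000u;
    }
    /* Step back 3 bytes so the address falls inside the call instruction
     * rather than on the instruction that follows it. */
    return return_addr - 3u;
}
```

With A0 = 0x8008a298 from the dump above, this gives 0x4008a295, which is the second entry of the backtrace. The first entry is simply the PC register (0x400d2132).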

Expected Behavior

The device should work normally.

Actual Behavior

The device is set up to run continuously, but the same Guru Meditation error occurs 3-5 times over a full day.

Steps to reproduce

  1. Run the system for 6 to 8 hours.
  2. At some point the system resets with the message Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0).
  3. The occurrence is unpredictable, but whenever the system runs continuously for 6 to 8 hours it resets with this Guru Meditation error at least once.

Code to reproduce this issue

Debug Logs

Other items if possible

gerekon commented 5 years ago

@sjoteppagol Does the code you specified (an empty app_main) reproduce the problem? I think it was just copy-pasted from the issue template. It is hard to determine the root cause with so little information. For now I can only say that your problem may be caused by code that accidentally disables interrupts and does not re-enable them before the watchdog fires. Could you provide your code for testing, along with the sdkconfig and ELF file?

sjoteppagol commented 5 years ago

@gerekon The code specified (empty app_main) was part of the issue template and was accidentally left in, sorry about that. I won't be able to share the code, but I will try to reproduce the issue and share both the sdkconfig and the ELF file.

Alvin1Zhang commented 5 years ago

@sjoteppagol Are there any updates on this issue? Thanks.

sjoteppagol commented 5 years ago

I was not able to reproduce this issue. Thanks.

sjoteppagol commented 5 years ago

I am able to reproduce this issue. I kept the device running overnight, and in 12 hours it hit Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0) 38 times. Could you please suggest a way to overcome this? Please find the sdkconfig, ELF file and backtrace result attached.

Backtrace_result.txt sdkconfig.txt Gurumediation_testing.zip

sjoteppagol commented 5 years ago

From the above log, I see a pattern: “ESP AWS Task Will Re-Init for New Connection” followed by “Shadow Connect” seems to leave the program hung, which then triggers the WDT. Maybe some call sequence is not tolerated by one of the stacks.

gerekon commented 5 years ago

@sjoteppagol Do you run your project on a bare ESP32-Wrover-Kit, or is additional HW connected to it?

If no extra HW is needed: what actions should be performed to reproduce your problem? Is it enough to flash your ELF file and leave the device running for 12 hours? Could you also provide your partition table binary?

Could you try with CONFIG_INT_WDT_CHECK_CPU1=0 in sdkconfig?

sjoteppagol commented 5 years ago

Hello @gerekon, thank you for your response, it was helpful. As you suggested, I made the change in sdkconfig and have not seen the issue since. Could you please explain what this option actually does? Will it impact my other functionality?

gerekon commented 5 years ago

@sjoteppagol CONFIG_INT_WDT_CHECK_CPU1 controls the check for a stalled OS on CPU1. It produces a watchdog panic when the OS does not tick on CPU1, e.g. when interrupts are disabled for a long period of time (300 ms by default). Since we cannot reproduce your situation, can I ask you to perform some additional verification? A new feature was added to IDF in 624828ce83adce851dea85e2e6e03868b9154aaa. It makes IDF print core dumps for both CPUs upon a WDT timeout. It can be cherry-picked cleanly onto your 94ec3c8e5. Can you pick up that change and retry your tests with CONFIG_INT_WDT_CHECK_CPU1=1 enabled? It can help determine whether the problem is in IDF or in your code.
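
To illustrate what the option changes, here is a rough host-side model of an interrupt watchdog that is fed from the tick interrupt. It is only a sketch: the `sim_` names and the struct are hypothetical, and the assumption that CPU0 feeds the watchdog only after CPU1 has also ticked is a simplification of the behavior described above, not the actual IDF source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only; none of these names are ESP-IDF APIs. */
typedef struct {
    bool     check_cpu1;   /* models CONFIG_INT_WDT_CHECK_CPU1 */
    bool     cpu1_ticked;  /* set when CPU1 has ticked since the last feed */
    uint32_t timeout_ms;   /* e.g. 300, the default mentioned above */
    uint32_t last_feed_ms; /* time of the most recent feed */
} sim_int_wdt_t;

/* Called from the simulated tick interrupt of the given core. */
static void sim_int_wdt_tick(sim_int_wdt_t *wdt, int core, uint32_t now_ms)
{
    if (core == 1) {
        wdt->cpu1_ticked = true;  /* CPU1 only reports that it is alive */
        return;
    }
    /* CPU0 feeds the watchdog, but with the check enabled it does so only
     * if CPU1 has ticked too (assumed behavior for this model). */
    if (!wdt->check_cpu1 || wdt->cpu1_ticked) {
        wdt->last_feed_ms = now_ms;
        wdt->cpu1_ticked = false;
    }
}

/* True when the watchdog has not been fed within the timeout window. */
static bool sim_int_wdt_expired(const sim_int_wdt_t *wdt, uint32_t now_ms)
{
    return (now_ms - wdt->last_feed_ms) > wdt->timeout_ms;
}
```

In this model, if CPU1 stops ticking for longer than the timeout, the watchdog expires only when `check_cpu1` is set. That is why disabling the option can hide a stall on CPU1 without removing its cause.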

Alvin1Zhang commented 4 years ago

@sjoteppagol Thanks for reporting. Are there any updates on this issue? Thanks.

ion-girloanta commented 4 years ago

Got the same error when calling portENTER_CRITICAL_ISR without a matching portEXIT_CRITICAL_ISR.

gerekon commented 4 years ago

@ion-girloanta That usage is incorrect. Every portENTER_CRITICAL_ISR must be followed by a portEXIT_CRITICAL_ISR. portENTER_CRITICAL_ISR disables interrupts, and leaving them disabled triggers the WDT. This is intended behavior; otherwise, without interrupts, the system could become non-functional and unresponsive.
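
A common way to end up with an unpaired enter is an early return inside the critical section. The sketch below shows one way to keep the pair balanced on every path. It uses host-side stand-ins so it compiles off-target: `sim_enter_critical`, `sim_exit_critical` and `update_shared_counter` are hypothetical names. On ESP-IDF their roles are played by portENTER_CRITICAL_ISR and portEXIT_CRITICAL_ISR, which additionally take a spinlock argument.

```c
#include <assert.h>

/* Host-side stand-ins (hypothetical, not ESP-IDF APIs). A depth above zero
 * models "interrupts are currently masked". */
static int sim_critical_depth = 0;

static void sim_enter_critical(void)
{
    sim_critical_depth++;
}

static void sim_exit_critical(void)
{
    assert(sim_critical_depth > 0); /* exit without a prior enter is a bug */
    sim_critical_depth--;
}

static int shared_counter = 0;

/* Hypothetical ISR-side update. Returns 0 on success, -1 if rejected. */
static int update_shared_counter(int delta)
{
    int result = 0;

    sim_enter_critical();
    if (delta < 0) {
        result = -1;          /* no early return: fall through to the exit */
    } else {
        shared_counter += delta;
    }
    sim_exit_critical();      /* single exit point keeps the pair balanced */

    return result;
}
```

Routing both the success and the error path through one exit call means the depth is back at zero after every call, so the tick interrupt that feeds the watchdog is never left masked.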

enricop commented 4 years ago

2542

ole00 commented 4 years ago

I had the same issue. In my case the watchdog was triggered by setting heap poisoning to the Comprehensive level (I suspect the poisoning took too long). When the level was lowered to Basic (no poisoning), or the interrupt watchdog timeout (CONFIG_INT_WDT_TIMEOUT_MS, configured alongside the CONFIG_INT_WDT_CHECK_CPU1 option suggested above by gerekon) was increased to 600, the issue disappeared.
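
As a rough illustration of why comprehensive poisoning can be slow, the sketch below models the extra work per heap operation: a full fill of the buffer on free and a full scan when the memory is handed out again. The `sim_` names are hypothetical, and the fill byte 0xFE is an assumption for this model. Whether this work runs with interrupts masked on a given IDF version is likewise an assumption, in line with the suspicion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_FREE_FILL 0xFE /* assumed fill byte for freed memory */

/* On free: overwrite the whole buffer, so the cost grows with its size. */
static void sim_poison_on_free(uint8_t *buf, size_t len)
{
    memset(buf, SIM_FREE_FILL, len);
}

/* On the next allocation: scan the whole buffer to detect writes made
 * after the free. This is a second pass over every byte. */
static bool sim_verify_free_fill(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != SIM_FREE_FILL) {
            return false; /* something wrote to freed memory */
        }
    }
    return true;
}
```

Both passes are linear in the buffer size, so large allocations cost proportionally more time per malloc or free. If that time is spent with interrupts disabled, it counts against the interrupt watchdog timeout.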

copleston commented 4 years ago

Echoing what @ole00 said, I found that Comprehensive heap debugging was causing the errors described above.

SoucheSouche commented 11 months ago

Closing issue after 3 years of inactivity. If the problem still occurs, please refer to the fatal errors documentation.