espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.7k stars 7.3k forks source link

Runtime Core 0 panic'ed (Double exception) before PRO CPU user code (IDFGH-10243) #11506

Closed rolupusoru closed 8 months ago

rolupusoru commented 1 year ago

Answers checklist.

IDF version.

framework-espidf @ 3.50002.230512 (5.0.2)

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

Lilygo T-Display-S3 (ESP32-S3R8)

Power Supply used.

USB

What is the expected behavior?

Main task expected to execute

What is the actual behavior?

Double exception thrown in RTOS before main task is started. Check output logs for list of events.

Steps to reproduce.

Accidentally came across this problem as the application code continued to grow in size. Only task running on CPU 0 seems to have this issue.

If I comment out random lines of code on CPU 0 task the error goes away. If i enable extra System Features features such a ( e.g. TRAX) problem disappears.

Debug Logs.

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0x2a (SPI_FAST_FLASH_BOOT)
Saved PC:0x40374280
pro cpu reset by JTAG
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x18d8
load:0x403c9700,len:0xed4
load:0x403cc700,len:0x36c8
entry 0x403c99cc
I (26) boot: ESP-IDF 5.0.1 2nd stage bootloader
I (26) boot: compile time 22:24:11
I (26) boot: chip revision: v0.1
I (28) qio_mode: Enabling QIO for flash chip WinBond
I (34) boot.esp32s3: Boot SPI Speed : 80MHz
I (38) boot.esp32s3: SPI Mode       : QIO
I (43) boot.esp32s3: SPI Flash Size : 16MB
I (45) boot: Enabling RNG early entropy source...
I (46) boot: Partition Table:
I (46) boot: ## Label            Usage          Type ST Offset   Length
I (53) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (60) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (68) boot:  2 factory          factory app      00 00 00010000 00400000
I (76) boot: End of partition table
I (80) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=18470h ( 99440) map
I (104) esp_image: segment 1: paddr=00028498 vaddr=3fc95100 size=03a48h ( 14920) load
I (108) esp_image: segment 2: paddr=0002bee8 vaddr=40374000 size=04130h ( 16688) load
I (114) esp_image: segment 3: paddr=00030020 vaddr=42000020 size=50004h (327684) map
I (170) esp_image: segment 4: paddr=0008002c vaddr=40378130 size=0cf10h ( 53008) load
I (190) boot: Loaded app from partition at offset 0x10000
I (191) boot: Disabling RNG early entropy source...
I (202) octal_psram: vendor id    : 0x0d (AP)
I (202) octal_psram: dev id       : 0x02 (generation 3)
I (203) octal_psram: density      : 0x03 (64 Mbit)
I (207) octal_psram: good-die     : 0x01 (Pass)
I (213) octal_psram: Latency      : 0x01 (Fixed)
I (218) octal_psram: VCC          : 0x01 (3V)
I (223) octal_psram: SRF          : 0x01 (Fast Refresh)
I (229) octal_psram: BurstType    : 0x01 (Hybrid Wrap)
I (235) octal_psram: BurstLen     : 0x01 (32 Byte)
I (240) octal_psram: Readlatency  : 0x02 (10 cycles@Fixed)
I (246) octal_psram: DriveStrength: 0x00 (1/1)
I (252) esp_psram: Found 8MB PSRAM device
I (256) esp_psram: Speed: 80MHz
I (260) cpu_start: Pro cpu up.
I (264) cpu_start: Starting app cpu, entry point is 0x403762b8
I (0) cpu_start: App cpu up.
I (716) esp_psram: SPI SRAM memory test OK
Guru Meditation Error: Core  0 panic'ed (Double exception). 

Core  0 register dump:
PC      : 0x4037ca7a  PS      : 0x00040136  A0      : 0x00000000  A1      : 0x3fceb130
A2      : 0x00040136  A3      : 0x00040026  A4      : 0x00000000  A5      : 0x00000000
A6      : 0x00000000  A7      : 0xffffffff  A8      : 0x3fceb1f0  A9      : 0x00000001
A10     : 0x0000006d  A11     : 0x00000001  A12     : 0x3fceb1d0  A13     : 0x0000006d
A14     : 0x00000000  A15     : 0x00000017  SAR     : 0x00000020  EXCCAUSE: 0x00000002
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000

/* console starts outputting unreadable garbage data after this point*/

More Information.

EXCVADDR : 0x00000000 indicates that a null pointer is sent as next execution address. this happens at 716ms after device boot after SPI RAM memory test.

I (716) esp_psram: SPI SRAM memory test OK Guru Meditation Error: Core 0 panic'ed (Double exception).

Looks like there is a problem starting PRO CPU user code.

sdkconfig.zip

Main task code:

// entry task of ESP-IDF; it always runs on CORE_0 and will self delete on completion
void app_main() {

    //initialize task qeues
    reference_couter_pulse_queue    = xQueueCreate(10, sizeof(flow_session_tick_t));
    alarm_timer_queue               = xQueueCreate(10, sizeof(flow_session_tick_t));
    gpio_evt_queue                  = xQueueCreate(10, sizeof(uint32_t));
    app_status_queue                = xQueueCreate(10, sizeof(app_status));

    TaskHandle_t task_h0 = NULL, task_h1 = NULL;

    // create output display routine task that runs on CORE_1 
    ESP_LOGI(TAG, "Create taskDisplayroutine_cpu_1 task on cpu_1");
    xTaskCreatePinnedToCore(taskDisplayroutine_cpu_1,"display_routine_cpu_1",3072, NULL, 2, &task_h1, 1);
    printf("taskFlowCalibrationProgram_cpu_1 stack used bytes: %d\n", 3072 - uxTaskGetStackHighWaterMark(task_h1));

    //wait for all the display elements to finish up initializing
    vTaskDelay(2000 / portTICK_PERIOD_MS);

    // create Flow Calibration Program task that runs on CORE_0
    ESP_LOGI(TAG, "Create taskFlowCalibrationProgram_cpu_0 task on cpu_0");
    xTaskCreatePinnedToCore(taskFlowCalibrationProgram_cpu_0,"flow_calibration_program_cpu_0",3072, NULL, 2, &task_h0, 0);
    printf("taskFlowCalibrationProgram_cpu_0 stack used bytes: %d\n", 3072 - uxTaskGetStackHighWaterMark(task_h0));

    /*  while (true) {
        //printf("taskFlowCalibrationProgram_cpu_0 stack used bytes: %d\n", 3072 - uxTaskGetStackHighWaterMark(task_h0));
        //printf("taskFlowCalibrationProgram_cpu_0 stack used bytes: %d\n", 3072 - uxTaskGetStackHighWaterMark(task_h1));
        vTaskDelay(100 / portTICK_PERIOD_MS);
    }  */

}
rolupusoru commented 1 year ago

After peeking into RTOS with GDB debugger, it appears that PSRAM test is to blame. The moment of the test and its duration seems to create some sort of a race condition. By enabling system features caused additional delay in the test eliminating the error. I still don't understand how user code may have affected this.

/*
 Simple RAM test. Writes a word every 32 bytes. Takes about a second to complete for 4MiB. Returns
 true when RAM seems OK, false when test fails. WARNING: Do not run this before the 2nd cpu has been
 initialized (in a two-core system) or after the heap allocator has taken ownership of the memory.
*/
static bool s_test_psram(intptr_t v_start, size_t size, intptr_t reserved_start, intptr_t reserved_end)

I disabled PSRAM test and system seems to be running fine so far.

rolupusoru commented 1 year ago

Core 1 is running LVGL library and is configured to use PSRAM. I see a flaw incpu_start.c function in my opinion.

At line 229, Core 1 is being started: and Core 0 waits for it to finish
At line 426 Core 0 is testing PSRAM

So this means that Core 1 is getting started while Core 0 still has some system tests to perform. Because the test takes up to 1 second, Core 1 scheduler may be entering user code and writing to PSRAM before Core 0 has finished testing PSRAM.

rolupusoru commented 1 year ago

Sadly, even with PSRAM test disabled it only worked for a limited time. As I was able to continue working on the application it eventually crashed again with double exception

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0xc (RTC_SW_CPU_RST),boot:0x2a (SPI_FAST_FLASH_BOOT)
Saved PC:0x40376ae8
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x18d8
load:0x403c9700,len:0xed4
load:0x403cc700,len:0x36c8
entry 0x403c99cc
I (24) boot: ESP-IDF 5.0.1 2nd stage bootloader
I (24) boot: compile time 00:16:14
I (24) boot: chip revision: v0.1
I (26) qio_mode: Enabling QIO for flash chip WinBond
I (32) boot.esp32s3: Boot SPI Speed : 80MHz
I (36) boot.esp32s3: SPI Mode       : QIO
I (41) boot.esp32s3: SPI Flash Size : 16MB
I (46) boot: Enabling RNG early entropy source...
I (51) boot: Partition Table:
I (55) boot: ## Label            Usage          Type ST Offset   Length
I (62) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (70) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (77) boot:  2 factory          factory app      00 00 00010000 00400000
I (85) boot: End of partition table
I (89) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=18420h ( 99360) map
I (113) esp_image: segment 1: paddr=00028448 vaddr=3fc95100 size=03a48h ( 14920) load
I (117) esp_image: segment 2: paddr=0002be98 vaddr=40374000 size=04180h ( 16768) load
I (123) esp_image: segment 3: paddr=00030020 vaddr=42000020 size=50004h (327684) map
I (179) esp_image: segment 4: paddr=0008002c vaddr=40378180 size=0ce9ch ( 52892) load
I (199) boot: Loaded app from partition at offset 0x10000
I (199) boot: Disabling RNG early entropy source...
I (211) octal_psram: vendor id    : 0x0d (AP)
I (211) octal_psram: dev id       : 0x02 (generation 3)
I (211) octal_psram: density      : 0x03 (64 Mbit)
I (216) octal_psram: good-die     : 0x01 (Pass)
I (221) octal_psram: Latency      : 0x01 (Fixed)
I (226) octal_psram: VCC          : 0x01 (3V)
I (232) octal_psram: SRF          : 0x01 (Fast Refresh)
I (237) octal_psram: BurstType    : 0x01 (Hybrid Wrap)
I (243) octal_psram: BurstLen     : 0x01 (32 Byte)
I (249) octal_psram: Readlatency  : 0x02 (10 cycles@Fixed)
I (255) octal_psram: DriveStrength: 0x00 (1/1)
I (261) esp_psram: Found 8MB PSRAM device
I (265) esp_psram: Speed: 80MHz
I (269) cpu_start: Pro cpu up.
I (272) cpu_start: Starting app cpu, entry point is 0x403762b8
�3��2��m
� (��26"1�) ��
cp�u�_�s�*ta*��t�: �A�p
�p�� ��c��p�u� �u�p.�
"[0�m�
��
����*�����
�������W (286) rtcinit: efuse read fail, k_dig_ldo: 0,  v_dig_bias20: 0

Guru Meditation Error: Core  0 panic'ed (Double exception). 

Core  0 register dump:
PC      : 0x4037ca56  PS      : 0x00040533  A0      : 0x00000000  A1      : 0x3fceb1a0
A2      : 0x0000001f  A3      : 0x00000001  A4      : 0x00000000  A5      : 0x00000001
A6      : 0x00000000  A7      : 0x00000017  A8      : 0x0000001e  A9      : 0x00002a59
A10     : 0x000032c7  A11     : 0x000000d3  A12     : 0x3c06f10c  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x3fceb1d7  SAR     : 0x00000004  EXCCAUSE: 0x00000002
EXCVADDR: 0xfffffff0  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000
esp-wzh commented 1 year ago

@rolupusoru Thanks for your report!

This looks like caused by a bug we recently found in esptool, which has a very low probability of triggering and we have fixed it in commit. This fix will be released in v4.7, for now, if you encounter this issue again, you can try to upgrade esptool by pip install esptool==4.7.dev1

rolupusoru commented 1 year ago

Hi,

Thanks for your feedback. I disabled PSRAM in my project and this made the error go away. However I will have to re-enable it in the same project because i will add HTTP functionality.

I will posta gain if there will be any issues.

esp-wzh commented 8 months ago

@rolupusoru Thanks for your report, this is an esptool bug that has a very low probability of being triggered.

Fixed in https://github.com/espressif/esptool/commit/c48523e0af4a330c307089757aa42e417d6f7de3, and this problem will not occur after esptool v4.7.0

If no more issues, please close this thread.

Alvin1Zhang commented 8 months ago

Thanks for reporting, feel free to reopen.

ClaesIvarsson commented 8 months ago

I am on esp-idf v4.4.6 and the esptool version is v3.3.3-dev. is this version affected by this issue ? If it is, I would like you to consider updating the release/v4.4 branch of esp-idf.

esp-wzh commented 8 months ago

@ClaesIvarsson

This fix has been backported to release/v3, esptool commit: 7b17ea07

the corresponding IDF submodule update commit is 550e7db1, which has been released with v4.4.6

ClaesIvarsson commented 8 months ago

Thanks for the clarification!