Closed rolupusoru closed 8 months ago
After peeking into RTOS with GDB debugger, it appears that PSRAM test is to blame. The moment of the test and its duration seems to create some sort of a race condition. By enabling system features caused additional delay in the test eliminating the error. I still don't understand how user code may have affected this.
/*
Simple RAM test. Writes a word every 32 bytes. Takes about a second to complete for 4MiB. Returns
true when RAM seems OK, false when test fails. WARNING: Do not run this before the 2nd cpu has been
initialized (in a two-core system) or after the heap allocator has taken ownership of the memory.
*/
static bool s_test_psram(intptr_t v_start, size_t size, intptr_t reserved_start, intptr_t reserved_end)
I disabled PSRAM test and system seems to be running fine so far.
Core 1 is running LVGL library and is configured to use PSRAM. I see a flaw incpu_start.c
function in my opinion.
At line 229, Core 1 is being started: and Core 0 waits for it to finish
At line 426 Core 0 is testing PSRAM
So this means that Core 1 is getting started while Core 0 still has some system tests to perform. Because the test takes up to 1 second, Core 1 scheduler may be entering user code and writing to PSRAM before Core 0 has finished testing PSRAM.
Sadly, even with PSRAM test disabled it only worked for a limited time. As I was able to continue working on the application it eventually crashed again with double exception
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0xc (RTC_SW_CPU_RST),boot:0x2a (SPI_FAST_FLASH_BOOT)
Saved PC:0x40376ae8
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x18d8
load:0x403c9700,len:0xed4
load:0x403cc700,len:0x36c8
entry 0x403c99cc
I (24) boot: ESP-IDF 5.0.1 2nd stage bootloader
I (24) boot: compile time 00:16:14
I (24) boot: chip revision: v0.1
I (26) qio_mode: Enabling QIO for flash chip WinBond
I (32) boot.esp32s3: Boot SPI Speed : 80MHz
I (36) boot.esp32s3: SPI Mode : QIO
I (41) boot.esp32s3: SPI Flash Size : 16MB
I (46) boot: Enabling RNG early entropy source...
I (51) boot: Partition Table:
I (55) boot: ## Label Usage Type ST Offset Length
I (62) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (70) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (77) boot: 2 factory factory app 00 00 00010000 00400000
I (85) boot: End of partition table
I (89) esp_image: segment 0: paddr=00010020 vaddr=3c060020 size=18420h ( 99360) map
I (113) esp_image: segment 1: paddr=00028448 vaddr=3fc95100 size=03a48h ( 14920) load
I (117) esp_image: segment 2: paddr=0002be98 vaddr=40374000 size=04180h ( 16768) load
I (123) esp_image: segment 3: paddr=00030020 vaddr=42000020 size=50004h (327684) map
I (179) esp_image: segment 4: paddr=0008002c vaddr=40378180 size=0ce9ch ( 52892) load
I (199) boot: Loaded app from partition at offset 0x10000
I (199) boot: Disabling RNG early entropy source...
I (211) octal_psram: vendor id : 0x0d (AP)
I (211) octal_psram: dev id : 0x02 (generation 3)
I (211) octal_psram: density : 0x03 (64 Mbit)
I (216) octal_psram: good-die : 0x01 (Pass)
I (221) octal_psram: Latency : 0x01 (Fixed)
I (226) octal_psram: VCC : 0x01 (3V)
I (232) octal_psram: SRF : 0x01 (Fast Refresh)
I (237) octal_psram: BurstType : 0x01 (Hybrid Wrap)
I (243) octal_psram: BurstLen : 0x01 (32 Byte)
I (249) octal_psram: Readlatency : 0x02 (10 cycles@Fixed)
I (255) octal_psram: DriveStrength: 0x00 (1/1)
I (261) esp_psram: Found 8MB PSRAM device
I (265) esp_psram: Speed: 80MHz
I (269) cpu_start: Pro cpu up.
I (272) cpu_start: Starting app cpu, entry point is 0x403762b8
�3��2��m
� (��26"1�) ��
cp�u�_�s�*ta*��t�: �A�p
�p�� ��c��p�u� �u�p.�
"[0�m�
��
����*�����
�������W (286) rtcinit: efuse read fail, k_dig_ldo: 0, v_dig_bias20: 0
Guru Meditation Error: Core 0 panic'ed (Double exception).
Core 0 register dump:
PC : 0x4037ca56 PS : 0x00040533 A0 : 0x00000000 A1 : 0x3fceb1a0
A2 : 0x0000001f A3 : 0x00000001 A4 : 0x00000000 A5 : 0x00000001
A6 : 0x00000000 A7 : 0x00000017 A8 : 0x0000001e A9 : 0x00002a59
A10 : 0x000032c7 A11 : 0x000000d3 A12 : 0x3c06f10c A13 : 0x00000000
A14 : 0x00000000 A15 : 0x3fceb1d7 SAR : 0x00000004 EXCCAUSE: 0x00000002
EXCVADDR: 0xfffffff0 LBEG : 0x400570e8 LEND : 0x400570f3 LCOUNT : 0x00000000
@rolupusoru Thanks for your report!
This looks like caused by a bug we recently found in esptool, which has a very low probability of triggering and we have fixed it in commit. This fix will be released in v4.7, for now, if you encounter this issue again, you can try to upgrade esptool by pip install esptool==4.7.dev1
Hi,
Thanks for your feedback. I disabled PSRAM in my project and this made the error go away. However I will have to re-enable it in the same project because i will add HTTP functionality.
I will posta gain if there will be any issues.
@rolupusoru Thanks for your report, this is an esptool bug that has a very low probability of being triggered.
Fixed in https://github.com/espressif/esptool/commit/c48523e0af4a330c307089757aa42e417d6f7de3, and this problem will not occur after esptool v4.7.0
If no more issues, please close this thread.
Thanks for reporting, feel free to reopen.
I am on esp-idf v4.4.6 and the esptool version is v3.3.3-dev. is this version affected by this issue ? If it is, I would like you to consider updating the release/v4.4 branch of esp-idf.
Thanks for the clarification!
Answers checklist.
IDF version.
framework-espidf @ 3.50002.230512 (5.0.2)
Operating System used.
Windows
How did you build your project?
VS Code IDE
If you are using Windows, please specify command line type.
None
Development Kit.
Lilygo T-Display-S3 (ESP32-S3R8)
Power Supply used.
USB
What is the expected behavior?
Main task expected to execute
What is the actual behavior?
Double exception thrown in RTOS before main task is started. Check output logs for list of events.
Steps to reproduce.
Accidentally came across this problem as the application code continued to grow in size. Only task running on CPU 0 seems to have this issue.
If I comment out random lines of code on CPU 0 task the error goes away. If i enable extra System Features features such a ( e.g. TRAX) problem disappears.
Debug Logs.
More Information.
EXCVADDR : 0x00000000 indicates that a null pointer is sent as next execution address. this happens at 716ms after device boot after SPI RAM memory test.
I (716) esp_psram: SPI SRAM memory test OK Guru Meditation Error: Core 0 panic'ed (Double exception).
Looks like there is a problem starting PRO CPU user code.
sdkconfig.zip
Main task code: