Open projectgus opened 2 weeks ago
Hi @projectgus , thank you for reporting!
That's a bizarre bug you caught. Could you please provide a reproducer?
But could not reproduce it with v5.2.2 (3b8741b172)
@Lapshin I haven't had any luck yet either, maybe it actually requires high Wi-Fi traffic. Will keep at it and let you know.
Answers checklist.
IDF version.
v5.2.2, also v5.2.2-639-g43098fc4de
Espressif SoC revision.
ESP32-C3 (QFN32) (revision v0.4)
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
SEEED XIAO ESP32-C3
Power Supply used.
USB
What is the expected behavior?
Load SP register with a valid address (inside the current task's stack region) without Debug Assist hardware Stack Protection triggering.
What is the actual behavior?
Loading the SP register seems to intermittently trigger a hardware stack protector interrupt. All of the reported addresses look valid for the running task, i.e. there was no stack overflow or SP corruption.
Steps to reproduce.
Reproduction currently requires the MicroPython master branch and some Python code that sends a lot of data over Wi-Fi. (The original bug is https://github.com/micropython/micropython/issues/15667)
It is probably possible to make a simpler reproducer, best guess is that the key features are:
Note that all of the jumps are happening within the same task, and the stack pointer is saved and restored each time to/from a valid value for the current executing task.
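For illustration only, the jump pattern described above can be sketched in plain C with setjmp/longjmp. This is not the MicroPython nlr code and it is not a confirmed reproducer: it leaves out the Wi-Fi traffic entirely, and the names jump_target, deep_jump and run_nonlocal_jumps are hypothetical. It only shows repeated non-local jumps within one task, each restoring the stack pointer to a value saved earlier on the same stack.

```c
#include <setjmp.h>

/* Hypothetical jump target; setjmp saves the SP and callee-saved registers here. */
static jmp_buf jump_target;

/* Recurse a few frames so SP moves well below the saved value, then jump back. */
static void deep_jump(int depth)
{
    volatile char frame_padding[32]; /* force each frame to use some stack */
    frame_padding[0] = (char)depth;
    if (depth == 0) {
        longjmp(jump_target, 1);     /* restores SP to the value saved by setjmp */
    }
    deep_jump(depth - 1);
}

/* Perform `iterations` save/restore cycles; returns how many jumps landed. */
int run_nonlocal_jumps(int iterations)
{
    volatile int landed = 0;         /* volatile: must survive the longjmp */
    for (volatile int i = 0; i < iterations; i++) {
        if (setjmp(jump_target) == 0) {
            deep_jump(8);            /* never returns normally */
        } else {
            landed++;                /* reached via longjmp */
        }
    }
    return landed;
}
```

On target, something like run_nonlocal_jumps could be called in a loop from a single task while Wi-Fi traffic is running, on the assumption that the traffic (or the interrupts it causes) is part of the trigger.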
Debug Logs.
Here's a sample crash:
Note that the Stack pointer address in the dump is valid for the bounds of the task.
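The "valid for the bounds of the task" check is a simple range comparison. A minimal sketch, assuming the task's lowest and highest stack addresses are already known; the helper name sp_within_task_stack is hypothetical, and the caller supplies the bounds (nothing here reads them from FreeRTOS or from the crash dump):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when sp lies inside the task's stack
 * region. stack_high is included because SP equals the top of the region
 * when the stack is empty.
 */
bool sp_within_task_stack(uintptr_t sp, uintptr_t stack_low, uintptr_t stack_high)
{
    return sp >= stack_low && sp <= stack_high;
}
```

Both the SP value in the dump and the "before restore" value held in t0 would be checked this way against the same task bounds.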
This crash dump was created with a couple of additions in the nlr_jump function to try and get extra debug info:
add t0,sp,zero means temp register t0 holds the "before restore" SP value in the crash dump. Note that this SP value is also inside the task bounds.
More Information.
The lw sp,60(a0) instruction is causing the stack protection to trigger.
Tried changes in components/esp_system/port/include/private/esp_private/hw_stack_guard.h, such as adding fence instructions and big nop blocks at the end of the ESP_HW_STACK_GUARD_MONITOR_STOP_CPU0 and ESP_HW_STACK_GUARD_MONITOR_START_CPU0 macros, in case there was some race with the Debug Assist registers changing during a context switch. Still crashes, however I don't really know what I'm doing there.
nlr_jump. That seems like a possible workaround but also doesn't seem like it should be necessary...?
Happy to try anything you recommend, might even be able to provide a C reproducer that uses setjmp/longjmp.