I forgot to include an example of a stack trace, so here's one of the more helpful ones I got:
I really want to see the internals of `esp_now_is_peer_exist()`, but I can't seem to find it anywhere - I doubt that the issue is there and not in my code, but it'd still be helpful to have increased visibility into what it's doing.
I've attempted to use the "Start Heap Trace" feature in ESP-IDF - the button in the VS Code extension doesn't give me any result under "Application Trace Archives", and as far as I can see no files are created in the repository directory. When I run it manually (with `heap_trace_start()` as the first line of `example_espnow_recv_cb()`, and `heap_trace_stop()` and dump at the end of `espnow_recv_task()`), this is the result I get:
After enabling comprehensive heap debugging in SDKCONFIG like so:
I no longer get the messages about crashes - I just get a stack overflow crash. After a lot of trying I was finally able to extract an .elf core dump that can be opened with `idf.py coredump-debug -c crash-coredump.elf`. In the interest of speeding up the debugging process, I've attached it to this comment. If someone would be able to help me with taking a look at it, I'd greatly appreciate it. Thanks!
After many more hours of attempted debugging, not much progress has been made. I've spent nearly 6 hours at this point trying to get the App and heap debugging features in ESP-IDF to work as intended, have read all the docs I can find, and have tried everything I can think of. My attempts to use SystemView have completely failed, and no matter what I do (including attaching an ESP-Prog to get an additional UART for just the core dump) nothing works. The ESP crashes and just stops printing data, and GDB gives me unhelpful backtraces like those seen above.
Whenever I run the App/heap traces the log files end up empty; I have the libraries included in my code and have tried manually initializing the functionality. I've tried both through the CLI and through the VS Code extension - neither works.
This is becoming extremely frustrating - If I can't even get minor modifications to the provided example to work without causing disastrous overflows, how am I supposed to be confident about using it in production?
Any help I can get resolving this issue as soon as possible would be greatly appreciated. Thanks.
@0xjmux Could you base your application code on the get-started example and check again? BTW, the esp-now component is an application-level ESP-NOW layer that provides some high-level functionality. In short, espnow_xxx adds some application processing on top of esp_now_xxx. For a detailed introduction, please see the following link: https://github.com/espressif/esp-now/blob/master/User_Guide.md#introduction
@lhespress I had tried to get the `get-started` example working previously - no matter what I do, it refuses to compile. I've followed those instructions before, but just did so again after freshly cloning the repository.
When I run `idf.py menuconfig` or `idf.py set-target esp32s3`, I get:
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building ESP-IDF components for target esp32s3
Target changed from esp32 to esp32s3, solving dependencies.
Using component placed at /home/jmux/Github/Neopixel-Matrix-Tetris-Project/libs-examples/esp-now-main-repo for dependency esp-now(2.*), specified in /home/jmux/Github/Neopixel-Matrix-Tetris-Project/libs-examples/esp-now-main-repo/examples/get-started/main/idf_component.yml
....Updating lock file at /home/jmux/Github/Neopixel-Matrix-Tetris-Project/libs-examples/esp-now-main-repo/examples/get-started/dependencies.lock
Processing 3 dependencies:
[1/3] esp-now (2.5.0)
[2/3] espressif/cmake_utilities (0.5.3)
[3/3] idf (5.1.2)
CMake Error at /home/jmux/Documents/esp/esp-idf/tools/cmake/build.cmake:266 (message):
Failed to resolve component 'esp-now'.
Call Stack (most recent call first):
/home/jmux/Documents/esp/esp-idf/tools/cmake/build.cmake:308 (__build_resolve_and_add_req)
/home/jmux/Documents/esp/esp-idf/tools/cmake/build.cmake:595 (__build_expand_requirements)
/home/jmux/Documents/esp/esp-idf/tools/cmake/project.cmake:547 (idf_build_process)
CMakeLists.txt:14 (project)
When I run `idf.py build`, I get the same error. Trying to configure the project or compile in VSCode gives the same result. This is on a fresh clone of the repo with `idf.py --version` == ESP-IDF v5.1.2
@0xjmux It seems the build can't find the `esp-now` component. Which environment do you use: Linux, Mac or Windows? BTW, the attachment is the firmware built in my Mac environment.
get-start.zip
I'm on Ubuntu 22.04LTS and ESP-IDF v5.1.2
It seems to find it fine in the example built in to ESP-IDF, and when I import `esp_now.h` into my program - but none of the examples in the esp-now examples folder will compile.
Using the "IDF Component Registry" in the VSCode extension, I tried to install the `esp-now` component directly. The installation fails with the following error:
CMake Error at /home/jmux/Documents/esp/esp-idf/tools/cmake/build.cmake:266 (message):
Failed to resolve component 'esp-now'.
Call Stack (most recent call first):
/home/jmux/Documents/esp/esp-idf/tools/cmake/build.cmake:308 (__build_resolve_and_add_req)
/home/jmux/Documents/esp/esp-idf/tools/cmake/build.cmake:595 (__build_expand_requirements)
/home/jmux/Documents/esp/esp-idf/tools/cmake/project.cmake:547 (idf_build_process)
CMakeLists.txt:14 (project)
Clicking on CMakeOutput.log gives the following log:
This is nonetheless somewhat off-topic from the original question - As far as I can tell, there's not much guarantee that using the higher abstraction layer would avoid a similar memory leak, since what I'm trying to do is already so simple. Would you be able to take a look at identifying the possible culprit in the example code I posted above?
Thanks.
I just attempted to upgrade my ESP-IDF to v5.2.1, and was quickly reminded why I had been avoiding doing that. The SDK still refuses to work with intellisense out of the box, and getting the bare minimum functional is quite the frustrating ordeal. If this is something you guys somehow aren't aware of, you should add testing the installation of ESP-IDF through VSCode to the list - fresh install, open any example, and you get a sea of red. I've lost count of how many hours I've spent just trying to get the SDK working - not being able to tell it to use a pyenv virtualenv during installation is another annoyance.
It also didn't help: building the `get-started` example through the IDE or via `idf.py build` still fails with the same error.
I've been losing my mind as to what I've been doing wrong, so as a sanity check I enabled comprehensive heap detection and tracing in the default `wifi/espnow` example, compiled it and ran it.
It seems that I'm not going insane - it detects a stack overflow in the example that comes with ESP-IDF.
idf.py openocd
Info : [esp32s3.cpu0] Target halted, PC=0x40375A81, debug_reason=00000001
Info : [esp32s3.cpu0] Halt cause (***ERROR*** A stack overflow in task example_espnow_ has been detected.)
idf.py monitor:
I (18574) espnow_example: send data to ff:ff:ff:ff:ff:ff
I (18574) espnow_example: Receive error data from: [MAC OF MY OTHER ESPNOW DEVICE]
[at this point, it crashes and nothing else shows up]
idf.py gdbtui
[esp32s3.cpu0] Target halted, PC=0x40375A81, debug_reason=00000001
[esp32s3.cpu0] Halt cause (***ERROR*** A stack overflow in task example_espnow_ has been detected.)
[New Thread 1070275316]
[New Thread 1070245204]
[New Thread 1070230808]
[New Thread 1070234848]
Thread 8 "example_espnow_" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 1070275316]
0x40375a81 in panic_abort (details=0x3fcb12ac "***ERROR*** A stack overflow in task example_espnow_ has been detected.") at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/esp_system/panic.c:472
(gdb) bt
#0 0x40375a81 in panic_abort (details=0x3fcb12ac "***ERROR*** A stack overflow in task example_espnow_ has been detected.")
at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/esp_system/panic.c:472
#1 0x4037c6f4 in esp_system_abort (details=0x3fcb12ac "***ERROR*** A stack overflow in task example_espnow_ has been detected.")
at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/esp_system/port/esp_system_chip.c:93
#2 0x4037d5c1 in vApplicationStackOverflowHook (xTask=0x3fcb1af4, pcTaskName=0x3fcb1b28 "example_espnow_")
at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:553
#3 0x4037e9f2 in vTaskSwitchContext () at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:3630
#4 0x4037d687 in _frxt_dispatch () at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/portasm.S:451
#5 0x4037d67d in _frxt_int_exit () at /home/jmux/Documents/esp/v5.2.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/portasm.S:24
I created a brand new project from the `wifi/espnow` example, and haven't made any modifications other than enabling heap debugging features in menuconfig.
I should mention that this is on the latest stable release of ESP-IDF, since I bit the bullet and spent 4 hours reconfiguring the extension to work again after updating it.
$ idf.py --version
ESP-IDF v5.2.1
Since this bug is not in my modified code but rather in the example code that ships with ESP-IDF, it should be easier to triage from your end. If someone could take a look at resolving that, I'd greatly appreciate it.
@0xjmux I have tested the `wifi/espnow` example on IDF v5.2.1 on my side. When the `comprehensive heap detection` option is enabled, it does generate a task watchdog.
Enabling the `comprehensive heap detection` option consumes more memory, so you need to increase the stack size of `example_espnow_task`. After I increased the stack size to 4096, the example worked well.
xTaskCreate(example_espnow_task, "example_espnow_task", 4096, send_param, 4, NULL);
For your original code, you can also increase the stack size of the watchdog task, and then enable the `comprehensive heap detection` option to debug.
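When choosing a value such as 4096, it helps to know how much of the stack a task actually used. FreeRTOS exposes this through `uxTaskGetStackHighWaterMark()`. The idea behind it can be sketched on a host machine as below. This is a simplified model rather than the kernel's code: the helper names are hypothetical, and it assumes a descending stack that was pre-filled with the kernel's fill byte (`0xA5`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value FreeRTOS uses to pre-fill task stacks (tskSTACK_FILL_BYTE). */
#define STACK_FILL_BYTE 0xA5u

/* Hypothetical helper: paint a fresh stack buffer with the fill pattern,
 * roughly what the kernel does when a task is created. */
void paint_stack(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Hypothetical helper: count how many fill bytes are still intact at the
 * low end of the buffer. Assumes the stack grows downward, so usage eats
 * into the buffer from the high end. A result near zero means the task
 * came close to (or went past) the end of its stack. */
size_t stack_headroom_bytes(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return untouched;
}
```

On the target, logging the value returned by `uxTaskGetStackHighWaterMark(NULL)` from inside the task gives the same kind of number, which makes it easier to tell whether a given stack size leaves real headroom once heap debugging is enabled.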
@0xjmux Closing this issue since there has been no update on this. Please feel free to reopen if required.
EDIT: this post was originally titled "Minor modifications to esp-idf espnow example causes intermittent crashes (AEGHB-604)"
Hi all, I'm trying to ingest messages from the Wizmote (which uses ESP-NOW to send messages) into an esp-idf program as part of a larger project. Something similar has been done before in the WLED project, albeit using Arduino.
I modified the `espnow` example to remove the sending code and to parse the data in the wizmote format, and it works - some of the time. For unknown reasons, after receiving a few packets (it's not usually the same number), the program will panic. I've seen a few errors, but the most common has been a stack overflow, which makes me think it's likely a memory issue. I've been trying to use JTAG with an ESP32-S3 to debug it, but because of FreeRTOS's threading, getting a useful backtrace on a crash is a bit of a coin toss. Where I've been able to trace a crash back to a location in my code, I've left a comment indicating so. I've gone about as far as I can, but I'm tearing my hair out trying to figure out what's causing the crash, and none of the heap tracing or core dump features of the SDK have been working for me.
If you'd be able to take a quick look at my test program and see if you can spot any obvious errors, I'd greatly appreciate it.
Thanks!
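One thing worth ruling out when parsing received ESP-NOW payloads is copying or indexing more bytes than the callback actually delivered. A minimal sketch of a length-checked parser follows. The 13-byte layout, field offsets, and all names here are hypothetical stand-ins for illustration and may not match the real remote's packet format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed packet size; the real remote format may differ. */
#define REMOTE_PACKET_LEN 13

/* Hypothetical decoded view of one button press. */
typedef struct {
    uint8_t  program;  /* first byte of the payload */
    uint32_t seq;      /* sequence number, assumed little-endian in bytes 1..4 */
    uint8_t  button;   /* assumed to be at offset 6 */
} remote_event_t;

/* Validate the length before touching the payload, then decode field by
 * field instead of memcpy'ing into a struct of a different size.
 * Returns false and leaves *out untouched on any malformed input. */
bool parse_remote_packet(const uint8_t *data, int len, remote_event_t *out)
{
    if (data == NULL || out == NULL || len != REMOTE_PACKET_LEN) {
        return false;
    }
    out->program = data[0];
    out->seq = (uint32_t)data[1]
             | ((uint32_t)data[2] << 8)
             | ((uint32_t)data[3] << 16)
             | ((uint32_t)data[4] << 24);
    out->button = data[6];
    return true;
}
```

Calling a function like this at the top of the receive path, and dropping the packet when it returns false, keeps packets from other ESP-NOW senders with a different length from being interpreted as remote data.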
Code
To duplicate my project structure, create an instance of the `espnow` example program in ESP-IDF, add the two files to the directory, and comment out `app_main()` from the original `espnow_example_main.c`.

espnow_remote_test.c

```c
/**
 * Test code for ingesting messages from the Wizmote ESP-NOW remote
 * espnow_remote_test.c
 */

// this include block copied from espnow_example_main.c
#include