espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
12.89k stars 7.08k forks source link

[Linux target] httpd_server crash when using vTaskDelay in handler (IDFGH-12934) #13894

Open matthew-8925 opened 1 month ago

matthew-8925 commented 1 month ago

Answers checklist.

IDF version.

v5.2, master

Espressif SoC revision.

n/a

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

n/a

Power Supply used.

USB

What is the expected behavior?

Run the http_server/simple on Linux, with a modified handler using vTaskDelay()

cd /opt/esp/idf/examples/protocols/http_server/simple idf.py --preview set-target linux

Modified the /hello handler to include a vTaskDelay(1). Expect to be able to request the /hello handler without crashing.

What is the actual behavior?

docker run --rm -p 8001:8001 -it espressif/idf:release-v5.2 bash apt update && apt install -y gdb vim cd /opt/esp/idf/examples/protocols/http_server/simple idf.py --preview set-target linux vim main/main.c # ... Edit as per diff below git diff .

diff --git a/examples/protocols/http_server/simple/main/main.c b/examples/protocols/http_server/simple/main/main.c
index 4a0678a39c..f834561a17 100644
--- a/examples/protocols/http_server/simple/main/main.c
+++ b/examples/protocols/http_server/simple/main/main.c
@@ -169,6 +169,8 @@ static void httpd_register_basic_auth(httpd_handle_t server)
 /* An HTTP GET handler */
 static esp_err_t hello_get_handler(httpd_req_t *req)
 {
+    vTaskDelay(1);
+
     char*  buf;
     size_t buf_len;

@@ -460,6 +462,6 @@ void app_main(void)
     server = start_webserver();

     while (server) {
-        sleep(5);
+        vTaskDelay(5);
     }
 }

idf.py build
gdb -ex run ./build/simple.elf curl http://localhost:8001/hello (from other console)

GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/simple.elf...
Starting program: /opt/esp/idf/examples/protocols/http_server/simple/build/simple.elf 
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x6225557d74c0 (LWP 21417)]
I (32334332) port: Starting scheduler.
[New Thread 0x622554ccdb40 (LWP 21418)]
[New Thread 0x622554cd1c40 (LWP 21419)]
[New Thread 0x7d2a8c006840 (LWP 21420)]
I (32334341) example: Starting server on port: '8001'
[New Thread 0x7d2a92e00640 (LWP 21421)]
I (32334342) example: Registering URI handlers
simple.elf: /IDF/components/freertos/FreeRTOS-Kernel/tasks.c:3675: vTaskSwitchContext: Assertion `uxTopPriority' failed.

Thread 6 "simple.elf" received signal SIGABRT, Aborted.
[Switching to Thread 0x7d2a92e00640 (LWP 21421)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=137621806253632) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=137621806253632) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=137621806253632) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=137621806253632, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007d2a933b5476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007d2a9339b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007d2a9339b71b in __assert_fail_base (fmt=0x7d2a93550130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x622554cb934a "uxTopPriority", file=0x622554cb9148 "/IDF/components/freertos/FreeRTOS-Kernel/tasks.c", line=3675, 
    function=<optimized out>) at ./assert/assert.c:92
#6  0x00007d2a933ace96 in __GI___assert_fail (assertion=assertion@entry=0x622554cb934a "uxTopPriority", file=file@entry=0x622554cb9148 "/IDF/components/freertos/FreeRTOS-Kernel/tasks.c", line=line@entry=3675, 
    function=function@entry=0x622554cb97c0 <__PRETTY_FUNCTION__.23> "vTaskSwitchContext") at ./assert/assert.c:101
#7  0x0000622554ca22be in vTaskSwitchContext () at /opt/esp/idf/components/freertos/FreeRTOS-Kernel/tasks.c:3675
#8  0x0000622554ca3442 in vPortYieldFromISR () at /opt/esp/idf/components/freertos/FreeRTOS-Kernel/portable/linux/port.c:281
#9  0x0000622554ca3636 in vPortYield () at /opt/esp/idf/components/freertos/FreeRTOS-Kernel/portable/linux/port.c:293
#10 0x0000622554ca21dc in vTaskDelay (xTicksToDelay=xTicksToDelay@entry=1) at /opt/esp/idf/components/freertos/FreeRTOS-Kernel/tasks.c:1588
#11 0x0000622554ca56d8 in hello_get_handler (req=0x7d2a8c007558) at /opt/esp/idf/examples/protocols/http_server/simple/main/main.c:172
#12 0x0000622554cab26e in httpd_uri (hd=hd@entry=0x7d2a8c0074a0) at /opt/esp/idf/components/esp_http_server/src/httpd_uri.c:329
#13 0x0000622554ca9022 in httpd_parse_req (hd=hd@entry=0x7d2a8c0074a0) at /opt/esp/idf/components/esp_http_server/src/httpd_parse.c:667
#14 0x0000622554ca916f in httpd_req_new (hd=hd@entry=0x7d2a8c0074a0, sd=sd@entry=0x7d2a8c007a60) at /opt/esp/idf/components/esp_http_server/src/httpd_parse.c:795
#15 0x0000622554ca9ec0 in httpd_sess_process (hd=0x7d2a8c0074a0, session=session@entry=0x7d2a8c007a60) at /opt/esp/idf/components/esp_http_server/src/httpd_sess.c:426
#16 0x0000622554ca725b in httpd_process_session (session=session@entry=0x7d2a8c007a60, context=context@entry=0x7d2a92dffd80) at /opt/esp/idf/components/esp_http_server/src/httpd_main.c:255
#17 0x0000622554ca965f in httpd_sess_enum (hd=hd@entry=0x7d2a8c0074a0, enum_function=enum_function@entry=0x622554ca71f3 <httpd_process_session>, context=context@entry=0x7d2a92dffd80) at /opt/esp/idf/components/esp_http_server/src/httpd_sess.c:50
#18 0x0000622554ca7efc in httpd_server (hd=hd@entry=0x7d2a8c0074a0) at /opt/esp/idf/components/esp_http_server/src/httpd_main.c:305
#19 0x0000622554ca8036 in httpd_thread (arg=0x7d2a8c0074a0) at /opt/esp/idf/components/esp_http_server/src/httpd_main.c:327
#20 0x00007d2a93407ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#21 0x00007d2a93498a04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
(gdb) 
quit

Steps to reproduce.

Change examples/protocols/http_server/simple/main/main.c to use vTaskDelay(1) in the main loop and hander, resulting in crash.

We have random double free()s when using a mutex in the handler, but the vTaskDelay is the easiest to reproduce.

Debug Logs.

No response

More Information.

No response

0xjakob commented 4 weeks ago

@matthew-8925 Thanks for noticing this problem! We could reproduce it. As of now, it looks a bit as if the idle task in the simulation is not correctly starting up. Hence, it is not in the "task ready list" (pxReadyTasksLists in tasks.c) and the assert, that basically checks that a task is always available, fails.

This might happen due to the higher priority HTTPD task that runs select(). select() is not recognized by the simulator as a blocking call, which causes the HTTPD task to be scheduled even though it's blocked from the actual OS perspective. This, in turn, might starve other tasks such as the IDLE task, before it even gets ready. This explanation is not bullet-proof, however, and we need to further investigate. We'll keep you updated here.

0xjakob commented 4 weeks ago

@matthew-8925 The problem is that the esp_http_server component creates a "normal" pthread that is running in parallel with the simulated FreeRTOS tasks. Once you call vTaskDelay() from that pthread in the http callback, it creates a situation that shouldn't be possible: vTaskDelay() eventually tries to suspend the current FreeRTOS task and run the next FreeRTOS task that is in "ready" state. But since all the other tasks in the simulation are running, blocked or suspended, there is no "ready" task and that's why the assertion triggers.

The workaround on Linux for now would be to completely avoid any FreeRTOS API calls in the http callback. It's probably also a very good idea to avoid any system calls in the http task or block all signals. Otherwise, the task could receive a signal meant for another simulated FreeRTOS task, which would create havoc.

That being said, we are working an a long-term solution for this and other problems with the current Linux simulator. We don't have an ETA yet, though.

Please also note that this example hasn't listed the Linux target in its README file. This means - among other things - that it is not regularly tested in our CI system, which makes encountering any bugs much more likely.

matthew-8925 commented 2 weeks ago

Thanks for the explanation, that's useful to know.