Closed JeronimusII closed 5 months ago
Thanks for the MCVE and the crash log.
This looks like a general issue with the task manager when tasks are suspended to do FreeRTOS housekeeping work (i.e. in this case to do a delay and put the task onto the queue for wake up at the right time). Basically, during that housekeeping, with task switching disabled, core0 got a WiFi packet interrupt which led to a malloc, which needs to grab the malloc lock and causes the crash.
In tasks.c I think I need to disable IRQs during those times when taskSuspendAll is in force.
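To make the idea concrete, here is a minimal host-side sketch of why masking IRQs for the duration of the scheduler suspension would help. It is only a model: `model_t` and every `model_*` function are hypothetical names invented for illustration, not FreeRTOS or arduino-pico APIs, and the actual change in tasks.c may look quite different.

```c
#include <stdbool.h>

/* Hypothetical host-side model. None of these names are FreeRTOS or
 * arduino-pico APIs; they only illustrate the ordering problem. */
typedef struct {
    bool scheduler_suspended;     /* true between suspend and resume */
    bool irqs_masked;             /* true while interrupts are disabled */
    bool irq_pending;             /* an IRQ arrived while masked */
    int  handled_while_suspended; /* handlers that ran during suspension */
    int  handled_total;           /* all handlers that ran */
} model_t;

/* Stand-in for the WiFi packet handler that ends up calling malloc. */
static void run_handler(model_t *m) {
    if (m->scheduler_suspended) {
        m->handled_while_suspended++; /* the dangerous case */
    }
    m->handled_total++;
}

/* An interrupt fires: it runs immediately unless IRQs are masked,
 * in which case it is latched and runs later. */
void model_raise_irq(model_t *m) {
    if (m->irqs_masked) {
        m->irq_pending = true;
    } else {
        run_handler(m);
    }
}

/* Suspend the scheduler; mask_irqs selects the proposed behaviour. */
void model_suspend_all(model_t *m, bool mask_irqs) {
    if (mask_irqs) {
        m->irqs_masked = true;
    }
    m->scheduler_suspended = true;
}

/* Resume the scheduler first, then unmask and deliver any latched IRQ. */
void model_resume_all(model_t *m) {
    m->scheduler_suspended = false;
    if (m->irqs_masked) {
        m->irqs_masked = false;
        if (m->irq_pending) {
            m->irq_pending = false;
            run_handler(m);
        }
    }
}
```

Without masking, an IRQ raised between `model_suspend_all()` and `model_resume_all()` runs its handler while the scheduler is suspended. With masking, the same IRQ is latched and only handled after the scheduler is running again, at the cost of slightly higher interrupt latency.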
Unfortunately, I don't seem to be able to get the MCVE to crash. I've had it in the background for 10 mins so far, with a new connection coming in every 1 second (while true; do telnet pico.io 5000; sleep 1; done).
So I am not sure I would be able to test any changes. But I should have a PR in a few minutes if you would be able to try it out, @JeronimusII, and report back.
After 20 mins I did get a hang using master and while true; do telnet 192.168.1.200 5000; sleep 3; done. Using the PR #2155 I've run for 40 mins without any issue, so I think we're good. I'll let it run a while longer and then do the merge.
It's run for >2hrs now with the change so I've done a merge and this will be in the next release.
After 5 hours it's still running fine, so the issue seems to be fixed on my side too. @earlephilhower Thanks for the help!
When using WiFiServer under FreeRTOS to repeatedly accept new connections, both cores eventually hang at a seemingly random time (usually after 1-10 minutes with a new connection every 3 seconds). The hang occurs right after a new incoming connection arrives, even before WiFiServer::accept() is called.
By pausing in a debugger and checking the call stack, I found that it is stuck in an infinite loop inside FreeRTOS, caused by a fatal error while allocating memory in the WiFiServer::_accept() method at line 197: https://github.com/earlephilhower/arduino-pico/blob/a49bcd4a9535bc9615ce46b67e91d50d501c07d0/libraries/WiFi/src/WiFiServer.cpp#L197
That error is triggered by an assertion in the xQueueSemaphoreTake() function of FreeRTOS, which expects the task scheduler not to be suspended: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/8e07366994f81354a2d4556ca1da9f73dab781e6/queue.c#L1672-L1677
The task scheduler seems to be suspended because the IRQ arrived during the brief period in which vTaskDelay(), called by some USB loop, has the task scheduler suspended.
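The condition described above can be sketched as a small predicate. This is a simplified model and not the FreeRTOS source: it assumes the linked assertion rejects a blocking take (non-zero timeout) while the scheduler is suspended, and it assumes the malloc lock is taken with a non-zero timeout. `sched_state_t`, `take_is_allowed()`, `malloc_in_irq_is_allowed()` and `ASSUMED_LOCK_TIMEOUT` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scheduler state; not a FreeRTOS type. */
typedef enum { SCHED_RUNNING, SCHED_SUSPENDED } sched_state_t;

/* Assumption: the malloc lock is taken with a blocking (non-zero) timeout. */
#define ASSUMED_LOCK_TIMEOUT UINT32_MAX

/* Models the assumed check: a take that may block is not allowed while
 * the scheduler is suspended, because nothing could ever wake the caller. */
bool take_is_allowed(sched_state_t state, uint32_t ticks_to_wait) {
    return !(state == SCHED_SUSPENDED && ticks_to_wait != 0);
}

/* Models a malloc issued from an interrupt handler: it needs the lock,
 * so it is only acceptable if the blocking take passes the check. */
bool malloc_in_irq_is_allowed(sched_state_t state_when_irq_fires) {
    return take_is_allowed(state_when_irq_fires, ASSUMED_LOCK_TIMEOUT);
}
```

Under these assumptions, `malloc_in_irq_is_allowed(SCHED_SUSPENDED)` is false, which corresponds to the IRQ landing inside the vTaskDelay() window, while the same call with `SCHED_RUNNING` passes. That would explain why the hang only shows up occasionally: the IRQ has to arrive inside a very short window.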
I am running the program on a Raspberry Pi Pico W. I tried it with a second board in case I might have fried something.
Here is the source of a program that can reproduce the issue (the freeze happens during the delayMicroseconds() at the end of loop()):
And here is the full call stack: