espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

select() blocks the FreeRTOS scheduler on Linux target (IDFGH-13498) #14395

Open snake-4 opened 3 weeks ago

snake-4 commented 3 weeks ago

IDF version.

v5.4-dev-2004-g8e4454b285

Espressif SoC revision.

Linux

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

What is the expected behavior?

The scheduler should continue executing other tasks as expected while the MQTT client is connected to a server.

What is the actual behavior?

The scheduler no longer switches between tasks when the MQTT client establishes a connection.

Steps to reproduce.

The above code, when built for the Linux target, blocks the FreeRTOS scheduler both while it attempts to establish a connection and while it maintains the connection.

Debug Logs.

No response

More Information.

No response

snake-4 commented 3 weeks ago

I've tracked it down to this line: https://github.com/espressif/esp-idf/blob/d7ca8b94c852052e3bc33292287ef4dd62c9eeb1/components/freertos/esp_additions/FreeRTOSSimulator_wrappers.c#L41

The MQTT task seems to call esp_transport_poll_read in a loop, which ends up calling select() continuously. The FreeRTOS scheduler is somehow unable to schedule other tasks in this scenario.
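To make the pattern concrete, here is a minimal sketch of a select()-based poll-read called from a task-style loop. It is not the esp-idf implementation: poll_read and poll_loop are hypothetical names used only for illustration, and the sketch relies on plain POSIX calls only.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical stand-in for a transport poll-read: waits up to timeout_ms
 * for fd to become readable. Returns 1 if readable, 0 on timeout, -1 on error. */
static int poll_read(int fd, int timeout_ms)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    /* The wait happens inside the host kernel, not in a FreeRTOS primitive. */
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Task-style loop: poll, then drain whatever arrived. Returns the total
 * number of bytes read over the given number of rounds, or -1 on error. */
static long poll_loop(int fd, int timeout_ms, int rounds)
{
    long total = 0;
    char buf[64];
    for (int i = 0; i < rounds; i++) {
        int ready = poll_read(fd, timeout_ms);
        if (ready < 0)
            return -1;
        if (ready > 0) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n < 0)
                return -1;
            total += n;
        }
    }
    return total;
}
```

Every round spends its whole timeout inside the host's select() call, which is the call the linked wrapper line ends up making.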

snake-4 commented 2 weeks ago

Here's what I've found so far: CONFIG_FREERTOS_TIMER_TASK_PRIORITY is 1 by default whereas CONFIG_MQTT_TASK_PRIORITY is 5.

When the code is compiled for the embedded targets, lwIP's FreeRTOS port correctly uses FreeRTOS for the timeout, so the task sleeps and the scheduler runs the lower-priority task.

However, when Linux's select syscall is used instead, the FreeRTOS scheduler doesn't know that the task is supposed to sleep, so it keeps scheduling the higher-priority task.

This problem will occur with every slow (blocking) syscall on the Linux target. A simple solution would be to call the real select with a zero timeout, so that it never blocks, and use vTaskDelay to simulate the timeout.
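A rough sketch of that workaround, under stated assumptions: select_yielding and yield_ms are hypothetical names, and yield_ms stands in for vTaskDelay(pdMS_TO_TICKS(ms)) so that the example builds without FreeRTOS. On the Linux target the delay would be the real vTaskDelay call, which is what lets the scheduler run lower-priority tasks between polls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Assumption: stand-in for vTaskDelay(pdMS_TO_TICKS(ms)). nanosleep is used
 * here only so the sketch compiles on its own. */
static void yield_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical wrapper: polls select() with a zero timeout and sleeps in
 * tick_ms slices between polls until timeout_ms has elapsed.
 * Returns what select() returns: >0 if ready, 0 on timeout, <0 on error. */
static int select_yielding(int nfds, fd_set *readfds, long timeout_ms, long tick_ms)
{
    assert(tick_ms > 0);
    fd_set wanted = *readfds;   /* select() overwrites the set, so keep a copy */
    long waited = 0;
    for (;;) {
        struct timeval zero = { 0, 0 };   /* zero timeout: never blocks */
        *readfds = wanted;
        int ret = select(nfds, readfds, NULL, NULL, &zero);
        if (ret != 0)
            return ret;                   /* ready (>0) or error (<0) */
        if (waited >= timeout_ms)
            return 0;                     /* simulated timeout expired */
        long left = timeout_ms - waited;
        long step = left < tick_ms ? left : tick_ms;
        yield_ms(step);                   /* the scheduler can run other tasks here */
        waited += step;
    }
}
```

The trade-off is latency: readiness is only noticed at the next poll, so the wake-up can lag by up to tick_ms compared with a blocking select.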