FreeRTOS / FreeRTOS-Kernel

FreeRTOS kernel files only, submoduled into https://github.com/FreeRTOS/FreeRTOS and various other repos.
https://www.FreeRTOS.org
MIT License

[BUG] SMP prvSelectHighestPriorityTask adds current task to the front of the ready list if pxIndex points to the head of the list #990

Closed gemarcano closed 4 months ago

gemarcano commented 5 months ago

Describe the bug

The new SMP implementation of prvSelectHighestPriorityTask uses vListInsertEnd to insert the current TCB at the end of the ready task list. However, vListInsertEnd doesn't actually insert an element at the end of a list. It only adds it such that it is the last element returned by repeated calls to listGET_OWNER_OF_NEXT_ENTRY before the iteration starts repeating. Effectively, vListInsertEnd inserts the node right before the list's current pxIndex node.

In testing with a personal project and stepping through with a debugger, the pxIndex of the ready list at first seems to be the tail element of the list (before the xListEnd element). Over time, however, as tasks are removed from and added to the ready list, the pxIndex element appears to migrate to the top of the list. Once it reaches the top of the list, vListInsertEnd actually ends up inserting the current task's TCB node at the front of the ready list!
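The insertion behaviour described above can be sketched with a small stand-alone model. This is not the kernel's code: `Node`, `MiniList`, and the `mini_*` functions are hypothetical stand-ins for `ListItem_t`, `List_t`, `vListInsertEnd`, and `listGET_OWNER_OF_NEXT_ENTRY`, and the linking logic is assumed to mirror the real list implementation only to the extent described in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for ListItem_t and List_t (hypothetical names). */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    const char *owner; /* plays the role of pvOwner (the TCB) */
} Node;

typedef struct {
    Node *index; /* plays the role of pxIndex */
    Node end;    /* plays the role of the xListEnd sentinel */
} MiniList;

static void mini_init(MiniList *l)
{
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.owner = NULL;
    l->index = &l->end; /* assumption: a fresh list indexes the sentinel */
}

/* Link the new node immediately before the index node. */
static void mini_insert_end(MiniList *l, Node *n)
{
    Node *idx = l->index;
    assert(n != NULL);
    n->next = idx;
    n->prev = idx->prev;
    idx->prev->next = n;
    idx->prev = n;
}

/* Advance the index to the next real node, skipping the sentinel. */
static const char *mini_next_owner(MiniList *l)
{
    l->index = l->index->next;
    if (l->index == &l->end) {
        l->index = l->end.next;
    }
    return l->index->owner;
}

/* Returns 1 if a node added with mini_insert_end became the head
 * (the first node after the sentinel), 0 otherwise. */
static int inserted_at_head(int advance_index_first)
{
    MiniList ready;
    Node a = { NULL, NULL, "A" };
    Node b = { NULL, NULL, "B" };
    Node current = { NULL, NULL, "current" };

    mini_init(&ready);
    mini_insert_end(&ready, &a); /* list: A */
    mini_insert_end(&ready, &b); /* list: A, B */

    if (advance_index_first) {
        (void)mini_next_owner(&ready); /* index now points at A, the head */
    }

    mini_insert_end(&ready, &current);
    return ready.end.next == &current;
}
```

In this model, `inserted_at_head(0)` returns 0 because the index still points at the sentinel and the new node lands at the true tail (A, B, current). `inserted_at_head(1)` returns 1 because the index points at the head node, so the new node is linked in front of it (current, A, B).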

The fix would be to use listGET_OWNER_OF_NEXT_ENTRY to iterate through the list, instead of starting from the head element.
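The difference between walking from the head and walking with the index can be sketched with the same kind of toy model. All names here (`Node`, `MiniList`, `mini_*`, `scenario_orders`) are hypothetical, the task IDs are made up, and this only illustrates the idea proposed in this report; it is not the change that was eventually merged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ListItem_t and List_t. */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    int id; /* made-up task ID */
} Node;

typedef struct {
    Node *index; /* plays the role of pxIndex */
    Node end;    /* plays the role of xListEnd */
} MiniList;

static void mini_init(MiniList *l)
{
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.id = -1;
    l->index = &l->end;
}

/* Link the new node immediately before the index node. */
static void mini_insert_end(MiniList *l, Node *n)
{
    Node *idx = l->index;
    assert(n != NULL);
    n->next = idx;
    n->prev = idx->prev;
    idx->prev->next = n;
    idx->prev = n;
}

/* Advance the index to the next real node, skipping the sentinel. */
static int mini_next_id(MiniList *l)
{
    l->index = l->index->next;
    if (l->index == &l->end) {
        l->index = l->end.next;
    }
    return l->index->id;
}

/* Build the problem state, then record both visiting orders. */
static void scenario_orders(int from_head[3], int from_index[3])
{
    MiniList ready;
    Node t1 = { NULL, NULL, 1 };
    Node t2 = { NULL, NULL, 2 };
    Node current = { NULL, NULL, 9 }; /* the task being re-inserted */
    const Node *p;
    int i;

    mini_init(&ready);
    mini_insert_end(&ready, &t1);
    mini_insert_end(&ready, &t2);
    (void)mini_next_id(&ready);        /* index -> task 1, the head */
    mini_insert_end(&ready, &current); /* physical order: 9, 1, 2 */

    /* Walk from the head: the re-inserted task is seen first. */
    p = ready.end.next;
    for (i = 0; i < 3; i++) {
        from_head[i] = p->id;
        p = p->next;
    }

    /* Walk with the index: task 2 gets its turn before task 9. */
    for (i = 0; i < 3; i++) {
        from_index[i] = mini_next_id(&ready);
    }
}
```

In this scenario the head walk visits 9, 1, 2, so a selector that takes the first eligible entry would keep picking the re-inserted task. The index walk visits 2, 9, 1: the re-inserted task comes after the task that has not yet had its turn, and the walk only then wraps back to the node the index started on.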

Target

Host

To Reproduce

I don't have a generic reproducer, since the bug depends strongly on scheduler and task interaction. Even reproducing it on my device is almost like trying to reproduce a race condition, and any slowdown from gdb conditionals makes the issue impossible to reproduce.

My project is set up to mock a USB HID device using TinyUSB. I have a task dedicated to USB handling, a CLI task, a task mocking controller input, and a watchdog task. Dumping the list of active tasks shows that the pico-sdk also has a few other tasks running in the background:

Tasks active: 7
  task name: prb_cli
  task name: usb
  task name: IDLE1
  task name: IDLE0
  task name: prb_watchdog
  task name: controller
  task name: Tmr Svc

I configured all 4 of my tasks to have a core affinity so they only use core 2.

I triggered the issue by constantly requesting the CLI task to output my debug status info using uxTaskGetSystemState to get the system state. It can take seconds to almost a minute of me spamming requests (as a human, typing s and enter to trigger the CLI output) to trigger the bug.

What I observe is that the scheduler consistently schedules the current task once the bug is triggered, starving all others. Makes sense if pxIndex is the head node of the ready list, as the current task node gets added before the pxIndex node... becoming the new head node.

Expected behavior

No resource starvation on the core the bug triggers in.

Screenshots

N/A

Additional context

See this FreeRTOS forum post for a discussion and all of my findings about the issue.

I can open a pull request with an attempted fix, but I have no idea how to go about preparing unit tests and coverage, or how to do proper regression testing with FreeRTOS.

rawalexe commented 5 months ago

Thank you for the bug report. We are looking into the problem.

chinglee-iot commented 4 months ago

The PR #1000 to address this issue is merged. Thank you for creating this issue.