[BUG] Wrong critical section handling before the scheduler is started

stefano-zanotti commented 3 years ago

Describe the bug Before the scheduler has started, any call to FreeRTOS functions that require a critical section will mess up the interrupt enabling status. This includes calls made by vTaskStartScheduler() itself, e.g. to create the idle task.

Target

Development board: irrelevant (STM32F746NG)
Instruction Set Architecture: ARM_CM7
IDE and version: irrelevant (Atollic TrueSTUDIO 9.2.0)
Toolchain and version: irrelevant (arm-atollic-eabi-gcc 6.3.1 20170215 (release) [ARM/embedded-6-branch revision 245512])

Host

Host OS: irrelevant (Windows)
Version: irrelevant (Windos 10 Pro)

To Reproduce

Just follow the execution of vTaskStartScheduler() in any ARM_CM7 project I'm also attaching a sample project created with STM32CubeMX for my MCU (STM32F746NG), with just FreeRTOS enabled; STM32CubeMX uses an older version (10.2.1), but the problem is exactly the same. In "Core/Src/main.c", you'll find the calls to MX_FREERTOS_Init() -to initializes the tasks- and osKernelStart() -to start the scheduler.

Additional context In my init code, I first create a task (and possibly queues, mutexes, etc.), then start the scheduler. The problem is that

in the task initialization code, there are critical sections, which use taskENTER_CRITICAL() and taskEXIT_CRITICAL()
these functions reference the global variable uxCriticalNesting
uxCriticalNesting is statically initialised to 0xaaaaaaaa, and it only gets initialized to 0 when the scheduler is started (vTaskStartScheduler()). This means that taskENTER_CRITICAL() increments uxCriticalNesting and disables interrupts, and taskEXIT_CRITICAL() decrements uxCriticalNesting, but it doesn't re-enable interrupts, since uxCriticalNesting is back to 0xaaaaaaaa, rather than back to 0. So, interrupts remain disabled until the scheduler is actually started.

I'm using FreeRTOS Kernel V10.3.1, with the GCC ARM_CM7 port (though at a first glance it seems that FreeRTOS Kernel V10.4.3 behaves the same). This is where uxCriticalNesting is defined: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/portable/GCC/ARM_CM7/r0p1/port.c#L140 static UBaseType_t uxCriticalNesting = 0xaaaaaaaa;

These are the definitions of (the port-specific versions of) taskENTER_CRITICAL() and taskEXIT_CRITICAL() https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/portable/GCC/ARM_CM7/r0p1/port.c#L396 https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/portable/GCC/ARM_CM7/r0p1/port.c#L413 void vPortEnterCritical( void ) { portDISABLE_INTERRUPTS(); uxCriticalNesting++; ... } void vPortExitCritical( void ) { configASSERT( uxCriticalNesting ); uxCriticalNesting--;

    if( uxCriticalNesting == 0 )
    {
        portENABLE_INTERRUPTS();
    }
}

Note that "disabling interrupts" doesn't mean globally masking them (using the "cpsie i" and "cpsid i" ARM instructions); rather, it means altering BASEPRI (which only masks interrupts with priority no higher than that of FreeRTOS). See here: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/portable/GCC/ARM_CM7/r0p1/portmacro.h#L100

Here is a summary of what is happening. I'm listing function calls and variable values.

// static init: uxCriticalNesting = 0xaaaaaaaa, ints enabled xTaskCreateStatic() prvAddNewTaskToReadyList() taskENTER_CRITICAL() // uxCriticalNesting = 0xaaaaaaab, ints disabled taskEXIT_CRITICAL() // uxCriticalNesting = 0xaaaaaaaa, ints disabled vTaskStartScheduler() xTaskCreateStatic() // for the idle task prvAddNewTaskToReadyList() taskENTER_CRITICAL() // uxCriticalNesting = 0xaaaaaaab, ints disabled taskEXIT_CRITICAL() // uxCriticalNesting = 0xaaaaaaaa, ints disabled xTimerCreateTimerTask() prvCheckForValidListAndQueue() taskENTER_CRITICAL() // uxCriticalNesting = 0xaaaaaaab, ints disabled taskEXIT_CRITICAL() // uxCriticalNesting = 0xaaaaaaaa, ints disabled prvInitialiseNewQueue() xQueueGenericReset() taskENTER_CRITICAL() // uxCriticalNesting = 0xaaaaaaab, ints disabled taskEXIT_CRITICAL() // uxCriticalNesting = 0xaaaaaaaa, ints disabled xTaskCreateStatic() prvAddNewTaskToReadyList() taskENTER_CRITICAL() // uxCriticalNesting = 0xaaaaaaab, ints disabled taskEXIT_CRITICAL() // uxCriticalNesting = 0xaaaaaaaa, ints disabled // from now on, the behavior seems correct; it actually manages to fix the problem caused by the previous calls portDISABLE_INTERRUPTS(); // this just disables interrupts, without changing uxCriticalNesting; it's done on purpose by vTaskStartScheduler(), see the comment for https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/tasks.c#L2059 xPortStartScheduler() uxCriticalNesting=0 // https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/portable/GCC/ARM_CM7/r0p1/port.c#L363 prvPortStartFirstTask() cpsie i svc 0 vPortSVCHandler() msr basepri, r0 // this re-enables interrupts; now we are in a clean working condition: interrupts enabled, uxCriticalNesting = 0 bx r14 // now the first task is actually started

This summary also shows that the problem is not in my creating a task beforehand; it exists even within vTaskStartScheduler(), with the creation of the idle task and the timer task. However, the problem is worse if I create objects beforehand, since interrupts remain disabled for longer.

A straightforward and very lightweight solution would be to just initialize uxCriticalNesting to 0, statically. Is there any reason to have it initialized to 0xaaaaaaaa? It seems that no function ever checks for this "uninitialized" value, and debuggers should have other better ways to detect whether the scheduler has been inited/started yet.

As an aside: is there any reason why there are these assertions in port.c? configASSERT( uxCriticalNesting == 1000UL ); configASSERT( uxCriticalNesting == ~0UL ); rather than a simple: configASSERT( 0 ); test_freertos.zip

RichardBarry commented 3 years ago

Thanks for taking the time to provide this report. In this case however the behaviour you describe is intentional and applies to all FreeRTOS port. In the early days of FreeRTOS one of the most common issues users had was installing an interrupt prior to starting the scheduler, and then for that interrupt to execute and try to use the kernel before the kernel had started. Hence, for many years now, any time you start using the kernel's API interrupts are deliberately kept masked to ensure the code can execute all the way to starting the scheduler without the above happening. As you say, interrupts are re-enabled when the scheduler starts.

We always recommend discussing topics in the support forum and then opening a bug report if the result of the discussion is that a bug exists.

stefano-zanotti commented 3 years ago

Thanks for your explanation. However, I still don't understand it. Isn't it still a problem if an interrupt tries to call FreeRTOS before it has started? The behavior of uxCriticalNesting initialized to 0xaaaaaaaa, is such that the interrupt would call FreeRTOS, interrupts would get disabled, the call would be executed, and then interrupts would be left disabled. The call itself is not prevented.

The solution would be not enabling the interrupts beforehand, or having the interrupts themselves check if the scheduler is running (all of which is the application's duty, not the rtos'). How is servicing one call and then disabling interrupts helping with the problem?

Sorry for posting here instead of the forum; now I know.

RichardBarry commented 3 years ago

The sequence would be like this:

The C startup code executes, then main() is called. The kernel can't do anything about the interrupt enable state at this point, so it is possible an interrupt could occur.
The code in main does whatever, then starts calling FreeRTOS API functions.
Inside the API function there is a call to taskENTER_CRITICAL(). That masks interrupts up the the max syscall interrupt priority, then increments critical section nesting count. From here on interrupts at priorities that can call FreeRTOS API functions are masked until the scheduler starts.
Still inside the API function there is a call to taskEXIT_CRITICAL(). That decrements the critical section nesting count, but not to zero (as it was initialised to a non-zero value), so interrupts remain masked.
Anything else that executes here does so with interrupts masked unless the application writer deliberately manually unmasks them.
The scheduler starts and unmasks all interrupts before the first task starts executing.

FreeRTOS / FreeRTOS-Kernel

[BUG] Wrong critical section handling before the scheduler is started #254