FreeRTOS / FreeRTOS-Kernel

FreeRTOS kernel files only, submoduled into https://github.com/FreeRTOS/FreeRTOS and various other repos.
https://www.FreeRTOS.org
MIT License
2.51k stars 1.05k forks source link

[BUG] Premature changing of pxCurrentTCB BEFORE actual context switch in Win32/64 Port #1054

Closed Vitaly1983 closed 1 month ago

Vitaly1983 commented 1 month ago

pxCurrentTCB is currently changed BEFORE newly selected user FreeRTOS thread (task) is resumed. This leads to sporadic errors with working of system objects, e.g., mutexes.

Target Win32/64 FreeRTOS simulator port. FreeRTOS Kernel V10.5.1, 'port.c' source file

Host Any

To Reproduce This will lead to sporadic errors, in program on multicore CPUs. The more threads will be in program, and more often they will be switched, the higher probability of errors occuring. Errors will be related to system objects (e.g., mutexes), which access pxCurrentTCB pointer.

The problem is that inside prvProcessSimulatedInterrupts()->vTaskSwitchContext()->taskSELECT_HIGHEST_PRIORITY_TASK() ->listGET_OWNER_OF_NEXT_ENTRY() there is a line:

( pxTCB ) = ( pxConstList )->pxIndex->pvOwner; there pxTCB == pxCurrentTCB

BUT, the thread which was assigned to the previously selected pxCurrentTCB, is NOT suspended during execution of vTaskSwitchContext(). If inside listGET_OWNER_OF_NEXT_ENTRY() a new thread will be selected for execution, then the previously executed will be suspended only a few lines below:

            vTaskSwitchContext();

            /* If the task selected to enter the running state is not the task
            that is already in the running state. */
            if( pvOldCurrentTCB != pxCurrentTCB )
            {
                /* Suspend the old thread.  In the cases where the (simulated)
                interrupt is asynchronous (tick event swapping a task out rather
                than a task blocking or yielding) it doesn't matter if the
                'suspend' operation doesn't take effect immediately - if it
                doesn't it would just be like the interrupt occurring slightly
                later.  In cases where the yield was caused by a task blocking
                or yielding then the task will block on a yield event after the
                yield operation in case the 'suspend' operation doesn't take
                effect immediately.  */
                pxThreadState = ( ThreadState_t *) *( ( size_t * ) pvOldCurrentTCB );
                -->>> SuspendThread( pxThreadState->pvThread );

Since there is a time gap between re-assigning pxCurrentTCB pointer and the actual suspension of previously active thread, if that thread checks for pxCurrentTCB variable in this same moment, the returned value will be incorrect.

I added the following quick fix to prvProcessSimulatedInterrupts() function:

            if (pvOldCurrentTCB)
            {
                pxThreadState = ( ThreadState_t *) *( ( size_t * ) pvOldCurrentTCB ); // <-- 1
                if (pxThreadState && pxThreadState->pvThread)
                    SuspendThread( pxThreadState->pvThread ); // <-- 2
            }

            /* Select the next task to run. */
            vTaskSwitchContext();

            /* If the task selected to enter the running state is not the task
            that is already in the running state. */
            if( pvOldCurrentTCB != pxCurrentTCB )
            {
                /* Suspend the old thread.  In the cases where the (simulated)
                interrupt is asynchronous (tick event swapping a task out rather
                than a task blocking or yielding) it doesn't matter if the
                'suspend' operation doesn't take effect immediately - if it
                doesn't it would just be like the interrupt occurring slightly
                later.  In cases where the yield was caused by a task blocking
                or yielding then the task will block on a yield event after the
                yield operation in case the 'suspend' operation doesn't take
                effect immediately.  */
                pxThreadState = ( ThreadState_t *) *( ( size_t * ) pvOldCurrentTCB );
                // SuspendThread( pxThreadState->pvThread );

                /* Ensure the thread is actually suspended by performing a
                synchronous operation that can only complete when the thread is
                actually suspended.  The below code asks for dummy register
                data.  Experimentation shows that these two lines don't appear
                to do anything now, but according to
                https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743
                they do - so as they do not harm (slight run-time hit). */
                xContext.ContextFlags = CONTEXT_INTEGER;
                ( void ) GetThreadContext( pxThreadState->pvThread, &xContext );

                /* Obtain the state of the task now selected to enter the
                Running state. */
                pxThreadState = ( ThreadState_t * ) ( *( size_t *) pxCurrentTCB );

                /* pxThreadState->pvThread can be NULL if the task deleted
                itself - but a deleted task should never be resumed here. */
                configASSERT( pxThreadState->pvThread != NULL );
                ResumeThread( pxThreadState->pvThread );
            }
            else
            {
                if (pvOldCurrentTCB && pxThreadState && pxThreadState->pvThread)
                    ResumeThread( pxThreadState->pvThread ); // <-- 2
            }

Some checks for non-NULL pointers may be redundant, but the idea should be clear. It may be even more clean not to use Resume/Suspend API in case if pxCurrentTCB was not changed. Maybe, by making vTaskSwitchContext() not to update pxCurrentTCB variable inside, but to return pointer to newly selected thread to run.

After this change, strange sporadic errors that occured before in my program, seems to disappear.

aggarg commented 1 month ago

Would you please try with the following snippet and see that it addresses the issue you mentioned:

if( ulSwitchRequired != pdFALSE )
            {
                /* Suspend the old thread. */
                pxThreadState = ( ThreadState_t * ) *( ( size_t * ) pxCurrentTCB );
                SuspendThread( pxThreadState->pvThread );

                /* Ensure the thread is actually suspended by performing a
                 * synchronous operation that can only complete when the thread
                 * is actually suspended. The below code asks for dummy register
                 * data. Experimentation shows that these two lines don't appear
                 * to do anything now, but according to
                 * https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743
                 * they do - so as they do not harm (slight run-time hit). */
                xContext.ContextFlags = CONTEXT_INTEGER;
                ( void ) GetThreadContext( pxThreadState->pvThread, &xContext );

                /* Select the next task to run. */
                vTaskSwitchContext();

                /* Obtain the state of the task now selected to enter the
                 * Running state. */
                pxThreadState = ( ThreadState_t * ) ( *( size_t * ) pxCurrentTCB );

                /* pxThreadState->pvThread can be NULL if the task deleted
                 * itself - but a deleted task should never be resumed here. */
                configASSERT( pxThreadState->pvThread != NULL );
                ResumeThread( pxThreadState->pvThread );
            }