FreeRTOS / FreeRTOS-Kernel

FreeRTOS kernel files only, submoduled into https://github.com/FreeRTOS/FreeRTOS and various other repos.
https://www.FreeRTOS.org
MIT License
2.52k stars 1.05k forks source link

[BUG] - TCBs and dynamic stacks leaked when scheduler is stopped #956

Closed cmorganBE closed 5 months ago

cmorganBE commented 6 months ago

When the scheduler is stopped, TCBs and dynamic stacks of running threads are not being freed. While ports, such as the posix port, clean up their resources, it appears as if the kernel itself is not freeing these resources when the scheduler is stopped.

TCBs are leaked from here, tasks.c: 1670

                /* Allocate space for the TCB. */
                /* MISRA Ref 11.5.1 [Malloc memory assignment] */
                /* More details at: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/MISRA.md#rule-115 */
                /* coverity[misra_c_2012_rule_11_5_violation] */
                pxNewTCB = ( TCB_t * ) pvPortMalloc( sizeof( TCB_t ) );  <----------------------------

Stacks are leaked from here, tasks.c:1662

            /* MISRA Ref 11.5.1 [Malloc memory assignment] */
            /* More details at: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/MISRA.md#rule-115 */
            /* coverity[misra_c_2012_rule_11_5_violation] */
            pxStack = pvPortMallocStack( ( ( ( size_t ) usStackDepth ) * sizeof( StackType_t ) ) );

            if( pxStack != NULL )
            {

@chinglee-iot with the merge of the posix changes I'm making use of them here for some tests. I'm not sure how it was missed before but it looks like we may have missed leaking task TCBs and dynamic task stacks.

A task's tcb appears to be freed in prvDeleteTCB() but its unclear how the posix port changes of cancelling and deleting threads accomplish this.

Should vTaskStartScheduler() delete the TCBs and dynamic stacks for any running threads before returning? It feels like the port shouldn't really be doing this since TCBs/dynamic stacks are created and managed by FreeRTOS kernel proper.

Valgrind trace that got me started:

==1415351== HEAP SUMMARY:
==1415351==     in use at exit: 37,088 bytes in 4 blocks
==1415351==   total heap usage: 2,506 allocs, 2,502 frees, 491,264 bytes allocated
==1415351== 
==1415351== 160 bytes in 1 blocks are possibly lost in loss record 2 of 4
==1415351==    at 0x4845828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1415351==    by 0x1F0BB6: pvPortMalloc (heap_3.c:65) <---------------- this is a TCB
==1415351==    by 0x1EB38D: prvCreateTask (tasks.c:1670)
==1415351==    by 0x1EB460: xTaskCreate (tasks.c:1722)
==1415351==    by 0x1E6405: WaterTemperatureSim::Enable() (WaterTemperatureSim.h:51)
==1415351==    by 0x1E59D1: WaterTemperatureSimTest(void*) (test_WaterTemperatureSim.cpp:23)
==1415351==    by 0x1F1442: prvWaitForStart (port.c:507)
==1415351==    by 0x4C8CAD9: start_thread (pthread_create.c:444)
==1415351==    by 0x4D1D2E3: clone (clone.S:100)
==1415351== 
==1415351== 4,000 bytes in 1 blocks are possibly lost in loss record 3 of 4
==1415351==    at 0x4845828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1415351==    by 0x1F0BB6: pvPortMalloc (heap_3.c:65) <---------------- this is a task dynamic stack
==1415351==    by 0x1EB378: prvCreateTask (tasks.c:1662)
==1415351==    by 0x1EB460: xTaskCreate (tasks.c:1722)
==1415351==    by 0x1E6405: WaterTemperatureSim::Enable() (WaterTemperatureSim.h:51)
==1415351==    by 0x1E59D1: WaterTemperatureSimTest(void*) (test_WaterTemperatureSim.cpp:23)
==1415351==    by 0x1F1442: prvWaitForStart (port.c:507)
==1415351==    by 0x4C8CAD9: start_thread (pthread_create.c:444)
==1415351==    by 0x4D1D2E3: clone (clone.S:100)
chinglee-iot commented 6 months ago

@cmorganBE Thank you for reporting back. I will take a look at this issue.

cmorganBE commented 6 months ago

@chinglee-iot and to be clear, the call to pvPortMallocStack() is also leaking, this one:

tasks.c:1662

            /* MISRA Ref 11.5.1 [Malloc memory assignment] */
            /* More details at: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/MISRA.md#rule-115 */
            /* coverity[misra_c_2012_rule_11_5_violation] */
            pxStack = pvPortMallocStack( ( ( ( size_t ) usStackDepth ) * sizeof( StackType_t ) ) );

            if( pxStack != NULL )
            {
cmorganBE commented 5 months ago

@chinglee-iot did you want me to look at this one? Would deleting TCBs and stacks when the scheduler is exiting be an acceptable change that could be viable for a merge?

Skptak commented 5 months ago

Hey @cmorganBE, I think that the proposed idea of deleting all tasks when stopping the scheduler is a change that should be discussed a bit more?

I'm thinking about a use case where somebody wants to stop the scheduler, create and delete a few tasks perhaps, and then restart it

Would you mind making a post on the FreeRTOS Forum about this so we can discuss it more there?

chinglee-iot commented 5 months ago

@cmorganBE

I've been blocked by other stuff and now get time to look into this problem.

The callstack you shared indicates that memory dynamically allocated for task are not freed, include TCB_t and stack. Allowing to delete those tasks after scheduler is stopped is one way to free those memory. We are looking into the implementation of vTaskDelete() to see if any change required.

The other problem is that we should call vTaskDelete() to delete all the tasks, including created by application and kernel. kernel created task handles can be obtained with xTaskGetIdleTaskHandle() and xTimerGetTimerDaemonTaskHandle(). That means application should keep all the task handles and delete them after scheduler stopped or we need another method to delete those tasks.

To free memory after scheduler stopped is not supported before. It is more like a new feature to FreeRTOS. As @Skptak suggested, we can also discuss it more in the forum.

cmorganBE commented 5 months ago

@Skptak @chinglee-iot

@chinglee-iot imo its inconsistent if we require the user of FreeRTOS To free these resources as the POSIX port already frees the internal ones and the official FreeRTOS docs say the kernel will free the rest, https://www.freertos.org/a00133.html.

Recent changes for the POSIX port perform cleanup when the scheduler is stopped so we've already freed the pthread threads, its the FreeRTOS resources that are leaking. I would be fine with either approach:

A. Scheduler is stopped and all resources are cleaned up (today POSIX port does this but FreeRTOS proper does not) B. Scheduler can be stopped/started and there is a separate way to clean up resources. This approach has to involve some interface for the port as there are port specific resources, unrelated to FreeRTOS tasks, that need to be cleaned up.

At the moment we are close to having 'A' completed if FreeRTOS freed TCBs and stacks on scheduler stop.

Also, 'A' is what the official docs say the behavior should be, https://www.freertos.org/a00133.html. Although this is only for the x86 port.

Again, I'm ok with either A or B but I worry we'll get stuck discussing it when we could fairly complete 'A' and let someone else interested in B discuss and propose how to do that, since it will need new API in the kernel proper.

Skptak commented 5 months ago

@cmorganBE

Again, I'm ok with either A or B but I worry we'll get stuck discussing it when we could fairly complete 'A' and let someone else interested in B discuss and propose how to do that, since it will need new API in the kernel proper.

That's a very good point, in my opinion. No need to let "perfect" be the enemy of good here. As you've pointed out, we can always add to this feature once it is implemented.

I think for now making it so that vTaskEndScheduler() implements a loop that deletes all FreeRTOS Tasks on the system, and frees any that are dynamically allocated, is a good place to start with this.

@chinglee-iot @aggarg what are your thoughts on this?

Skptak commented 5 months ago

I wonder if clearing out the TCBs and stacks of statically allocated tasks would also provide a somewhat generic method of "deleting" those tasks as well?

chinglee-iot commented 5 months ago

I think for now making it so that vTaskEndScheduler() implements a loop that deletes all FreeRTOS Tasks on the system, and frees any that are dynamically allocated, is a good place to start with this.

If this is for testing with simulator purpose, we can do this in vTaskEndScheduler(). Tasks created by application should still be deleted in application if not required anymore. Otherwise, it will still cause memory leakage. vTaskEndScheduler() should only delete those long running tasks which won't be deleted in real target.

vTaskDelete() can be used to delete task after scheduler stopped. Statically allocated task can also be deleted with this function. The task calling vTaskEndScheduler() which is pointed by pxCurrentTCB needs to be handled specially in the port.

cmorganBE commented 5 months ago

I think for now making it so that vTaskEndScheduler() implements a loop that deletes all FreeRTOS Tasks on the system, and frees any that are dynamically allocated, is a good place to start with this.

If this is for testing with simulator purpose, we can do this in vTaskEndScheduler(). Tasks created by application should still be deleted in application if not required anymore. Otherwise, it will still cause memory leakage. vTaskEndScheduler() should only delete those long running tasks which won't be deleted in real target.

@chinglee-iot it does sound like these are two different things. If https://www.freertos.org/a00133.html is followed and vTaskEndScheduler() deletes all tasks and cleans up memory then it will both be closer to the documentation and happens to resolve the leak when using the POSIX simulator.

It sounds like there may be reasons to end the scheduler in a non-POSIX port context but even in that case we'd be better aligning the kenel with the documented behavior of vTaskEndScheduler() in https://www.freertos.org/a00133.html.

aggarg commented 5 months ago

There are 2 type of kernel objects -

  1. Kernel Objects created by the application - This includes tasks, queues, semaphores etc. created by the application.
  2. Kernel Objects created by the kernel - This includes idle and timer task.

In order to keep the resource ownership clear, I think the application should delete the kernel objects that it creates and the kernel should delete the ones it creates. This probably means that we need to update our documentation as well.

@cmorganBE Would it solve your problem? With this, you can delete all the kernel objects (including tasks) in your test harness before calling the vTaskEndScheduler() and valgrind should report no memory leak.

cmorganBE commented 5 months ago

@aggarg this could work. Is there a way for the application to get the running tasks from freertos so this could be done globally vs. having to track it application side?

I've been messing with how to free things from the kernel side and unfortunately one complication is that the scheduler must be stopped from within a FreeRTOS task. This means if you were to delete TCBs and stacks from within the vTaskEndScheduler() call from within a FreeRTOS task you'd end up deleting the TCBs and stacks from the thread itself.

Perhaps the whole idea didn't make sense vs. the application freeing resources.

@aggarg one more question, how does one ensure that a task has been deleted by FreeRTOS? vTaskDelete() defers the deletion and there is no notion in FreeRTOS for a pthread_join() that I'm aware of.

cmorganBE commented 5 months ago

@aggarg yeah tested here and that's still a gap. Even if one were to have their threads vTaskDelete() themselves, I think you'd need a mechanism to ensure that the resources were freed. Unless the idea is to vTaskDelete() them from the function requesting a task shut down, vs a more graceful shutdown where the task itself calls vTaskDelete()?

This example leaks the tcb and stack of 'a_task':

void a_task() {
  while(running)
  {
    xxx;
  }

  vTaskDelete(NULL);
}

void shutdown()
{
    running = false;
    vTaskEndScheduler();
}
chinglee-iot commented 5 months ago

@cmorganBE

You point out some problems. vTaskDelete() and vTaskEndScheduler() also need to be updated for the problems.

This means if you were to delete TCBs and stacks from within the vTaskEndScheduler() call from within a FreeRTOS task you'd end up deleting the TCBs and stacks from the thread itself.

This is true. If the last running task, which is the task calling vTaskEndScheduler(), is deleted after scheduler stopped, we can get rid of this dilemma. Because task resources (such as task stacks) will no longer be used, the task can be safely deleted.

The following jobs need to be done in vPortEndScheduler() to delete the last running task:

Application can call vTaskDelete to delete the last running task after vTaskEndScheduler() is called.

how does one ensure that a task has been deleted by FreeRTOS? vTaskDelete() defers the deletion and there is no notion in FreeRTOS for a pthread_join() that I'm aware of.

Idle task takes care of the deferred deletion. There is chance that idle task doesn't have the chance to run before vTaskEndScheduler() is called. Therefore, the deferred deletion should also be done in vTaskEndScheduler().

I am creating a PR to address these problems. It still take some time check a couple of tests. Once it is ready, I will update in this thread again to discuss the solution here.

cmorganBE commented 5 months ago

@chinglee-iot the processing of deferred deletion during vTaskEndScheduler looks like it will resolve the remaining leaks. Feel free to mention me on any PR and I can test here.

chinglee-iot commented 5 months ago

PR #962 is created to address the problem discussed in this thread. This PR has the following updates:

@cmorganBE

After delete the application created tasks in the test application, I no longer see memory leakage in valgrind report with the test application you shared before. The test cases is updated in this branch.

Can you also take a look at this PR to see if this could solve the problem in this thread? Any comments will be appreciated.

chinglee-iot commented 5 months ago

@cmorganBE The PR #962 to address this issue is merged. We will close this issue.

vTaskDelete() can be used to delete tasks after scheduler is stopped. We can consider the following example to delete application created tasks if gracefully delete tasks is not required in the application:

/*-----------------------------------------------------------*/
/* Application code. */
/*-----------------------------------------------------------*/
void vSubTaskX( void *pvParams )
{
    for( ;; )
    {
        /* TaskX code. */
    }
}

void vSubTaskY( void *pvParams )
{
    for( ;; )
    {
        /* TaskY code. */
    }
}

void app_main( void )
{
    /* Create the application sub task. */

    /* Start the scheduler. */
    vTaskStartScheduler();
}

/*-----------------------------------------------------------*/
/* Test harness code. */
/*-----------------------------------------------------------*/

void vManagementTask( void *pvParams )
{
    /* End scheduler after test complete. */
    vTaskEndScheduler();

    /* The task code here should not be run. */
    configASSERT( pdFASLE );
}

int main( void )
{
    /* Creating management task. Management task should not be interrupted when 
     * it stops the scheduler. */
    xTaskCreate( vManagementTask, ... );

    /* Calling the application main function. */
    app_main();

    /* Assuming the application continue to execute from here after
     * vTaskEndScheduler is called . */

    /* Scheduler keeps the state after scheduler is stopped. Application can delete all
     * the tasks here. In this example, we can delete the management task, SubTaskX 
     * and SubTaskY here. */
    vTaskDelete( management task );
    vTaskDelete( Sub task X );
    vTaskDelete( Sub task Y );
}

Thank you again for reporting this problem and discuss with us about the solution.