RT-Thread / rt-thread

RT-Thread is an open source IoT Real-Time Operating System (RTOS).
https://www.rt-thread.io
Apache License 2.0

SMP: threads get reinserted before their TCB gets saved #7988

Open vjach opened 1 year ago

vjach commented 1 year ago

While trying to use RT-Thread on the RP2040, I found that in the SMP configuration current_thread can be inserted back into the ready thread list before the context switch occurs. Say this happens on core 0. Core 1 can then pick up that thread before PendSV has run on core 0, which is where the TCB should be saved. Core 1 could therefore switch to a thread whose TCB has not yet been saved on its stack, leading to memory corruption.

Could anyone please confirm this? I am new to RT-Thread, but I could start working on a PR once others confirm it.

In my testing I added a MAGIC value to the TCB, and I could spot situations where, while switching to a thread in PendSV, the MAGIC was missing from the TCB. I also ran tests with every thread bound to one core or the other. Unsurprisingly, the issue does not show up there: core 1 cannot pick up incomplete TCBs from core 0, and vice versa.
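A minimal sketch of the kind of MAGIC check described above. It is not the actual instrumentation used in the test and does not use any RT-Thread API: `struct demo_tcb`, `DEMO_MAGIC`, `demo_save_context` and `demo_restore_context` are hypothetical names chosen for illustration. The marker is written only when the context is saved and consumed when it is restored, so a switch-in without a preceding save is detectable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker value for this sketch; not an RT-Thread constant. */
#define DEMO_MAGIC 0x5AFEC0DEu

/* Hypothetical, heavily simplified stand-in for a thread control block. */
struct demo_tcb {
    uint32_t magic;    /* DEMO_MAGIC only while a saved context is present */
    uint32_t saved_sp; /* stack pointer recorded at switch-out */
};

/* Switch-out path (the work PendSV would do): record the context,
 * then mark the TCB as holding a valid saved context. */
void demo_save_context(struct demo_tcb *t, uint32_t sp)
{
    assert(t != NULL);
    t->saved_sp = sp;
    t->magic = DEMO_MAGIC;
}

/* Switch-in path: refuses to resume a thread whose context was never
 * saved. On success the marker is cleared, so the next switch-in
 * requires a fresh save. */
bool demo_restore_context(struct demo_tcb *t, uint32_t *sp_out)
{
    assert(t != NULL && sp_out != NULL);
    if (t->magic != DEMO_MAGIC) {
        return false; /* thread became visible before its context was saved */
    }
    t->magic = 0;
    *sp_out = t->saved_sp;
    return true;
}
```

On real hardware the failing branch would more likely trap or log than return a flag; the boolean return just keeps the sketch testable on a host.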

Any feedback would be appreciated! Victor

BernardXiong commented 1 year ago

Thank you for your feedback.

Could you please provide a test case? Normally there is a lock for multi-core systems. When a core takes a resource, it should take this lock (which is rt_hw_interrupt_disable in UP).

vjach commented 1 year ago

The lock is there for SMP too, no doubt about that. What I meant in my previous post is that a thread is put on the list of available threads before PendSV can even save its state. See here and here. In my opinion, the context switch should happen first, and only then should the previous thread be inserted into the list of threads to be scheduled again, not the other way around. Any thread that is ready to be scheduled by any core is expected to have its TCB already saved on the stack.
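The two orderings can be compared with a small single-threaded model. This is only a sketch under stated assumptions: every name here (`demo_thread`, `demo_ready_list`, `demo_insert_then_save`, `demo_save_then_insert`, and so on) is hypothetical, and the "other core" is modelled as a function call placed in the window between the two steps rather than as real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_READY_MAX 4

/* Hypothetical thread record: only tracks whether its context is saved. */
struct demo_thread {
    bool context_saved;
};

/* Hypothetical shared ready list, visible to every core. */
struct demo_ready_list {
    struct demo_thread *slot[DEMO_READY_MAX];
    size_t count;
};

void demo_ready_push(struct demo_ready_list *rl, struct demo_thread *t)
{
    assert(rl->count < DEMO_READY_MAX);
    rl->slot[rl->count++] = t;
}

struct demo_thread *demo_ready_pop(struct demo_ready_list *rl)
{
    return rl->count ? rl->slot[--rl->count] : NULL;
}

/* Models the other core scheduling inside the window between the two
 * steps. Returns true if it took a thread whose context was not saved. */
bool demo_other_core_sees_unsaved(struct demo_ready_list *rl)
{
    struct demo_thread *t = demo_ready_pop(rl);
    return t != NULL && !t->context_saved;
}

/* Ordering described as problematic: publish the thread, save later. */
bool demo_insert_then_save(struct demo_ready_list *rl, struct demo_thread *t)
{
    bool hazard;
    demo_ready_push(rl, t);
    hazard = demo_other_core_sees_unsaved(rl); /* window before the save */
    t->context_saved = true;
    return hazard;
}

/* Proposed ordering: save the context, then publish the thread. */
bool demo_save_then_insert(struct demo_ready_list *rl, struct demo_thread *t)
{
    bool hazard;
    t->context_saved = true;
    hazard = demo_other_core_sees_unsaved(rl); /* thread not yet visible */
    demo_ready_push(rl, t);
    return hazard;
}
```

In this model the first ordering reports a hazard because the other core can pop the thread before its context is saved, while the second cannot, since the thread only becomes visible after the save. A real fix would also have to keep the scheduler lock semantics intact, which this model ignores.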

BernardXiong commented 1 year ago

Thank you for your feedback. Oh, this is SMP on Cortex-M. I need to check this part carefully, as I have not used SMP on Cortex-M before and have only used it on MPU processors such as Cortex-A.