Race between `openat` and `thread_kill`

jespiron commented 2 months ago

Describe the bug Suppose CPU core 0 is running Thread 0 and CPU core 1 is running Thread 1. T0 is doing an openat syscall, relevant line is line 508. T1 is running thread_kill on T0 (i.e. thrd==T0), relevant line is call to kfree

Possible races:

If CPU 1 finishes kfree before CPU 0 executes line 508, CPU 0 will access freed memory running_thread->mother_proc (since running_thread will be freed!)
If CPU 1 finishes kfree after CPU 0 executes line 508, there may be some non-deterministic behavior. Note that thread_kill destroying the TCB doesn't mean that the other thread will stop executing immediately! In the above case, CPU 0 will continue executing the openat syscall made by T0 even after T0 is destroyed.

To my understanding, the above scenario is possible because the kernel supports multiprocessing, as I see support for SMP. Even if in a uniprocessor environment, the threads could interleave to create the above scenario (looking at the syscall stubs, interrupts are enabled when syscalls are made).

Suggestion Instead of cleaning up the thread immediately upon thread_kill, mark some sort of flag that the thread should be killed. Then clean up the thread when it's safe to do so, likely on the next reschedule.

NSG650 commented 2 months ago

Hello there,

First off thank you so much for your report!!! I really appreciate it.

I think another possible fix for this issue could be forcing a reschedule on the cpu that is running the thread. What are your thoughts about this solution?

jespiron commented 2 months ago

Hey!! I appreciate your initiative in this project, it takes a lot of dedication to get it to this point 😄

As for forcing a reschedule: do you mean we force a reschedule on T0 before destroying it? This indeed prevents T0 from running again, since we remove T0 from the list, so the scheduler can't select it. We won't access freed memory, which is good. But in case T0 is running something like close, we do want T0 to run to completion, to clean up the fd we created in open and prevent a resource leak.

Basically, when T0 is in a syscall that's supposed to clean up something (freeing resources, unlocking something, etc.), how can we make sure it finishes its job?

NSG650 commented 2 months ago

Yeah your suggestion now makes a lot of sense taking this into mind as well. Now what if a thread locks a resource? Now we need to figure some way out if a thread is locking something and unlock when it gets killed.

jespiron commented 2 months ago

For sure, you can search "asynchronous thread cancellation" vs "deferred thread cancellation" for more. It sounds like I'm suggesting towards deferred cancellation, but it's not the only way, and definitely many directions you can take this.

NSG650 commented 2 months ago

I'm thinking of going async since I think it would help us find any locking issues. If a thread dies without unlocking then there must be something. I would like to know your opinion however since you seem to have more experience in this field and would like to learn more.

NSG650 commented 2 months ago

Hey there! I have committed the fix 7fe8bfb59e50477849ce8aaf4c8adb30536bd527. I hope this resolves the issue!

jespiron commented 2 months ago

Hello! Haha funny variable names :^)

The fix for this is more complicated than originally described. I believe the same issue is possible:

T0 runs on CPU 0, T1 runs on CPU 1.
T0 calls thread_kill on T1, which puts T1 on death row
CPU 0 reschedules, but we can't make assumptions about whether it's safe to kill T1, as T1 might not have released its resources yet.

At a high level, yes we'd want to defer until a safe point. I named reschedule as one example, though for multiprocessor environments there's more to take into account. These safe points depend on how the rest of the kernel is implemented:

When T0 calls thread_kill on T1, what might T1 be in the middle of doing?
Out of the things that T1 might be in the middle of doing, at what points can we be confident that T1 releases its resources and finishes whatever important business it was doing? This is where we'd kill T1 for real.

If T1 was in the middle of a syscall, a safe point can be before it IRET's back into user mode (depending on how the rest of the kernel is implemented, this might also not be the full story).

Whatever fix you come up with, if you can run through each scenario of T0 and T1 and convince yourself that T1 finishes safely, then it's a good enough fix

NSG650 commented 2 months ago

Hi! Sorry replying later, I had exams going. So I should probably kill the thread when it returns back to userspace since by then it would have cleaned up anything it was doing in the kernel space.

NSG650 commented 2 months ago

Hey there! I finally got time to work on this and tried to fixing it I hope this fixes it.

jespiron commented 2 months ago

Hey, awesome! The fix looks good, I don't have any more concerns atm. Closing this :)

NSG650 / Polaris

Race between `openat` and `thread_kill` #24