Stichting-MINIX-Research-Foundation / minix

Official MINIX sources - Automatically replicated from gerrit.minix3.org
Other
2.99k stars 969 forks source link

Possible bug in the scheduling algorithm #318

Open przybyszewskiw opened 3 years ago

przybyszewskiw commented 3 years ago

I was analyzing Minix code responsible for scheduling. In minix/kernel/proc.h there is a definition of RTS_UNSET macro which "clears flag and enqueues if the process was not runnable but is now". I wonder why this macro always uses enqueue function. I think it should use enqueue_head in case when (rp)->p_cpu_time_left is positive. Otherwise, when a process is blocked on I/O before using its whole cpu time, it goes to the very end of its queue once it is runnable again.

stux2000 commented 3 years ago

Notes for future research:

Personally, I don't know if there's a technical reason why the process would be sent to the back of the queue instead of the front. I'd imagine that in some cases this could lead to process starvation. However, nothing in the comments indicate that using enqueue instead of enqueue_head was intended. I don't know if there's code from other OSes that can be referenced, otherwise this might be a good update to make.

Another thing I'm thinking: if this is a common pattern/issue. It might be best to create a new function and/or macro that tests to see if the process still has time and calls enqueue_head instead of enqueue. Even if it's only used twice it might be a good abstraction to create.

stux2000 commented 3 years ago

I'd also like to note (as it was brought to my attention) that any such change can only be made after thorough testing and proper understanding of the process scheduler is acquired. Otherwise, other subtle performance bugs might arise from said fix.

przybyszewskiw commented 3 years ago

An example of a starvation I observed is when a completely new process is stopped due to a pagefault. After the pagefault is handled, the process isn't started immediately. Therefore, chances are that other processes on the same priority will eventually remove that page from cache and the process will encounter exactly the same pagefault over and over again. If there's a need, I can prepare an example, which can be reproduced.