AymenSekhri / CyanOS

Coding an operating system to keep my sanity during the quarantine.
MIT License

Semaphore/Keyboard/WaitQueue/etc.: Race condition between acquire() and release() #5

Closed: mkilgore closed this issue 4 years ago

mkilgore commented 4 years ago

Hi! I just noticed this in passing, so I have not verified it by running the kernel (And it's kinda hard to make a test case for a race condition...), but I figured I would make an issue and mention it anyway.

The gist is that, in code like the Keyboard driver, you drop a spinlock protecting the data you're waiting for (the keyboard buffer, in this instance) before joining the wait queue. This creates a period of time where the buffer is not protected, but the caller will also not detect wakeups, resulting in a 'lost wakeup' situation. Imagine this ordering:

- keyboard::read() called
    - Acquires m_lock
    - Checks m_buffer.size(), which returns zero
    - Releases m_lock
    - A keyboard interrupt is received, enqueue_keystroke() is called:
        - Acquires m_lock
        - Calls m_buffer.queue()
        - Calls m_wait_queue.wake_up_all()
        - Releases m_lock
        - Keyboard interrupt ends, keyboard::read() resumes
    - Thread::current->wait_on(m_wait_queue) is called; the current thread starts sleeping on m_wait_queue

Note how, at the end of that chain of events, the current thread is still waiting on m_wait_queue, but m_buffer actually contains data and wake_up_all() has already been called. This means that the current data in m_buffer is not going to get processed until after the user presses another key to trigger enqueue_keystroke() to run again and do another wake-up. Hence, the first call to wake_up_all() was 'lost', and the keyboard::read() call is waiting for a wakeup that it's never going to receive.
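
To put the same thing in code, the problematic shape looks roughly like this (a sketch using the names from the description above, not the actual CyanOS source):

// Rough sketch of the problematic ordering; names are taken from the
// description above, this is not the actual CyanOS source.
void Keyboard::read(/* ... */)
{
    m_lock.acquire();
    while (m_buffer.size() == 0) {
        m_lock.release();
        // <-- Window: a keyboard interrupt can run enqueue_keystroke() right
        //     here, fill m_buffer, and call m_wait_queue.wake_up_all() before
        //     this thread is on the queue, so that wakeup is lost.
        Thread::current->wait_on(m_wait_queue);
        m_lock.acquire();
    }
    // ... consume from m_buffer ...
    m_lock.release();
}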

There is more than one solution here, but the gist of the problem is simply that you cannot unilaterally drop the spinlock protecting m_buffer in that fashion, because once you drop that lock anything can happen to m_buffer (and the wait queue) before you have a chance to do anything else. This does, however, get a bit messy, since you have to drop the lock at some point because you obviously cannot be holding it while you go to sleep, so some kind of algorithm and/or ordering has to be used.

I personally like the Linux Kernel solution a bit, which takes a moment to wrap your head around but is actually pretty easy to utilize. You can read a fair amount about it here, and also a lot about the basic wait queue interface at this lwn article. For the article, I would pay particular attention to the prepare_to_wait() code example, which is not the easiest to understand at first but is basically exactly the approach you're looking for.
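
From memory, the shape of the prepare_to_wait() example is roughly the following (wq and condition are placeholders here, and the exact types/signatures vary a bit between kernel versions, so treat this as a sketch):

/* Sketch of the Linux-style wait loop; wq and condition are placeholders. */
DEFINE_WAIT(wait);

while (!condition) {
    /* This puts the current task on the queue and marks it as sleeping
     * *before* the condition is re-checked, so a wake_up() that lands in
     * between is not lost - it just makes schedule() return immediately. */
    prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
    if (!condition)
        schedule();
    finish_wait(&wq, &wait);
}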

xv6 also has a solution to this problem, which is a bit different from how the Linux Kernel does it but is also a perfectly viable option. Note that they don't really have a 'wait-queue' structure, but their chan key basically serves as a wait-queue identifier, so rather than maintaining separate wait-queue lists, they just loop over all the processes and find any with the correct chan value (which is not optimal, but does get the job done).
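
For comparison, the consumer side in xv6 looks roughly like this (paraphrased from memory; kbd_lock, kbd_buffer, and buffer_is_empty are made-up names, not the real xv6 identifiers):

/* Paraphrased xv6-style consumer; the helper names here are made up. */
acquire(&kbd_lock);
while (buffer_is_empty()) {
    /* sleep() grabs the process table lock before it releases kbd_lock, and
     * wakeup() also needs the process table lock, so a wakeup(chan) from the
     * interrupt handler can't slip into the gap. sleep() re-acquires kbd_lock
     * before returning. */
    sleep(&kbd_buffer, &kbd_lock);   /* chan = &kbd_buffer */
}
/* ... consume from the buffer ... */
release(&kbd_lock);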

I'd be happy to go into more detail on either of these approaches if you're interested, but I'm hoping that's enough information to start with if you're looking to solve this problem on your own.

Now, I'll add as an addendum that I've only been able to gather so much from reading the code. If the OS is only designed for a uniprocessor system and the kernel code is not preemptible, then not every usage of this pattern can actually result in a lost wakeup. I highlighted the keyboard example, however, because it's an unavoidable case and can happen even on a single-processor system - you won't have interrupts off when calling keyboard::read(), and the keyboard interrupt can come at any time. As I noted in the title, this kind of pattern/issue also exists in other parts of the code, like the Semaphore logic, but unless those are actually used by an interrupt handler they may not actually be bugged (assuming your kernel code is not preemptible).

AymenSekhri commented 4 years ago

Thank you for pointing this out. Some kernels deal with this problem by adding blocked threads to a global "blocked queue", where each thread provides a function that checks whether it should wake up (like can_read, can_write, ...), and the scheduler goes through all blocked threads and wakes up the ones that are ready. This approach is a little expensive, since you need to iterate through all blocked threads and run their checks on every clock interrupt; I sketched roughly what I mean below. I will read the resources you provided and think of a way to solve this problem.
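
Roughly (all names here are made up just to illustrate the idea, not code from any particular kernel):

// Made-up sketch of the polling approach described above.
struct BlockedThread {
    Thread* thread;
    bool (*can_wake_up)(void* context);   // e.g. can_read, can_write, ...
    void* context;
};

void Scheduler::on_clock_tick()
{
    // Every tick, every blocked thread's checker runs, even if nothing
    // relevant has happened - which is what makes this expensive.
    for (auto& blocked : m_blocked_queue) {
        if (blocked.can_wake_up(blocked.context))
            wake_up(blocked.thread);
    }
    pick_next_thread();
}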

mkilgore commented 4 years ago

I have seen those kernels, but as you noted it's a pretty expensive solution and I would really caution against going that route. You're already basically there in terms of doing "real" process management (vs. calling a polling function on every clock tick); to fix this problem you just need to modify your wait-queue logic a bit so you don't have the lost-wakeup problem. Otherwise, what you're doing (lock m_buffer, sleep on the wait-queue, have the interrupt wake the queue) is very good stuff and there's no reason to get rid of it. I'd even argue it will be less work to fix your wait-queue logic than to replace it all with polling functions.

The xv6 book also has a section called 'sleep and wakeup' under the 'scheduling' chapter that I should have linked - it goes over this exact problem and how their sleep() function avoids it.

AymenSekhri commented 4 years ago

I guess I fixed the lost-wakeup problem now. wait_on_event now takes another parameter which is used to check whether the condition is satisfied before going to the wait queue, and here is how Keyboard::read uses it. I just realized that using the lock here is useless lol. I guess I will move the locking and the waiting for read/write, etc., to a higher level, the FileDescription class, since that way I only have to write the locking code once and keep the actual device drivers free of potential bugs/race conditions.
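
The idea, roughly (a sketch of what I mean, not the exact committed code; is_condition_met is just a placeholder name):

// Sketch only: the condition is checked while the wait queue's own lock is
// held, so a wake_up between the check and the block can't be lost. The
// checker still takes the driver's own lock internally in this version.
template <typename T> void WaitQueue::wait_on_event(T is_condition_met)
{
    ScopedLock local_lock(m_lock);    // the wait queue's lock
    if (is_condition_met())           // e.g. "is there data in the buffer?"
        return;                       // condition already satisfied, don't sleep
    m_threads.push_back(*Thread::current);
    Thread::current->block();
    local_lock.release();
    Thread::yield();
}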

mkilgore commented 4 years ago

Yep, that looks to solve it - by holding the wait-queue lock the entire time from checking the condition to blocking the current thread, you can't lose any wake-ups.

I will note, however, that I would argue the lock you take in keyboard::read() actually is still necessary, though it does get a bit hairy. The lock that protects m_buffer from concurrent accesses is a different one from the one that protects the wait-queue (they're both called m_lock, but they belong to different objects and protect different things). The catch is, of course, whether you can call .size() on your CircularBuffer without taking the lock protecting it, which you probably actually can, since all it does is read m_size, so 🤷. The assignment to m_size in the CircularBuffer technically isn't guaranteed to be atomic as written, though, which could potentially lead to some weird cases.
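
For what it's worth, one way to make that lock-free size() read well-defined would be something along these lines (just a sketch, assuming <atomic> is usable in your kernel build, which it normally is in a freestanding C++ environment):

#include <atomic>
#include <cstddef>

class CircularBuffer {
public:
    // Safe to call without holding the buffer's lock.
    std::size_t size() const { return m_size.load(std::memory_order_acquire); }

    void queue(char data)
    {
        // ... store the byte into the ring (still under the buffer's lock) ...
        m_size.fetch_add(1, std::memory_order_release);
    }

private:
    std::atomic<std::size_t> m_size { 0 };
    // ... ring storage, head/tail indices, etc. ...
};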

That said, I would suggest a small improvement to your wait_on_event() method, if you don't mind, which would also bring it closer to the Linux variant (and to what I use): rather than needing to take a lock in your checker() function, allow optionally passing in an already-acquired lock and drop/reacquire it around the sleep. If you do that, you can embed the entire looping logic into wait_on_event() and let callers simply call wait_on_event() without having to check the condition or loop themselves. Something like this (forgive me if I screwed up the C++):

// You might want to retain your version that doesn't take a lock as well
// This is called and returns with checker_lock acquired
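// checker() returning true means "keep waiting"; the loop below exits once it returns false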
template <typename T> void wait_on_event(T checker, ScopedLock &checker_lock)
{
    ScopedLock local_lock(m_lock);
    while (checker()) {
        Thread::current->block();
        m_threads.push_back(*Thread::current);

        // Automatically drop and acquire both locks, so checker() doesn't need to do it
        local_lock.release();
        checker_lock.release();

        Thread::yield();

        // The order here is important to avoid deadlocks
        checker_lock.acquire();
        local_lock.acquire();
    }
}

void keyboard::read(/* blah */)
{
    ScopedLock local_lock(m_lock);

    m_wait_queue.wait_on_event([&]() { return m_buffer.size() < size; }, local_lock);

    // At this point, local_lock is acquired *and* m_buffer.size() >= size,
    // so no extra loop is necessary: the wait condition is already satisfied
    // and we hold the lock, so it can't change

    /* logic */
}

void Semaphore::acquire()
{
    ScopedLock local_lock(m_lock);

    m_queue.wait_on_event([&]() { return m_count >= m_max_count; }, local_lock);

    // At this point the lock is acquired *and* m_count < m_max_count
    m_count++;
}

And as I noted in a comment in that code, the lock acquire()/release() ordering is important here since you're dealing with two locks. In your version, you always nest the keyboard/semaphore/etc. lock inside the wait-queue lock; in the code above it's the other way around, so the wait-queue lock nests inside the keyboard/semaphore/etc. locks. The distinction doesn't really matter as long as the ordering is kept consistent (and you do keep it consistent), but inverting the order allows you to avoid dropping the keyboard/semaphore/etc. lock before using the wait-queue.

AymenSekhri commented 4 years ago

The above version of wait_on_event is way cleaner indeed. I just modified the code; here is wait_on_event and here is Keyboard::read. Now there are two versions for queuing the current thread: wait_on_queue and wait. wait_on_queue takes a template function to check whether the condition is met, plus a lock that is released before sleeping and re-acquired afterwards; wait, on the other hand, is unconditional waiting, so it doesn't need a checker function or a lock. That said, Linux's wait_event_interruptible guarantees that there will be no lost wake-ups, but I think callers still need a loop: if multiple threads wake up, they still need to take the lock and check the condition again so that only one thread actually proceeds. With the above code that isn't needed, since it's all embedded in wait_on_event, which is pretty cool.
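
So the interface is now roughly shaped like this (paraphrasing from my description above; the actual signatures are in the linked code):

class WaitQueue {
public:
    // Loops internally: checks the condition, juggles both locks, and sleeps
    // until the condition is met.
    template <typename T> void wait_on_queue(T checker, ScopedLock& checker_lock);

    // Unconditional wait: no checker function and no lock needed.
    void wait();

    // ...
};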

mkilgore commented 4 years ago

Good stuff!

I don't know if you actually delved deeper into how wait_event_interruptible is implemented (IIRC it's pretty painful to read...), but it actually does the loop itself and works like your wait_on_event does. The dirty catch is that it's just a macro, so it expands into a loop and embeds the condition check (which is passed as an argument to the macro) inside the loop. So it's effectively a template without some of the niceties you get with C++ (which admittedly does make me want to use C++ at times...).
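
For reference, the outer layer looks roughly like this (paraphrased from memory - the real macro also tracks a return value for signals and goes through a few more helper layers):

/* Paraphrased sketch of the outer macro, not the exact kernel source. */
#define wait_event_interruptible(wq, condition)                     \
({                                                                  \
    int __ret = 0;                                                  \
    if (!(condition))                                               \
        __ret = __wait_event_interruptible(wq, condition);          \
    __ret;                                                          \
})

__wait_event_interruptible then expands (through more macros) into the prepare_to_wait()/schedule() loop with the condition pasted in again, which is what makes the whole thing behave like a template.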

AymenSekhri commented 4 years ago

Yeah, going down through macros is painful; I gave up halfway lol. Thank you for your guidance, and have a nice day.