SerenityOS / serenity

The Serenity Operating System 🐞
https://serenityos.org
BSD 2-Clause "Simplified" License
30.38k stars 3.18k forks source link

Kernel: Assert fails on nested blocking signal handlers. #433

Closed DrewStratford closed 5 years ago

DrewStratford commented 5 years ago

To reproduce, run this program

#include <unistd.h>
#include <signal.h>
#include <stdio.h>

int main(void){
    signal(SIGTSTP, [](int){
        printf("in SIGTSTP\n");
        printf("out of SIGTSTP\n");
    });

    signal(SIGCONT, [](int){
        printf("in SIGCONT\n");
        int rc = sleep(1000);
        printf("returning from SIGCONT: %d\n", rc);
    });

    int r = sleep(1000);
    printf("out of sighandler: %d\n", r);
    return 0;
}

and then:

kill -18 pid
kill -20 pid

which should result in something like this

sys$sigreturn failed in example2(13)
ASSERTION FAILED: false
Process.cpp:756 in void Process::sys$sigreturn()
0x00019ba3  __assertion_failed(char const*, char const*, unsigned int, char const*) +31
0x0001a894  Process::from_pid(int) +0
0x00023db2  syscall_trap_entry +1468
0x000237ac  syscall_trap_handler +38
0x0002d641  Scheduler::context_switch(Thread&) +205
DrewStratford commented 5 years ago

I've done a bit of digging and I think this may be the culprit: https://github.com/SerenityOS/serenity/blob/83fdad25edfec1c8668afdff1268129e5181d78a/Kernel/Thread.cpp#L402

This doesn't seem to update, so subsequent signals from the kernel would be given the same stack pointer.

awesomekling commented 5 years ago

Yeah, that's probably the cause of the issue. What do you think would be a good solution?

Here's one idea: Maybe we could have something like Thread::m_signal_nesting_level that gets incremented for every signal handler we enter, and decremented when we return from that handler. If the nesting level is non-zero, we don't overwrite esp but keep the existing stack.

DrewStratford commented 5 years ago

After yet more looking, I've confirmed that the signal handlers are being given the same stack. However, this isn't the only issue; when we set m_tss_to_resume_kernel we overwrite the existing data which causes problems.

@awesomekling as for the stack pointer issue: your method would work, but I think the nicest option would be using the user space stack.

awesomekling commented 5 years ago

Okay, are you planning to work on this? If you can make the user space stack work, then that would be a nice option. :)

DrewStratford commented 5 years ago

Yeah, I'll give it a go.

DrewStratford commented 5 years ago

I had a go, but couldn't quite get something that didn't introduce subtle bugs. Someone else is free to fix this if they want.