adrienverge / openfortivpn

Client for PPP+TLS VPN tunnel services
GNU General Public License v3.0

GCD semaphores might be unsafe when used in signal handlers #105

Closed henninglive closed 7 years ago

henninglive commented 7 years ago

Openfortivpn currently uses GCD (Grand Central Dispatch) semaphores on OS X, to work around the fact that OS X does not support unnamed POSIX semaphores. I ran into the same problem while working on another project, and while looking for workarounds I found multiple solutions suggesting substituting POSIX semaphores with GCD semaphores, but I was not able to find any documentation specifying whether GCD semaphores are async-signal safe.

I did some further digging and found the source for GCD. As you can see here and here, GCD uses mach/semaphore.h internally. Unfortunately, the documentation for mach/semaphore.h suggests that they are not async-signal safe.

Semaphores can be used any place where mutexes can occur. This precludes their use in interrupt handlers or within the context of the scheduler, and makes it strongly discouraged in the VM system.

If this is correct, it is probably a bad idea to use GCD semaphores. The current solution might work most of the time, but specific timing might cause it to fail. Here is more info on Async-Signal Safety.

Here is another discussion about the same problem in OpenJDK. They suggest using BSD kqueue/kevent.

In the other project I was working on, we ended up using pipes, similar to this. This should work on all POSIX-compliant systems, unlike unnamed semaphores, which are an optional part of the POSIX standard.

DimitriPapadopoulos commented 7 years ago

Indeed openfortivpn uses GCD semaphores:

#ifdef __APPLE__
/* Mac OS X defines sem_init() but does not actually implement it */
#include <dispatch/dispatch.h>

typedef dispatch_semaphore_t    sem_t;

#define sem_init(psem,x,val)    *psem = dispatch_semaphore_create(val)
#define sem_post(psem)      dispatch_semaphore_signal(*psem)
#define sem_wait(psem)      dispatch_semaphore_wait(*psem, \
                    DISPATCH_TIME_FOREVER)

#define sem_destroy(psem)   dispatch_release(*psem)
#endif

And indeed, using GCD semaphores in signal handlers is documented to be unsafe. I'm surprised there aren't more references on Google about that. Impressive investigation, by the way! :thumbsup:

I haven't used semaphores recently (ever?), so perhaps you can help here:

henninglive commented 7 years ago

I am not an expert on this; I just stumbled on the problem while trying to find a workaround for the missing unnamed semaphores on OS X. I saw people suggesting GCD semaphores as an alternative and, while researching whether they were safe, I found your project. I have never actually tried openfortivpn, but here is my insight anyway.

mrbaseman commented 7 years ago

I have found this about Mach semaphores, and I think the current approach in #135 looks promising. Do you agree?

As an ingredient for the more general discussion, what about unnamed POSIX semaphores if they are initialized by a global variable?

henninglive commented 7 years ago

OK, so POSIX named semaphores are actually implemented on top of Mach semaphores. It is odd that the documentation suggests they are not async-signal safe; they might be safe, but I don't know if I would take the chance. I would be very careful about using anything that does not explicitly guarantee async-signal safety; there is so much that can go wrong with signal handlers. You could ask on a mailing list or somewhere whether they are safe or not.

Without someone confirming they are safe, I think using a pipe is the best idea. It would work with pthread_cancel(), and the possible downsides (performance and the use of extra file descriptors) shouldn't be a concern for you.

What do you mean by "initialized by a global variable"? Are you talking about unnamed semaphores in memory shared between processes, or about a static initializer like PTHREAD_MUTEX_INITIALIZER? I don't think the latter exists for semaphores.

mrbaseman commented 7 years ago

I was referring to Unnamed semaphores in memory shared between threads

A thread-shared semaphore is placed in an area of memory shared between the threads of a process, for example, a global variable.

henninglive commented 7 years ago

Isn't that the normal use case, synchronizing different threads in the same process?

mrbaseman commented 7 years ago

Probably yes; at least that is how they are used in openfortivpn.

mrbaseman commented 7 years ago

We are still left with the question of whether semaphores on Mac OS X are async-signal-safe. I have been studying the various variants further, and I have found an interesting handout.

A semaphore is somewhat like an integer variable, but is special in that its operations (increment and decrement) are guaranteed to be atomic—you cannot be halfway through incrementing the semaphore and be interrupted and waylaid by another thread trying to do the same thing.

#135 in its current version would switch from GCD semaphores to Mach semaphores. (Named) POSIX semaphores seem to be implemented on top of Mach semaphores, but we don't know whether extra synchronization efforts are involved. Anyhow, as @henninglive initially pointed out, the documentation states

Semaphores can be used any place where mutexes can occur. This precludes their use in interrupt handlers...

When looking for more detailed information about this, I came across this thread, which says the problem could be deadlocks when a mutex is used in a signal handler. Although this is not exactly what we have in our case, let's look at how semaphores are used in openfortivpn.

@henninglive, by coincidence I have found your thread here, so I'm linking it in case someone posts new insight there.

henninglive commented 7 years ago

Here is the userspace source code for Mach semaphores: straight syscalls. Here and here is the kernel source code for semaphore_signal(). sem_post() is also a straight syscall, and here is its kernel source code. We know sem_post() is async-signal safe on named semaphores. There is a lock here, which might be relevant. Never mind, it is unlocked before the call to semaphore_signal().

Based on this, I can't see how semaphore_signal() could be unsafe while sem_post() is safe.

mrbaseman commented 7 years ago

@henninglive, thank you very much for this analysis and for the references you have provided. I have tried to do a similar analysis of GCD semaphores. The documentation says

A dispatch semaphore works like a regular semaphore with one exception. When resources are available, it takes less time to acquire a dispatch semaphore than it does to acquire a traditional system semaphore. This is because Grand Central Dispatch does not call down into the kernel for this particular case. The only time it calls down into the kernel is when the resource is not available and the system needs to park your thread until the semaphore is signaled.

However, I can't draw a similar comparison between its implementation and that of POSIX named semaphores. So, given that POSIX named semaphores are actually implemented on top of Mach semaphores, moving over to Mach semaphores is probably a good step forward.

@adrienverge and @DimitriPapadopoulos, I'm now convinced that we should accept #135. It not only solves the locking issue that @Mabin-J has observed, but also addresses the possible issue with GCD semaphores discussed here. Do you still have any objections? Perhaps a very short summary of this thorough discussion in #105 should be included in the commit message, and perhaps we can prepare a wording here that we can then propose to @Mabin-J.

adrienverge commented 7 years ago

@mrbaseman Looks good to me! :+1:

DimitriPapadopoulos commented 7 years ago

@mrbaseman Looks good to me too. Although I don't have time to test on Linux and cannot test on Mac, the whole discussion about different types of semaphores and their limitations is thorough and moving from GCD to Mach semaphores seems to make sense based on the available information.

mrbaseman commented 7 years ago

Ok, I have just tested it successfully and asked for an update of the commit message in https://github.com/adrienverge/openfortivpn/pull/135#issuecomment-307048969

mrbaseman commented 7 years ago

I have just merged #135, so I think we can also close this issue (see discussion above)

krackers commented 8 months ago

FWIW, there is the obscure and deprecated MPSignalSemaphore (https://developer.apple.com/documentation/coreservices/1585713-mpsignalsemaphore?language=objc), which explicitly says

Note that you can call this function from an interrupt handler.

And if you look at its disassembly, it just ends up calling semaphore_signal. So this essentially confirms it.

Also the kernel equivalent https://raw.githack.com/apple-oss-distributions/xnu/main/osfmk/man/semaphore_signal.html states

Device driver interrupt service routines may safely execute semaphore_signal operations without causing a deadlock.

krackers commented 8 months ago

Although these MultiProcessing semaphore functions are interesting: unlike traditional Mach semaphores, they also allow you to specify a maximum. But if you look at the implementation, it's basically something like


undefined8 sym._MPSignalSemaphore(int64_t param_1)
{
    uint32_t uVar1;
    undefined8 ret;
    int64_t var_ch;

    ret = sym._RetrieveDataFromOpaqueID(param_1, (int64_t)&var_ch, (int64_t)&var_ch + 4);
    if (((int32_t)ret == 0) && (ret = 0xffff8d8d, (int32_t)var_ch == 0x73656d61)) {
        ret = 0xffff8d8e; // kInsufficientResourcesErr
        if (*(uint64_t *)(var_ch->curValue) < *(uint64_t *)(var_ch->maxValue)) {
            sym.imp.OSAtomicAdd64Barrier(1, var_ch->curValue);
            uVar1 = sym.imp.semaphore_signal();
            ret = sym._kernelErrorToOSStatus((uint64_t)uVar1);
        }
    }
    return ret;
}

In particular, the maximum tracking is implemented with an associated atomic counter. But this looks like an obvious race: they should have used an atomic CAS instead of a separate compare followed by an increment. Maybe that's why these functions are deprecated?

See also https://old.reddit.com/r/osx/comments/6exsqd/mach_semaphores_in_posix_signal_handlers/


There also remains the question of whether GCD semaphores are signal-handler safe, given that they're implemented on top of mach semaphores. Empirically, the answer seems to be no. In addition to the OpenJ9 issues linked above, Golang also tried experimenting with GCD Semaphores and saw crashes with signal handlers https://go-review.googlesource.com/c/go/+/182880 (they ultimately decided to use a pipe).

I think this is one possible way the SIGILL is generated:

1) Some thread calls dispatch_sem_wait, and the count goes from 0 to -1, so it tries to create and wait on the semaphore. But before it can fully create the semaphore, it gets interrupted.

2) In the interrupt handler, dispatch signal is called. Now the count goes from -1 to 0, and dispatch signal knows it needs to try to wake up a thread. But it sees that no semaphore exists, so it goes to create one. (This state can only ever happen when the signal arrives between another thread's atomic decrement and its wait.)

3) The call to _dispatch_sema4_create is likely not re-entrant: not only is it not documented as such, there is also a semaphore cache involved, which increases the complexity.

There is an interesting question of whether GCD semaphores that have already hit the slow path once become async-safe afterwards. I would think so, since then there will be no more semaphore creation and the logic is no different from a direct call to sem_signal on the slow path.

The other thing I am not sure about is that GCD also verifies the return value of semaphore_signal through DISPATCH_SEMAPHORE_VERIFY_KR, which calls DISPATCH_VERIFY_MIG, which explicitly states that this error can be returned if the function being called is not async-safe. There is not much info here, but given that sem_post also verifies the return code, I don't think this would be returned from waiting on a Mach semaphore under usual circumstances.


Edit: another interesting complication is that older versions of libdispatch had a note that Mach semaphores might suffer from spurious wakeups, so they used a workaround that explicitly tracked the count.

    // Mach semaphores appear to sometimes spuriously wake up. Therefore,
    // we keep a parallel count of the number of times a Mach semaphore is
    // signaled (6880961).

This note was removed in the 2016 version of libdispatch. But the kernel's sem_wait implementation does not employ any such workaround; it calls directly into semaphore_wait.

Looking at the semaphore_wait / semaphore_signal code on the xnu side, I see that in versions up to ~2016:

            /*
             * We have to check the set wait queue. If the set
             * supports pre-posting, it isn't already preposted,
             * and we didn't find a thread in the set, then mark it.
             *
             * If we later find a thread, there may be a spurious
             * pre-post here on this set.  The wait side has to check
             * for that either pre- or post-wait.

After 2016 or so, that entire wait_queue.c was rewritten from the ground up into waitq.c. I don't fully follow the logic, but I think this is good enough evidence that, until around 2016, Mach semaphores could indeed suffer from spurious wakeups.