henninglive closed this issue 7 years ago
Indeed openfortivpn uses GCD semaphores:
#ifdef __APPLE__
/* Mac OS X defines sem_init but actually does not implement them */
#include <dispatch/dispatch.h>
typedef dispatch_semaphore_t sem_t;
#define sem_init(psem,x,val) *psem = dispatch_semaphore_create(val)
#define sem_post(psem) dispatch_semaphore_signal(*psem)
#define sem_wait(psem) dispatch_semaphore_wait(*psem, \
DISPATCH_TIME_FOREVER)
#define sem_destroy(psem) dispatch_release(*psem)
#endif
And indeed, using GCD semaphores in signal handlers is documented to be unsafe. I'm surprised there aren't more references on Google about that. Impressive investigation, by the way! :thumbsup:
I haven't used semaphores recently (if ever), so perhaps you can help here:
I am not an expert on this, I just stumbled on the problem while trying to find a workaround for missing unnamed semaphores on OSX. I saw people suggesting GCD semaphores as an alternative and while researching if they were safe, I found your project. I have never actually tried openfortivpn, but here is my insight anyway.
I have found this about Mach semaphores and I think the current approach in #135 looks promising. Do you agree?
As an ingredient for the more general discussion, what about unnamed POSIX semaphores if they are initialized by a global variable?
Ok, so POSIX named semaphores are actually implemented on top of Mach semaphores. It's weird that the documentation suggests they are not async-signal-safe. They might be safe, but I don't know if I would take the chance. I would be very careful about using anything that doesn't explicitly guarantee async-signal safety: there is so much that can go wrong in signal handlers. You could ask on a mailing list or something whether they are safe or not.
Without someone confirming they are safe, I think using a pipe is the best idea. It would work with pthread_cancel(), and the possible downsides (performance and the use of extra file descriptors) shouldn't be a concern for you.
What do you mean by initialized by a global variable? Are you talking about unnamed semaphores in memory shared between processes, or about a static initializer like PTHREAD_MUTEX_INITIALIZER? I don't think the latter exists for semaphores.
I was referring to unnamed semaphores in memory shared between threads:
A thread-shared semaphore is placed in an area of memory shared between the threads of a process, for example, a global variable.
Isn't that the normal use case, synchronizing different threads in the same process?
Probably yes; at least that is how they are used in openfortivpn.
We are still left with the question of whether semaphores on Mac OS X are async-signal-safe. I have been further studying the various variants and have found an interesting handout.
A semaphore is somewhat like an integer variable, but is special in that its operations (increment and decrement) are guaranteed to be atomic—you cannot be halfway through incrementing the semaphore and be interrupted and waylaid by another thread trying to do the same thing.
Semaphores can be used any place where mutexes can occur. This precludes their use in interrupt handlers...
When looking for more detailed information about this, I came across this thread saying that the problem could be deadlocks when a mutex is used in a signal handler. Although this is not exactly what we have in our case, let's look at how semaphores are used in openfortivpn:
dispatch_semaphore_signal is used there, and in #135 this changes to semaphore_signal. GCD avoids going down to the kernel when resources are available, whereas the direct calls to the Mach functions always operate on the semaphore. The semaphore_signal function increments the semaphore count; if the count becomes non-negative (i.e. greater than or equal to 0) and a thread is blocked on the semaphore, the waiting thread is scheduled to execute...
Mach semaphores obey Mesa semantics—that is, when a thread is awakened by a semaphore becoming available, it is not executed immediately.
In openfortivpn the signal handler calls sem_stop_io, and any thread that goes into the signal handler just tries to wake up the main thread, which then continues to cancel all threads, join them, and do all the other cleanup. There is no semaphore_wait or the like in the signal handler that could result in deadlocks, and if I understand it correctly, there shouldn't be a problem with several threads attempting to do the same thing.

@henninglive: by coincidence I have found your thread here, so I'm linking it in case someone posts new insight there.
Here is the userspace source code for Mach semaphores: straight syscalls. Here and here is the kernel source code for semaphore_signal(). sem_post() is also a straight syscall, and here is the kernel source code. We know sem_post() is async-signal-safe on named semaphores. There is a lock here, which might be relevant. Never mind, it is unlocked before the call to semaphore_signal().

Based on this, I can't see how semaphore_signal() could be unsafe while sem_post() is safe.
@henninglive thank you very much for this analysis and for the references that you have provided. I have tried to do an analysis of GCD semaphores, too. The documentation says:
A dispatch semaphore works like a regular semaphore with one exception. When resources are available, it takes less time to acquire a dispatch semaphore than it does to acquire a traditional system semaphore. This is because Grand Central Dispatch does not call down into the kernel for this particular case. The only time it calls down into the kernel is when the resource is not available and the system needs to park your thread until the semaphore is signaled.
However, I can't draw a similar comparison between this implementation and that of POSIX named semaphores. So, given that POSIX named semaphores are actually implemented on top of Mach semaphores, moving over to Mach semaphores is probably a good step forward.
@adrienverge and @DimitriPapadopoulos I'm now convinced that we should accept #135. It not only solves the locking issue that @Mabin-J has observed, but also addresses the possible issue with GCD semaphores discussed here. Do you still have any objections? Perhaps a very short summary of this profound discussion here in #105 should be included in the commit message, and perhaps we can prepare a wording here that we can then propose to @Mabin-J.
@mrbaseman Looks good to me! :+1:
@mrbaseman Looks good to me too. Although I don't have time to test on Linux and cannot test on Mac, the whole discussion about different types of semaphores and their limitations is thorough and moving from GCD to Mach semaphores seems to make sense based on the available information.
Ok, I have just tested it successfully and asked for an update of the commit message in https://github.com/adrienverge/openfortivpn/pull/135#issuecomment-307048969
I have just merged #135, so I think we can also close this issue (see discussion above)
Fwiw there is the obscure and deprecated https://developer.apple.com/documentation/coreservices/1585713-mpsignalsemaphore?language=objc which explicitly says
Note that you can call this function from an interrupt handler.
And if you look at its disassembly, it just ends up calling semaphore_signal. So this should essentially confirm it.
Also the kernel equivalent https://raw.githack.com/apple-oss-distributions/xnu/main/osfmk/man/semaphore_signal.html states
Device driver interrupt service routines may safely execute semaphore_signal operations without causing a deadlock.
Although these MultiProcessing semaphore functions are interesting in one way: unlike traditional Mach semaphores, they also allow you to specify a maximum. But if you look at the implementation, it's basically something like
undefined8 sym._MPSignalSemaphore(int64_t param_1)
{
    uint32_t uVar1;
    undefined8 ret;
    int64_t var_ch;

    ret = sym._RetrieveDataFromOpaqueID(param_1, (int64_t)&var_ch, (int64_t)&var_ch + 4);
    if (((int32_t)ret == 0) && (ret = 0xffff8d8d, (int32_t)var_ch == 0x73656d61)) { // 0x73656d61 == 'sema' magic
        ret = 0xffff8d8e; // kInsufficientResourcesErr
        if (*(uint64_t *)(var_ch->curValue) < *(uint64_t *)(var_ch->maxValue)) {
            // note: this compare and the increment below are not one atomic step
            sym.imp.OSAtomicAdd64Barrier(1, var_ch->curValue);
            uVar1 = sym.imp.semaphore_signal();
            ret = sym._kernelErrorToOSStatus((uint64_t)uVar1);
        }
    }
    return ret;
}
In particular, the maximum tracking is just implemented with an associated atomic counter. But this seems like an obvious race: they should have used an atomic CAS instruction instead of doing the compare and then the increment... maybe that's why these functions are deprecated?
See also https://old.reddit.com/r/osx/comments/6exsqd/mach_semaphores_in_posix_signal_handlers/
There also remains the question of whether GCD semaphores are signal-handler safe, given that they're implemented on top of mach semaphores. Empirically, the answer seems to be no. In addition to the OpenJ9 issues linked above, Golang also tried experimenting with GCD Semaphores and saw crashes with signal handlers https://go-review.googlesource.com/c/go/+/182880 (they ultimately decided to use a pipe).
I think this is one possible way that the SIGILL is generated:
1) Some thread calls dispatch_semaphore_wait, and the count goes from 0 to -1. So it tries to create and wait on the semaphore. But before it can fully create the semaphore, it gets interrupted.
2) In the interrupt handler, dispatch_semaphore_signal is called. Now the count goes from -1 to 0, and dispatch_semaphore_signal knows it needs to try to wake up a thread. But it sees that no semaphore exists, so it goes to create one. (This is a state that can only ever happen when the signal arrives between another thread's atomic decrement and its wait.)
3) The call to _dispatch_sema4_create is likely not re-entrant: not only is it not documented as such, there is also a semaphore cache involved, which increases the complexity.
There is an interesting question of whether GCD semaphores that have already hit the slow path once become async-safe from then on. I would think so, since there will be no more semaphore creation, and the logic is no different from a direct call to semaphore_signal on the slow path.
The other thing I am not sure about is that GCD also verifies the return value of semaphore_signal through DISPATCH_SEMAPHORE_VERIFY_KR, which has a call to DISPATCH_VERIFY_MIG, which explicitly states that this can be returned if the function being called is not async-safe. There is not much info here, but given that sem_post also verifies the return code, I don't think this would be returned from waiting on a Mach semaphore under usual cases.
Edit: another interesting complication is that older versions of libdispatch had a note that Mach semaphores might suffer from spurious wakeups, and they used a workaround that explicitly tracks the count:
// Mach semaphores appear to sometimes spuriously wake up. Therefore,
// we keep a parallel count of the number of times a Mach semaphore is
// signaled (6880961).
This note was removed in the 2016 version of libdispatch. But the kernel's sem_wait implementation does not employ any workaround; it calls directly into semaphore_wait.
Looking at the semaphore_wait / semaphore_signal code on the xnu side, I see that in the versions up to ~2016 there is this comment:
/*
 * We have to check the set wait queue. If the set
 * supports pre-posting, it isn't already preposted,
 * and we didn't find a thread in the set, then mark it.
 *
 * If we later find a thread, there may be a spurious
 * pre-post here on this set. The wait side has to check
 * for that either pre- or post-wait.
 */
After 2016-ish, that entire wait_queue.c was rewritten from the ground up into waitq.c. I don't fully follow the logic, but I think that is good enough evidence that until 2016-ish Mach semaphores could indeed suffer from spurious wakeups.
Openfortivpn currently uses GCD (Grand Central Dispatch) semaphores on OS X. This is done to work around the fact that OS X doesn't support unnamed POSIX semaphores. I ran into the same problem while working on another project, and while looking for workarounds I found multiple suggestions to substitute POSIX semaphores with GCD semaphores, but I was not able to find any documentation specifying whether GCD semaphores are async-signal-safe.
I did some further digging and found the source for GCD. As you can see here and here, GCD uses mach/semaphore.h internally. Unfortunately, the documentation for mach/semaphore.h suggests that they are not async-signal-safe. If this is correct, it is probably a bad idea to use GCD semaphores. The current solution might work most of the time, but specific timing might cause it to fail. Here is more info on async-signal safety.
Here is another discussion about the same problem in OpenJDK. They suggest using BSD kqueue/kevent.
In the other project I was working on, we ended up using pipes, similar to this. This should work on all POSIX-compliant systems, unlike unnamed semaphores, which are an optional part of the POSIX standard.