Open elliefm opened 5 years ago
@vlmarek The proposed fix is, in and of itself, not thread-safe: one thread could allocate and start using the mutex while a second thread allocates a new mutex and overwrites the pointer to the first one. So while the problem description is correct, it needs a different solution.
@quanah wouldn't it be up to the mutex implementation to make sure that sasl_MUTEX_ALLOC() is itself thread-safe and always returns the same pointer when multiple threads race to call it?
@simo5 the API isn't designed to allow callers to pass in a unique handle, so sasl_MUTEX_ALLOC() can't know it's being called multiple times for the same purpose. It's functionally like malloc, always returning a new object on every call.
@hyc duh, right! I was looking at the tree (the specific place) and not the forest... I guess some trick could be done to figure out whether the same handle needs to be returned based on the stack, but yeah, I guess a new API would need to be provided here to do the right thing.
libsasl2 isn't thread-safe until someone supplies viable callbacks via sasl_set_mutex().
To get there, we might need to provide and document a single API entry point that users should call before libsasl2 is considered initialised. Probably a single one that does it for both client and server side. Getting this into 2.2 would give us a good opportunity to adjust our API and docs accordingly.
Is there a reason why sasl2 can't use pthread_mutex by default instead of being simply thread unsafe?
We should have a single documented entry point that initialises the library's mutexes, that would be the only thing people would have to make sure not to call concurrently.
In pthread we have PTHREAD_MUTEX_INITIALIZER, so the init above might even be a no-op. But officially we also support systems (Windows) where, AFAIK, we can't bootstrap a mutex ourselves (outside some __atomic_ + Sleep(0) trickery).
Right, some platforms may need that, but some platforms can be made safer by default, perhaps they should be.
@mistotebe I don't think #803 completely fixed this for sasl_client_init, unless I'm missing something. I think the patch only made _sasl_common_init thread-safe, but even before entering that, sasl_client_init's _sasl_client_active check is not thread-safe, and it mutates global_callbacks_client and, more importantly, cmechlist without any locks. So it's possible that one thread sets cmechlist to NULL and triggers a segfault in another.
Here's a simple repro:
$ cat test.c

```c
#include <stdio.h>
#include <pthread.h>
#include <sasl/sasl.h>

void *work(void *param) {
    sasl_client_init(param);
    return NULL;
}

int main() {
    pthread_t t;
    pthread_create(&t, NULL, work, NULL);
    work(NULL);
    pthread_join(t, NULL);
    return 0;
}
```
And I'm getting the following segfault:
```
Stack trace of thread 1996385:
#0  0x00007fc0e3dd27eb init_mechlist (/usr/lib64/libsasl2.so.3.0.0)
#1  0x000000000040115e n/a (/home/ayi/test)
#2  0x00007fc0e39bb1ca start_thread (libpthread.so.0)
#3  0x00007fc0e3627e73 __clone (libc.so.6)

Stack trace of thread 1996384:
#0  0x00007fc0e39bc6cd __GI___pthread_timedjoin_ex (libpthread.so.0)
#1  0x00000000004011be n/a (/home/ayi/test)
#2  0x00007fc0e3628d85 __libc_start_main (libc.so.6)
#3  0x000000000040108e n/a (/home/ayi/test)
```

```
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc0e3dd27eb in init_mechlist () at client.c:69
69          if (cmechlist->utils==NULL)
[Current thread is 1 (Thread 0x7fc0e1d75700 (LWP 1996385))]
(gdb) p cmechlist
$1 = (cmech_list_t *) 0x0
(gdb)
```
In another, more complicated program, I'm getting a similar segfault, also due to cmechlist becoming NULL:
```
#0  0x00007f49ca5e6fe7 in sasl_client_add_plugin (plugname=plugname@entry=0x7f49ca5f590d "EXTERNAL", entry_point=<optimized out>) at client.c:225
#1  0x00007f49ca5e7817 in sasl_client_init (callbacks=0x0) at client.c:311
(gdb) p cmechlist
$1 = (cmech_list_t *) 0x0
```
I think we might still need something similar to static_mutex to guard client.c, in addition to #803?
Originally opened as https://github.com/cyrusimap/cyrus-imapd/issues/2813 by @vlmarek
With the following attached patch:
https://github.com/cyrusimap/cyrus-imapd/files/3331156/116-race-condition.txt