make init_mechlist thread-safe

elliefm commented 5 years ago

Originally opened as https://github.com/cyrusimap/cyrus-imapd/issues/2813 by @vlmarek

Hi,

We are doing some housekeeping in the changes we use in cyrus-sasl in Solaris.

https://github.com/oracle/solaris-userland/tree/master/components/cyrus-sasl/patches

And I have noticed that we have in the pipeline a fix which never got to be used. It seems simple enough, but I can't judge if this is something you would consider having. If yes, I'm happy to try to port it to current version and create a pull request. If no I will just drop it because we don't need it internally anymore.

That said I don't even see lib/client.c in your code so probably this is pointless ...

Thank you __ Vlad

With the following attached patch:

https://github.com/cyrusimap/cyrus-imapd/files/3331156/116-race-condition.txt

quanah commented 4 years ago

@vlmarek The proposed fix is in and of itself, not thread safe, as one thread could allocate and start using the mutex, and a second thread could allocate a new mutex, overwriting the pointer to the first one. So while the problem description is correct, it needs a different solution.

simo5 commented 4 years ago

@quanah wouldn't it be on the mutexes implementation to make sure that sasl_MUTEX_ALLOC() is itself thread safe and always returns the same pointer if multiple threads are racing to call it ?

hyc commented 4 years ago

@simo5 the API isn't designed to allow callers to pass in a unique handle, so sasl_MUTEX_ALLOC() can't know it's being called multiple times for the same purpose. It's functionally like malloc, always returning a new object on every call.

simo5 commented 4 years ago

@hyc duh right! I was looking at the tree (the specific place) and not the forest ... I guess some trick could be done to figure out if the same handle needs to be returned based on the stack, but yeah, guess a new API would need to be provided here to do the right thing.

mistotebe commented 1 year ago

libsasl2 isn't thread-safe until someone supplies viable callbacks via sasl_set_mutex().

To get there, we might need to provide and document a single API entry point that users should call before libsasl2 is considered initialised. Probably a single one that does it for both client and server side. Getting this into 2.2 would give us a good opportunity to adjust our API and docs accordingly.

simo5 commented 1 year ago

Is there a reason why sasl2 can't use pthread_mutex by default instead of being simply thread unsafe?

mistotebe commented 1 year ago

We should have a single documented entry point that initialises the library's mutexes, that would be the only thing people would have to make sure not to call concurrently.

In pthread we have PTHREAD_MUTEX_INITIALIZER, so the init above might even be a noop but officially, we also support systems (Windows) where AFAIK we can't bootstrap a mutex ourselves? (outside some __atomic_ + Sleep(0) trickery)

simo5 commented 1 year ago

Right, some platforms may need that, but some platforms can be made safer by default, perhaps they should be.

adamyi commented 7 months ago

@mistotebe I don't think #803 completely fixed this for sasl_client_init unless I'm missing something.

I think the patch only made it such that _sasl_common_init is thread-safe, but even before entering that, sasl_client_init's _sasl_client_active check is not thread-safe and does mutate global_callbacks_client and more importantly cmechlist without any locks. So it's possible that one thread turns cmechlist to NULL and triggers a segfault.

Here's a simple repro:

cat test.c
#include <stdio.h>
#include <pthread.h>
#include <sasl/sasl.h>

void *work(void *param) {
  sasl_client_init(param);
  return NULL;
}

int main() {
  pthread_t t;
  pthread_create(&t, NULL, work, NULL);
  work(NULL);
  pthread_join(t, NULL);
  return 0;
}

And I'm getting the following segfault:

                Stack trace of thread 1996385:
                #0  0x00007fc0e3dd27eb init_mechlist (/usr/lib64/libsasl2.so.3.0.0)
                #1  0x000000000040115e n/a (/home/ayi/test)
                #2  0x00007fc0e39bb1ca start_thread (libpthread.so.0)
                #3  0x00007fc0e3627e73 __clone (libc.so.6)

                Stack trace of thread 1996384:
                #0  0x00007fc0e39bc6cd __GI___pthread_timedjoin_ex (libpthread.so.0)
                #1  0x00000000004011be n/a (/home/ayi/test)
                #2  0x00007fc0e3628d85 __libc_start_main (libc.so.6)
                #3  0x000000000040108e n/a (/home/ayi/test)

Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc0e3dd27eb in init_mechlist () at client.c:69
69   if (cmechlist->utils==NULL)
[Current thread is 1 (Thread 0x7fc0e1d75700 (LWP 1996385))]
(gdb) p cmechlist
$1 = (cmech_list_t *) 0x0
(gdb)

In another more complicated code, I'm getting a similar segfault also due to cmechlist turning 0x00:

#0  0x00007f49ca5e6fe7 in sasl_client_add_plugin (plugname=plugname@entry=0x7f49ca5f590d "EXTERNAL", entry_point=<optimized out>) at client.c:225
#1  0x00007f49ca5e7817 in sasl_client_init (callbacks=0x0) at client.c:311

(gdb) p cmechlist
$1 = (cmech_list_t *) 0x0

I think we might still need something similar to static_mutex to guard client.c in addition to #803 ?

cyrusimap / cyrus-sasl

make init_mechlist thread-safe #584