linnemannr commented 6 years ago

Summary

The MRSW lock uses a single reader mutex that is locked by the first reader thread to acquire the read lock and released by the last thread to release the read lock. If the first acquiring and last releasing threads are not the same threads, the mutex will be left locked by the first thread and never unlocked. The next attempt to acquire the read lock will deadlock the calling thread since it will try to lock the mutex, which is still locked.

Description

Example:

Thread 1 acquires the read MRSW lock on a volume, and with the reader count < 1 locks the reader mutex
Thread 2 acquires the read MRSW lock on the same volume, and only ticks up the reader count
Thread 1 releases the read MRSW lock on the volume, and does not unlock the reader mutex
Thread 2 releases the read MRSW lock on the volume, and erroneously unlocks the reader mutex, but does not detect that the unlock failed due to the thread not owning the mutex lock in the first place
Thread N acquires the read MRSW lock on the volume, and blocks to lock the reader mutex that thread 1 left locked
Thread N is now deadlocked

Environment

OS: Linux/FreeBSD, probably others
Distribution: All
Architecture: All
Additional info:

How to recreate

This will not show up in the FUSE driver, since it is inherently single threaded. You will need to write a test program that links to libltfs and spawns threads that hold the reader lock on a volume in overlapping rather than nested windows. You could guarantee the timing by coordinating the threads with condvars.

I've gotten around this with a local delta by replacing the innards of the mrsw implementation with pthread_rwlock_t for our FreeBSD build, but that is not a portable solution.

piste-jp commented 6 years ago

First of all, thank you for your interests about this project.

I understand what you are saying is

3 threads, thread A, B and C, are spawned from the main thread
Thread A initializes the mutex
Thread B takes the mutex
Thread C tries to release the mutex (But this release is failed)

Is this correct?

From POSIX point of view, unlock against a mutex taken by another thread is "undefined". But I think Linux and OSX allows to unlock a mutex by different thread when the mutex is created by pthread_mutex_init(lock, NULL), default attribute.

I don't know the FreeBSD behavior well. So I tried to confirm the problem by following tiny code but I cannot recreated on FreeBSD 11.1. The mutex was released successfully.

Am I missing something? It is very helpful if you provide a pseudo code or tiny test code.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <stdbool.h>
#include <pthread.h>

bool keep_alive = true;
bool locked = false;

static void *init_thread(void *arg)
{
    int ret;
    pthread_mutex_t *lock = (pthread_mutex_t *)arg;

    ret = pthread_mutex_init(lock, NULL);
    if (ret < 0)
        printf("Failed to init the lock(%d)\n", errno);
    else
       printf("The lock was initialized\n");

    while (keep_alive)
    sleep(3);

    return NULL;
}

static void *lock_thread(void *arg)
{
    int ret;
    pthread_mutex_t *lock = (pthread_mutex_t *)arg;

    while (keep_alive) {
        if (!locked) {
            ret = pthread_mutex_lock(lock);
            if (ret < 0)
                printf("Failed to take the lock(%d)\n", errno);
            else
                printf("The lock was took\n");

            locked = true;
        }
        sleep(3);
    }

    return NULL;
}

static void *unlock_thread(void *arg)
{
    int ret;
    pthread_mutex_t *lock = (pthread_mutex_t *)arg;

    while (keep_alive) {
        if (locked) {
            ret = pthread_mutex_unlock(lock);
            if (ret < 0)
                printf("Failed to release the lock(%d)\n", errno);
            else
                printf("The lock was released\n");

            locked = false;
        }
        sleep(3);
    }

    return NULL;
}

int main(int argc, char** argv)
{
    int ret = 0;
    pthread_mutex_t lock;
    pthread_t ti, tl, tu;

    ret = pthread_create(&ti, NULL, init_thread, &lock);
    if (ret < 0) {
        printf("Failed to create a thread (%d)\n", errno);
        return 1;
    }

    sleep(3);

    ret = pthread_create(&tl, NULL, lock_thread, &lock);
    if (ret < 0) {
        printf("Failed to create a thread (%d)\n", errno);
        return 1;
    }

    ret = pthread_create(&tu, NULL, unlock_thread, &lock);
    if (ret < 0) {
        printf("Failed to create a thread (%d)\n", errno);
        return 1;
    }

    sleep(60);
    keep_alive = false;

    ret = pthread_join(ti, NULL);
    if (ret < 0) {
        printf("Failed to join the thread (%d)\n", errno);
        return 1;
    } else
        printf("Thread joined\n");

    ret = pthread_join(tl, NULL);
    if (ret < 0) {
        printf("Failed to join the thread (%d)\n", errno);
        return 1;
    } else
        printf("Thread joined\n");

    ret = pthread_join(tu, NULL);
    if (ret < 0) {
        printf("Failed to join the thread (%d)\n", errno);
        return 1;
    } else
        printf("Thread joined\n");

    pthread_mutex_destroy(&lock);

    return 0;
}

linnemannr commented 6 years ago

The problem comes when you attempt to lock the mutex again. Try this program:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct  thread_data_t {
    pthread_cond_t cv;
    pthread_mutex_t cv_mtx;
    pthread_mutex_t error_mtx;
};

void *thr1(void *udata)
{
    struct thread_data_t *td = (struct thread_data_t *)udata;
    int ret;

    /* thr1 owns the lock on error_mtx, and does not unlock it */
    ret = pthread_mutex_lock(&td->error_mtx);
    printf("thr1: pthread_mutex_lock returns %d, error '%s'\n", ret,
        ret!=0?strerror(ret):NULL);

    sleep(3);
    ret = pthread_cond_signal(&td->cv);
    printf("thr1: pthread_cond_signal returns %d, error '%s'\n", ret,
        ret!=0?strerror(ret):NULL);
    return (NULL);
}

void *thr2(void *udata)
{
    struct thread_data_t *td = (struct thread_data_t *)udata;
    int ret;

    ret = pthread_cond_wait(&td->cv, &td->cv_mtx);

    /* thr2 does not own the lock on error_mutex */
    ret = pthread_mutex_unlock(&td->error_mtx);
    printf("thr2: pthread_mutex_unlock returns %d, error '%s'\n", ret,
        ret!=0?strerror(ret):NULL);

    /* another attempt to lock cv_error_mutex will now fail, since
     * pthread_mutex_unlock() was called from a thread which did not own
     * the lock */
    ret = pthread_mutex_trylock(&td->error_mtx);
    printf("thr2: pthread_mutex_trylock returns %d, error '%s'\n", ret, 
        ret!=0?strerror(ret):NULL);
    return (NULL);
}

int main(int argc, char **argv)
{
    int ret;
    pthread_t t1, t2;
    struct thread_data_t td;

    ret = pthread_cond_init(&td.cv, NULL);
    if (ret < 0) {
        fprintf(stderr, "Failed to init condvar: %s\n", strerror(ret));
        exit(ret);
    }

    ret = pthread_mutex_init(&td.cv_mtx, NULL);
    if (ret < 0) {
        fprintf(stderr, "Failed to init cv_mtx: %s\n", strerror(ret));
        exit(ret);
    }

    ret = pthread_mutex_init(&td.error_mtx, NULL);
    if (ret < 0) {
        fprintf(stderr, "Failed to init error_mtx: %s\n", strerror(ret));
        exit(ret);
    }

    /* create thread 2 first as it waits on the signal from thread 1 */
    ret = pthread_create(&t2, NULL, thr2, (void *)&td);
    if (ret < 0) {
        fprintf(stderr, "Failed to create thread t2: %s\n", strerror(ret));
        exit(ret);
    }
    ret = pthread_create(&t1, NULL, thr1, (void *)&td);
    if (ret < 0) {
        fprintf(stderr, "Failed to create thread t1: %s\n", strerror(ret));
        exit(ret);
    }

    ret = pthread_join(t2, NULL);
    if (ret < 0) {
        fprintf(stderr, "Failed to join thread t2: %s\n", strerror(ret));
        exit(ret);
    }
    ret = pthread_join(t1, NULL);
    if (ret < 0) {
        fprintf(stderr, "Failed to join thread t1: %s\n", strerror(ret));
        exit(ret);
    }
    return (0);
}

Output:

thr1: pthread_mutex_lock returns 0, error '(null)'
thr2: pthread_mutex_unlock returns 1, error 'Operation not permitted'
thr2: pthread_mutex_trylock returns 16, error 'Device busy'
thr1: pthread_cond_signal returns 0, error '(null)'

Here, thr1 emulates reader 1 acquiring the reader lock and locking the internal mutex, and then releasing the reader lock by decrementing the reader count after another reader (thr2) has acquired it. As the mrsw implementation is written, in this situation thr1 does not unlock the mutex since the reader count is nonzero upon releasing the reader lock.

thr2 emulates the second thread acquiring a reader lock after thr1 has done so, then waiting for th1 to release the reader lock. It then emulates releasing the reader lock, which no longer has any readers, causing thr2 to attempt to unlock the reader mutex. thr2 then attempts to acquire the reader lock again, and since no readers are present it attempts to lock the mutex which is still locked by thr1.

Does that illustrate the problem better?

linnemannr commented 6 years ago

I should note that according to http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_gettype.html , the default mutex type is implementation specific. On FreeBSD 11.1, the default is PTHREAD_MUTEX_ERRORCHECK.

piste-jp commented 6 years ago

Thank you very much !!

I recreated the problem in your code. And also I found my fault in my first code, I must detect the error by if(ret!=0) instead of if(ret<0).

Next, I try to run the code on FreeBSD, Linux and macOS. And results are only FreeBSD fails.

piste-jp commented 6 years ago

I learned the compatibility of pthread itself is little bit nasty... I think it is good to add some code only for FreeBSD into thread related wrapper layer, ltfs_thread.h, ltfs_thread.c and ltfs_locking.h as a part of FreeBSD porting.

What do you think?

In my sense, it doesn't make any sense the we try to create a portable code around locking mechanism even if each OS has each locking policy.

Currently I know following flavors

Linux: unlock by another thread is permissive by default
macOS: unlock by another thread is permissive by default
Windows: pthread_mutex in MinGW implemented on a "handle"
FreeBSD: unlock by another thread is not allowed in any pthread_mutex types

On windows (in IBM's Spectrum Archive), we faced to the similar problem because pthread_mutex in MinGW uses a handle per mutex. As a result, we need to have a upper limit of files. So we decided to introduce the wrapper layer to choose better locking mechanism in each OSs.

I have one more thing to note. The MRSW intents to process writer lock request as soon as possible. It means rdlock requests after a wrlock is lower priority than the wrlock like

rdlock1 (1)
rdlock2 (2)
wrlock (3)
rdlock3 (4)

The rdlock3 is blocked until wrlock is released.

This behavior may be same as PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP in Linux. (But I hesitate to move to this because of we need long time to evaluate, may be)

I assume rdlock3 is higher priority if one of rdlock1 or rdlock2 is remaining in FreeBSD's pthread_rwlock. I don't think this causes a big problem but you need to understand the behavior is little bit changed.

linnemannr commented 6 years ago

I'd argue that trusting undefined behavior is dangerous. The point of the mutex is to allow threads to coordinate on a mutually exclusive basis - if a thread can hijack a lock from another thread that is in a critical section, the entire guarantee of mutual exclusion is null and void. If Linux and OSX truly allow this, I would caution that their implementations are probably incorrect.

Have you considered instead using a semaphore for the reader instead of a mutex? I think that might fit the use case you're looking for more appropriately than a mutex, since you are effectively attempting to reproduce the behavior of a semaphore with a mutex and reader count.

linnemannr commented 6 years ago

Actually, I think what you want is something inverse of a semaphore - where the waiting thread wakes up once the count reaches zero. Perhaps you could use a semaphore initialized to 1, and when the multi reader lock is acquired you could sem_wait on the semaphore. When the last reader leaves, you could sem_post on the semaphore.

piste-jp commented 6 years ago

Yes, I considered to use semaphore before. But I didn't chose that because

Semaphore has upper limit in the system. Now we are using 2 MultiReaderSingleWriter structures in each dentry. This means the maximum number of object in this file system depends on max number of semaphore. (Max semaphore default is 32000 in RHEL7)
The semaphore is persistent. We need to have some cleaning mechanism when the ltfs command is crashed.
The semaphore can access from another process. This is little bit bigger feature than we need.

piste-jp commented 6 years ago

I may find a solution.

Introduce new lock ltfs_rwlock_t for wrapping pthread_rwlock
Change MultiReaderSingleWriter structure to use a ltfs_mutex_t and a ltfs_rw_lock

How do you think the following pseudo code?

typedef struct MultiReaderSingleWriter {
        ltfs_mutex_t exclusive_mutex;
        ltfs_rwlock_t rw_lock;
        uint32_t long_lock;
} MultiReaderSingleWriter;

static inline void
acquirewrite_mrsw(MultiReaderSingleWriter *mrsw)
{
        ltfs_mutex_lock(&mrsw->exclusive_mutex);
        ltfs_rwlock_wrlock(&mrsw->rw_lock);
        mrsw->long_lock=0;
}

static inline void
releasewrite_mrsw(MultiReaderSingleWriter *mrsw)
{
        mrsw->long_lock=0;
        ltfs_rwlock_unlock(&mrsw->&mrsw->rw_lock);
        ltfs_mutex_unlock(&mrsw->exclusive_mutex);
}

static inline void
acquireread_mrsw(MultiReaderSingleWriter *mrsw)
{
        ltfs_mutex_lock(&mrsw->exclusive_mutex);
        mrsw->long_lock=0;
        ltfs_rwlock_rdlock(&mrsw->rw_lock);
        ltfs_mutex_unlock(&mrsw->exclusive_mutex);
}

static inline void
releaseread_mrsw(MultiReaderSingleWriter *mrsw)
{
        ltfs_rwlock_unlock(&mrsw->&mrsw->rw_lock);
}

I can move to this architecture on every platforms in a some timing in the future. But I don't want to rush it. So I would like to begin this only on FreeBSD first and then go to other platforms gradually with careful tests.

LinearTapeFileSystem / ltfs

MRSW lock can deadlock after reader count reaches zero #8

Summary

Description

Environment

How to recreate