cocalele / ViveNAS

5 stars 4 forks source link

MDCache report `RW LOCK :CRIT :Error 35, write locking` #3

Closed cocalele closed 1 year ago

cocalele commented 2 years ago

在open2函数调用后,发现报告RW LOCK :CRIT :Error 35, write locking, 而且这个错误是从MDCache代码里报告的,此时所有代码的调用栈都已出了vivenas的代码。当前已进行了如下的尝试:

cocalele commented 2 years ago

在决定以FSAL_MEM为基础渐进开发ViveNAS的适配后,死锁问题再次出现。

通过在common_utils.h的加锁,解锁代码里面增加log,发现死锁的位置在这里:

mem_handle.c:998
    /* Check if this context already holds the lock on
     * this directory.
     */
    if (op_ctx->fsal_private != parent) {
        PTHREAD_RWLOCK_rdlock(&parent->obj_lock);     ===== rdlocked here
        lock_applyed = 1;
    }
    else
        LogFullDebug(COMPONENT_FSAL,
                 "Skipping lock for %s",
                 myself->m_name);

    status = mem_int_lookup(myself, path, &hdl);     =====acqrie wrlock again during alloc_handle => mem_insert_obj
    if (FSAL_IS_ERROR(status)) {
        goto out;
    }

    *handle = &hdl->obj_handle;
    mem_int_get_ref(hdl);

out:
    if (lock_applyed && op_ctx->fsal_private == parent){
        S5LOG_ERROR("Lock applied but not released!");
    }

    if (op_ctx->fsal_private != parent)
        PTHREAD_RWLOCK_unlock(&parent->obj_lock);

vivenas在lookup的时候发现找不到某个child,就会从rocksdb里找,找到后又试图插入到FSAL_MEM的avl tree里面。 原始的FSAL_MEM并没有从底层存储查找然后插入到avl tree的环节,因此不会死锁。

cocalele commented 2 years ago

avl tree这套机制是FSAM_MEM为了实现内存查找使用,按道理可以去掉。 但是去掉后发现file open的时候无法返回正确的file fd, 从而导致segfault. 从日志看,同一个文件分配了多次object handle和state

[DEBU 2022-08-06 18:55:06.259]call mem_getattrs on inode:2(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:09.477]Lookup on parent ino:2 file_name:dd(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:09.477]Lookup on parent ino:2 file_name:dd, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:09.477]will create file:dd mode:0040777, uid:0(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:961:mem_create_obj)
[INFO 2022-08-06 18:55:09.486]Create file:2_dd, new ino:12, mode:0040777 (octal)(../../../src/file.cpp:115:vn_create_file)
[DEBU 2022-08-06 18:55:09.486]alloc handle for inode:12 name:dd(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:09.486]call mem_getattrs on inode:2(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:17.555]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.555]Lookup on parent ino:12 file_name:b1, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]alloc state:0x7f5374009e60(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_export.c:174:mem_alloc_state)
[DEBU 2022-08-06 18:55:17.556]open file:b1 myslef:0x7f537800da40 myself_name:dd state:0x7f5374009e60(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1728:mem_open2)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]will create file:b1 mode:00100666, uid:0(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:961:mem_create_obj)
[INFO 2022-08-06 18:55:17.562]Create file:12_b1, new ino:13, mode:00100666 (octal)(../../../src/file.cpp:115:vn_create_file)

<<<<<<<alloc hanle first time>>>>>
[DEBU 2022-08-06 18:55:17.562]alloc handle for inode:13 name:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:17.562]VIVE=== call in mem_open_my_fd, name:b1, mount ctx:0x562bbca84720(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:582:mem_open_my_fd)
[DEBU 2022-08-06 18:55:17.562]call mem_getattrs on inode:12(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:17.563]Lookup on parent ino:2 file_name:dd(../../../src/file.cpp:499:vn_lookup_inode_no)
[WARN 2022-08-06 18:55:17.563]inode memory may not released!(../../../src/file.cpp:511:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.563]Lookup on parent ino:2 file_name:dd, get ino:12(../../../src/file.cpp:521:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.563]alloc handle for inode:12 name:dd(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:17.566]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[WARN 2022-08-06 18:55:17.566]inode memory may not released!(../../../src/file.cpp:511:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.566]Lookup on parent ino:12 file_name:b1, get ino:13(../../../src/file.cpp:521:vn_lookup_inode_no)

<<<<<<<alloc handle second time>>>>>
[DEBU 2022-08-06 18:55:17.566]alloc handle for inode:13 name:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:17.566]alloc state:0x7f5378121650(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_export.c:174:mem_alloc_state)
[DEBU 2022-08-06 18:55:17.566]open file:(null) myslef:0x7f5378010d30 myself_name:b1 state:0x7f5378121650(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1728:mem_open2)
[DEBU 2022-08-06 18:55:17.566]open myfd:0x7f5378121650 myself fd:0x7f5378010f20, myname:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1788:mem_open2)
[DEBU 2022-08-06 18:55:17.566]VIVE=== call in mem_open_my_fd, name:b1, mount ctx:0x562bbca84720(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:582:mem_open_my_fd)
[DEBU 2022-08-06 18:55:17.566]call mem_getattrs on inode:13(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:17.570]alloc state:0x7f5378121d00(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_export.c:174:mem_alloc_state)
[DEBU 2022-08-06 18:55:17.570]open file:(null) myslef:0x7f5378010d30 myself_name:b1 state:0x7f5378121d00(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1728:mem_open2)
[DEBU 2022-08-06 18:55:17.570]open myfd:0x7f5378121d00 myself fd:0x7f5378010f20, myname:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1788:mem_open2)
[DEBU 2022-08-06 18:55:17.570]VIVE=== call in mem_open_my_fd, name:b1, mount ctx:0x562bbca84720(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:582:mem_open_my_fd)
Segmentation fault
cocalele commented 1 year ago

In commit https://github.com/cocalele/nfs-ganesha/commit/a6c84bb71a57883cb5e21d7ab2c75e4b866832cd, old code from FSAL_MEM has been deleted, and this deadlock problem is gone.