Closed cocalele closed 1 year ago
在决定以FSAL_MEM为基础渐进开发ViveNAS的适配后,死锁问题再次出现。
通过在common_utils.h的加锁,解锁代码里面增加log,发现死锁的位置在这里:
mem_handle.c:998
/* Check if this context already holds the lock on
* this directory.
*/
if (op_ctx->fsal_private != parent) {
PTHREAD_RWLOCK_rdlock(&parent->obj_lock); ===== rdlocked here
lock_applyed = 1;
}
else
LogFullDebug(COMPONENT_FSAL,
"Skipping lock for %s",
myself->m_name);
status = mem_int_lookup(myself, path, &hdl); =====acqrie wrlock again during alloc_handle => mem_insert_obj
if (FSAL_IS_ERROR(status)) {
goto out;
}
*handle = &hdl->obj_handle;
mem_int_get_ref(hdl);
out:
if (lock_applyed && op_ctx->fsal_private == parent){
S5LOG_ERROR("Lock applied but not released!");
}
if (op_ctx->fsal_private != parent)
PTHREAD_RWLOCK_unlock(&parent->obj_lock);
vivenas在lookup的时候发现找不到某个child,就会从rocksdb里找,找到后又试图插入到FSAL_MEM的avl tree里面。 原始的FSAL_MEM并没有从底层存储查找然后插入到avl tree的环节,因此不会死锁。
avl tree这套机制是FSAM_MEM为了实现内存查找使用,按道理可以去掉。 但是去掉后发现file open的时候无法返回正确的file fd, 从而导致segfault. 从日志看,同一个文件分配了多次object handle和state
[DEBU 2022-08-06 18:55:06.259]call mem_getattrs on inode:2(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:09.477]Lookup on parent ino:2 file_name:dd(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:09.477]Lookup on parent ino:2 file_name:dd, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:09.477]will create file:dd mode:0040777, uid:0(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:961:mem_create_obj)
[INFO 2022-08-06 18:55:09.486]Create file:2_dd, new ino:12, mode:0040777 (octal)(../../../src/file.cpp:115:vn_create_file)
[DEBU 2022-08-06 18:55:09.486]alloc handle for inode:12 name:dd(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:09.486]call mem_getattrs on inode:2(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:17.555]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.555]Lookup on parent ino:12 file_name:b1, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]alloc state:0x7f5374009e60(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_export.c:174:mem_alloc_state)
[DEBU 2022-08-06 18:55:17.556]open file:b1 myslef:0x7f537800da40 myself_name:dd state:0x7f5374009e60(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1728:mem_open2)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]Lookup on parent ino:12 file_name:b1, Not found(../../../src/file.cpp:504:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.556]will create file:b1 mode:00100666, uid:0(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:961:mem_create_obj)
[INFO 2022-08-06 18:55:17.562]Create file:12_b1, new ino:13, mode:00100666 (octal)(../../../src/file.cpp:115:vn_create_file)
<<<<<<<alloc hanle first time>>>>>
[DEBU 2022-08-06 18:55:17.562]alloc handle for inode:13 name:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:17.562]VIVE=== call in mem_open_my_fd, name:b1, mount ctx:0x562bbca84720(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:582:mem_open_my_fd)
[DEBU 2022-08-06 18:55:17.562]call mem_getattrs on inode:12(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:17.563]Lookup on parent ino:2 file_name:dd(../../../src/file.cpp:499:vn_lookup_inode_no)
[WARN 2022-08-06 18:55:17.563]inode memory may not released!(../../../src/file.cpp:511:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.563]Lookup on parent ino:2 file_name:dd, get ino:12(../../../src/file.cpp:521:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.563]alloc handle for inode:12 name:dd(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:17.566]Lookup on parent ino:12 file_name:b1(../../../src/file.cpp:499:vn_lookup_inode_no)
[WARN 2022-08-06 18:55:17.566]inode memory may not released!(../../../src/file.cpp:511:vn_lookup_inode_no)
[DEBU 2022-08-06 18:55:17.566]Lookup on parent ino:12 file_name:b1, get ino:13(../../../src/file.cpp:521:vn_lookup_inode_no)
<<<<<<<alloc handle second time>>>>>
[DEBU 2022-08-06 18:55:17.566]alloc handle for inode:13 name:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:669:_mem_alloc_handle)
[DEBU 2022-08-06 18:55:17.566]alloc state:0x7f5378121650(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_export.c:174:mem_alloc_state)
[DEBU 2022-08-06 18:55:17.566]open file:(null) myslef:0x7f5378010d30 myself_name:b1 state:0x7f5378121650(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1728:mem_open2)
[DEBU 2022-08-06 18:55:17.566]open myfd:0x7f5378121650 myself fd:0x7f5378010f20, myname:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1788:mem_open2)
[DEBU 2022-08-06 18:55:17.566]VIVE=== call in mem_open_my_fd, name:b1, mount ctx:0x562bbca84720(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:582:mem_open_my_fd)
[DEBU 2022-08-06 18:55:17.566]call mem_getattrs on inode:13(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1376:mem_getattrs)
[DEBU 2022-08-06 18:55:17.570]alloc state:0x7f5378121d00(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_export.c:174:mem_alloc_state)
[DEBU 2022-08-06 18:55:17.570]open file:(null) myslef:0x7f5378010d30 myself_name:b1 state:0x7f5378121d00(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1728:mem_open2)
[DEBU 2022-08-06 18:55:17.570]open myfd:0x7f5378121d00 myself fd:0x7f5378010f20, myname:b1(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:1788:mem_open2)
[DEBU 2022-08-06 18:55:17.570]VIVE=== call in mem_open_my_fd, name:b1, mount ctx:0x562bbca84720(/root/v2/nfs-ganesha/src/FSAL/FSAL_MEM/mem_handle.c:582:mem_open_my_fd)
Segmentation fault
In commit https://github.com/cocalele/nfs-ganesha/commit/a6c84bb71a57883cb5e21d7ab2c75e4b866832cd, old code from FSAL_MEM has been deleted, and this deadlock problem is gone.
在open2函数调用后,发现报告
RW LOCK :CRIT :Error 35, write locking
, 而且这个错误是从MDCache代码里报告的,此时所有代码的调用栈都已出了vivenas的代码。当前已进行了如下的尝试:ganesha v4换到v3, 并模仿fsal ceph的代码实现open2
发现ganesha v3自带的FSAL_MEM模块无法正常启动,感觉问题发散了,决定仍切换会ganesha v4.
ganesha v4的FSAL_MEM模块能正常使用
拿mem fsal的代码过来,只做最细微的调整 上面的方法都调不通。因此决定下面的方法
在ganesha的工程里进行改造,基础版本可以运行,只要加上vivenas链接,nfs mount就卡住了,任何错误信息都不出。 很多次尝试后发现问题在于vivenas.so文件本身是一个fsal plugin, 里面有fsal注册的代码。把fsal代码去掉重新编译so文件。这样FSAL_MEM就可以正常响应mount