cubefs / cubefs

cloud-native distributed storage
https://cubefs.io
Apache License 2.0

[Bug]: kernel task will be blocked while one client is writing to a file that is deleted by another client #3373

Closed BillXiang closed 1 month ago

BillXiang commented 1 month ago

Contact Details

xiangwencheng@dayudpu.com

Is there an existing issue for this?

Priority

high

Environment

- CubeFS version: 3.3.1
- Deployment mode (docker or standalone or cluster): cluster
- Dependent components:
- OS kernel version (Ubuntu or CentOS): CentOS 7
- CPU/Memory:
- Others:

Current Behavior

When I mount a volume on two clients and one client writes to a file that the other client has deleted, the write blocks until I kill the client.

Expected Behavior

When I mount a volume on two clients and one client writes to a file that the other client has deleted, the write should return an error instead of retrying forever.
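A minimal Go sketch of the expected behavior, under stated assumptions: this is not the CubeFS client code, and `sendWithRetry`, `isTerminal`, `ErrInodeNotExist`, and `errConnReset` are hypothetical names. The idea is to stop retrying once every member of the partition answers with a "not exist" error, since no later retry can succeed, and to bound the number of rounds for all other errors.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInodeNotExist is a hypothetical sentinel meaning "the inode is gone".
var ErrInodeNotExist = errors.New("inode not exist")

// errConnReset stands in for a transient error that is worth retrying.
var errConnReset = errors.New("connection reset")

// isTerminal reports whether err can never succeed on retry.
// Assumption: it matches the "not exist" text seen in the client log;
// real code would more likely inspect a result code.
func isTerminal(err error) bool {
	if err == nil {
		return false
	}
	return errors.Is(err, ErrInodeNotExist) || strings.Contains(err.Error(), "not exist")
}

// sendWithRetry tries every member once per round, for at most maxRounds
// rounds. It gives up early when all members return a terminal error.
func sendWithRetry(members []string, maxRounds int, send func(addr string) error) error {
	var last error
	for round := 0; round < maxRounds; round++ {
		terminal := 0
		for _, addr := range members {
			err := send(addr)
			if err == nil {
				return nil
			}
			last = err
			if isTerminal(err) {
				terminal++
			}
		}
		if len(members) > 0 && terminal == len(members) {
			return fmt.Errorf("giving up after round %d: %w", round+1, ErrInodeNotExist)
		}
	}
	if last == nil {
		return errors.New("no request was sent")
	}
	return fmt.Errorf("retries exhausted: %w", last)
}

func main() {
	members := []string{"10.15.42.2:17210", "10.15.42.4:17210", "10.15.42.1:17210"}

	calls := 0
	err := sendWithRetry(members, 5, func(addr string) error {
		calls++
		return errors.New("inode[...] not exist")
	})
	fmt.Println("deleted inode:", calls, "calls,", err)

	calls = 0
	err = sendWithRetry(members, 2, func(addr string) error {
		calls++
		return errConnReset
	})
	fmt.Println("transient error:", calls, "calls,", err)
}
```

With an error returned to the FUSE layer, the blocked `write` would fail (for example with ENOENT or EIO, depending on how the client maps it) rather than leaving the caller in uninterruptible sleep.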

Steps To Reproduce

No response

CubeFS Log

2024/05/13 17:19:39.588653 [WARN ] conn.go:117: sendToMetaPartition: retry failed req(ReqID(49411)Op(OpMetaExtentAddWithCheck)PartitionID(536)ResultCode(Unknown ResultCode(0))) mp(PartitionID(536) Start(8388609) End(9223372036854775807) Members([10.15.42.2:17210 10.15.42.4:17210 10.15.42.1:17210]) LeaderAddr(10.15.42.2:17210) Status(2)) mc(partitionID(536) addr(10.15.42.2:17210)) errs(map[0:request should retry[Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591979]AT[1715591979]MT[1715591979]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist] 1:request should retry[Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591976]AT[1715591976]MT[1715591976]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist] 2:request should retry[Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591976]AT[1715591976]MT[1715591976]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist]]) resp(ReqID(49411)Op(OpMetaExtentAddWithCheck)PartitionID(536)ResultCode(Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591979]AT[1715591979]MT[1715591979]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist))
2024/05/13 17:19:39.589605 [WARN ] conn.go:117: sendToMetaPartition: retry failed req(ReqID(49411)Op(OpMetaExtentAddWithCheck)PartitionID(536)ResultCode(Unknown ResultCode(0))) mp(PartitionID(536) Start(8388609) End(9223372036854775807) Members([10.15.42.2:17210 10.15.42.4:17210 10.15.42.1:17210]) LeaderAddr(10.15.42.2:17210) Status(2)) mc(partitionID(536) addr(10.15.42.4:17210)) errs(map[0:request should retry[Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591979]AT[1715591979]MT[1715591979]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist] 1:request should retry[Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591979]AT[1715591979]MT[1715591979]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist] 2:request should retry[Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591976]AT[1715591976]MT[1715591976]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist]]) resp(ReqID(49411)Op(OpMetaExtentAddWithCheck)PartitionID(536)ResultCode(Err: inode[Inode{Inode[8388614]Type[0]Uid[0]Gid[0]Size[0]Gen[1]CT[1715591979]AT[1715591979]MT[1715591979]LinkT[]NLink[1]Flag[0]Reserved[0]Extents[[]]ObjExtents[[]]}] not exist))

Anything else? (Additional Context)

No response

BillXiang commented 1 month ago

[2575875.069654] RIP: 0033:0x7f5717fa7ba0
[2575875.069660] RSP: 002b:00007ffda61d8228 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[2575875.069665] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5717fa7ba0
[2575875.069669] RDX: 0000000000001000 RSI: 00000000017ac000 RDI: 0000000000000001
[2575875.069672] RBP: 0000000000001000 R08: 00007f57184986b0 R09: 0000000000003003
[2575875.069674] R10: 00007ffda61d7ca0 R11: 0000000000000246 R12: 0000000000001000
[2575875.069677] R13: 00000000017ac000 R14: 00000000017ad000 R15: 0000000000000000
[2575997.946466] INFO: task dd:14818 blocked for more than 245 seconds.
[2575997.946534] Tainted: G S 5.11.6 #1
[2575997.946588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2575997.946643] task:dd state:D stack: 0 pid:14818 ppid: 17174 flags:0x00004004
[2575997.946654] Call Trace:
[2575997.946663] __schedule+0x2ab/0x800
[2575997.946686] schedule+0x3c/0xa0
[2575997.946696] request_wait_answer+0x12f/0x220 [fuse]
[2575997.946715] ? finish_wait+0x80/0x80
[2575997.946729] fuse_simple_request+0x192/0x2d0 [fuse]
[2575997.946743] fuse_perform_write+0x34d/0x690 [fuse]
[2575997.946762] fuse_file_write_iter+0x31a/0x400 [fuse]
[2575997.946773] ? trigger_load_balance+0x52/0x220
[2575997.946780] new_sync_write+0x11f/0x1b0
[2575997.946791] vfs_write+0x218/0x280
[2575997.946798] ksys_write+0xa1/0xe0
[2575997.946805] do_syscall_64+0x33/0x40
[2575997.946814] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2575997.946823] RIP: 0033:0x7f5717fa7ba0
[2575997.946828] RSP: 002b:00007ffda61d8228 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[2575997.946833] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5717fa7ba0
[2575997.946837] RDX: 0000000000001000 RSI: 00000000017ac000 RDI: 0000000000000001
[2575997.946840] RBP: 0000000000001000 R08: 00007f57184986b0 R09: 0000000000003003
[2575997.946843] R10: 00007ffda61d7ca0 R11: 0000000000000246 R12: 0000000000001000
[2575997.946846] R13: 00000000017ac000 R14: 00000000017ad000 R15: 0000000000000000