cifsd-team / ksmbd

ksmbd kernel server (SMB/CIFS server)

smb direct: kernel NULL pointer dereference during server shutdown #538

Closed hcbwiz closed 1 year ago

hcbwiz commented 2 years ago

Hi, I used Windows Server 2016 as the client with a Mellanox ConnectX-5 (RoCEv2), and got the following error during server shutdown:

[  410.640357] BUG: kernel NULL pointer dereference, address: 0000000000000102
[  410.643899] #PF: supervisor read access in kernel mode
[  410.646516] #PF: error_code(0x0000) - not-present page
[  410.649129] PGD 0 P4D 0
[  410.650541] Oops: 0000 [#1] SMP
[  410.652408] CPU: 0 PID: 1067 Comm: kworker/0:2H Tainted: G           O      5.14.3-shiluvia #3
[  410.656484] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  410.660197] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[  410.662112] RIP: 0010:__queue_work+0x1b/0x340
[  410.663651] Code: 0f 0b c3 0f 0b 31 c0 c3 0f 1f 80 00 00 00 00 41 57 41 56 49 89 f6 41 55 41 54 41 89 fc 55 48 89 d5 53 48 83 ec 10 89 7c 24 04 <f6> 86 02 01 00 00 01 0f 85 48 02 00 00 48 bb eb 83 b5 80 46 86 c8
[  410.668404] RSP: 0018:ffff88811c55bd90 EFLAGS: 00010086
[  410.669636] RAX: 0000000000000000 RBX: 0000000000000202 RCX: 0000000000000002
[  410.671250] RDX: ffff88812bf311f0 RSI: 0000000000000000 RDI: 0000000000000010
[  410.672851] RBP: ffff88812bf311f0 R08: ffff88812bf310e0 R09: ffff88812bf310e0
[  410.674466] R10: 0000000000002000 R11: ffff88823ff2d000 R12: 0000000000000010
[  410.676063] R13: 0000000000000009 R14: 0000000000000000 R15: ffff888107e34000
[  410.677313] FS:  0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
[  410.678744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  410.679757] CR2: 0000000000000102 CR3: 000000012d663001 CR4: 0000000000370ef0
[  410.680987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  410.682206] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  410.683432] Call Trace:
[  410.683930]  queue_work_on+0x1b/0x30
[  410.684608]  recv_done+0x280/0x290 [ksmbd]
[  410.685376]  __ib_process_cq+0x69/0xc0 [ib_core]
[  410.686220]  ib_cq_poll_work+0x21/0x70 [ib_core]
[  410.687065]  process_one_work+0x185/0x2e0
[  410.687805]  worker_thread+0x4e/0x3c0
[  410.688504]  ? process_one_work+0x2e0/0x2e0
[  410.689272]  kthread+0x11f/0x140
[  410.689891]  ? set_kthread_struct+0x30/0x30
[  410.690653]  ret_from_fork+0x1f/0x30

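The workqueue pointer that reached __queue_work() was NULL (RSI is 0 and the faulting address is 0x102, a small offset from it): 'smb_direct_wq' had already been destroyed in ksmbd_rdma_destroy() when recv_done() tried to queue work on it.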

This quick fix works for me:

@@ -545,6 +549,11 @@ static void recv_done(struct ib_cq *cq, struct ib_wc *wc)
                    ib_wc_status_msg(wc->status), wc->status,
                    wc->opcode);

+       if (unlikely(!smb_direct_wq)) {
+               put_empty_recvmsg(t, recvmsg);
+               return;
+       }
+
        ib_dma_sync_single_for_cpu(wc->qp->device, recvmsg->sge.addr,
                                   recvmsg->sge.length, DMA_FROM_DEVICE); 

hclee commented 2 years ago

Hello @hcbwiz,

Does your branch include the following commit? https://github.com/cifsd-team/ksmbd/commit/c6aaea31ee0021d51632bbe6eac7a46d5e84038b

This issue looks similar to https://github.com/cifsd-team/ksmbd/issues/529.

hcbwiz commented 2 years ago

Hi,

I use this branch: https://github.com/namjaejeon/ksmbd/tree/ksmbd-next

with the patch: hclee@f482116

About the ksmbd server: it runs in a VM (QEMU/KVM) with a Mellanox ConnectX-5 attached via PCI passthrough (vfio-pci). The kernel is 5.14.3, using its built-in mlx5 driver.

namjaejeon commented 1 year ago

@hcbwiz Can this issue be closed? You found it on non-mainline ksmbd.

hcbwiz commented 1 year ago

Sure, it has been resolved. Thanks.