cifsd-team / ksmbd

ksmbd kernel server(SMB/CIFS server)

Turning on debug causes kernel stack trace (using ZFS) #618

Open cyfdecyf opened 3 weeks ago

cyfdecyf commented 3 weeks ago

First, I want to note that this seems to happen ONLY on ZFS, and only when debug is turned on (i.e. ksmbd works fine on ZFS when debug is off). When I was debugging SMB Direct last year, I did not get any kernel traces while using ksmbd on an ext4 file system.

Steps to reproduce

On a system with ZFS, turn on all debug logging with ksmbd.control --debug=all, then do some heavy file operations from any SMB client.

Here are some kernel stack traces generated on my system. Note that the first trace is the first one shown in the log; the second one seems related to ZFS.

kernel: ksmbd: NTLMSSP SecurityBufferLength 138
kernel: ksmbd: credits: requested[1] granted[1] total_granted[33]
kernel: ksmbd: Received request for session setup
kernel: general protection fault, probably for non-canonical address 0xa2364752c60f01d5: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 7 PID: 1668359 Comm: kworker/7:2 Tainted: P     U     OE      6.8.12-1-pve #1
kernel: Hardware name: Maxsun Default string/MS-Terminator B760M D4, BIOS H5.3G 08/02/2023
kernel: Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
kernel: RIP: 0010:__kmalloc_node_track_caller+0x247/0x4a0
kernel: Code: 31 db c3 cc cc cc cc c1 e8 03 0f b6 90 40 54 e9 ae e9 3f fe ff ff 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0>
kernel: RSP: 0018:ffffa31fc464bd10 EFLAGS: 00010282
kernel: RAX: a2364752c60f01d5 RBX: 2e61902e595597d9 RCX: 0000000000000038
kernel: RDX: 000000000aabe007 RSI: 000000000003ada0 RDI: a2364752c60f01b5
kernel: RBP: ffffa31fc464bd60 R08: ffff8b3d5f051a00 R09: 0000000000000028
kernel: R10: ffffa31fc464bd78 R11: 0000000000000000 R12: ffff8b3d4004ad00
kernel: R13: 0000000000000cc0 R14: 00000000ffffffff R15: 0000000000000040
kernel: FS:  0000000000000000(0000) GS:ffff8b5c3f780000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000070b44f68e1a8 CR3: 000000113b434004 CR4: 0000000000f72ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? show_regs+0x6d/0x80
kernel:  ? die_addr+0x37/0xa0
kernel:  ? exc_general_protection+0x1db/0x480
kernel:  ? asm_exc_general_protection+0x27/0x30
kernel:  ? __kmalloc_node_track_caller+0x247/0x4a0
kernel:  ? xa_store+0x38/0x50
kernel:  ? smb2_sess_setup+0x1101/0x1590 [ksmbd]
kernel:  kmemdup+0x20/0x50
kernel:  ? kmemdup+0x20/0x50
kernel:  smb2_sess_setup+0x1101/0x1590 [ksmbd]
kernel:  handle_ksmbd_work+0x16b/0x4a0 [ksmbd]
kernel:  process_one_work+0x16a/0x350
kernel:  worker_thread+0x306/0x440
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xef/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x44/0x70
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: Modules linked in: tcp_diag inet_diag cmac nls_utf8 ksmbd(OE) crc32_generic libdes bluetooth ecdh_generic ecc macvlan nf_conntrack_netlink xt_nat xt_tcpudp xt_conntr>
kernel:  ee1004 input_leds pmt_telemetry intel_vsec pmt_class acpi_tad acpi_pad mei_me mac_hid mei vhost_net vhost vhost_iotlb tap nct6775 nct6775_core hwmon_vid coretemp vf>
kernel: ---[ end trace 0000000000000000 ]---
kernel: ksmbd: SMB2 data length 40 offset 88
kernel: ksmbd: SMB2 len 128
kernel: ksmbd: Received request for session setup

...

kernel: ksmbd: SMB2 len 128
kernel: ksmbd: Received request for session setup
kernel: general protection fault, probably for non-canonical address 0xa2364752c60f01d5: 0000 [#3] PREEMPT SMP NOPTI
kernel: CPU: 7 PID: 1685205 Comm: runc:[2:INIT] Tainted: P     UD    OE      6.8.12-1-pve #1
kernel: Hardware name: Maxsun Default string/MS-Terminator B760M D4, BIOS H5.3G 08/02/2023
kernel: RIP: 0010:kmalloc_trace+0xd7/0x360
kernel: Code: 83 78 10 00 48 8b 38 0f 84 36 02 00 00 48 85 ff 0f 84 2d 02 00 00 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0>
kernel: RSP: 0018:ffffa31f8c817ba0 EFLAGS: 00010282
kernel: RAX: a2364752c60f01d5 RBX: 2e61902e595597d9 RCX: 0000000000000000
kernel: RDX: 000000000aabe007 RSI: 000000000003ada0 RDI: a2364752c60f01b5
kernel: RBP: ffffa31f8c817bf0 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: ffffa31f8c817c10 R11: 0000000000000000 R12: ffff8b3d4004ad00
kernel: R13: 0000000000000dc0 R14: 0000000000000030 R15: 0000000000000000
kernel: FS:  000078e9daca2740(0000) GS:ffff8b5c3f780000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000c00018d000 CR3: 00000004dd24a003 CR4: 0000000000f72ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? show_regs+0x6d/0x80
kernel:  ? die_addr+0x37/0xa0
kernel:  ? exc_general_protection+0x1db/0x480
kernel:  ? asm_exc_general_protection+0x27/0x30
kernel:  ? kmalloc_trace+0xd7/0x360
kernel:  ? bpf_int_jit_compile+0x57f/0x620
kernel:  bpf_int_jit_compile+0x57f/0x620
kernel:  ? bpf_int_jit_compile+0x57f/0x620
kernel:  bpf_prog_select_runtime+0xc0/0x100
kernel:  bpf_prepare_filter+0x3ed/0x510
kernel:  bpf_prog_create_from_user+0xc8/0x130
kernel:  ? __pfx_seccomp_check_filter+0x10/0x10
kernel:  do_seccomp+0x155/0xba0
kernel:  do_seccomp+0x155/0xba0
kernel:  get_nvlist+0x85/0x130 [zfs]
kernel:  zfsdev_ioctl_common+0x324/0x9f0 [zfs]
kernel:  ? kvmalloc_node+0x5d/0x100
kernel:  ? spl_kvmalloc+0xa5/0xc0 [spl]
kernel:  ? kvmalloc_node+0x5d/0x100
kernel:  ? spl_kvmalloc+0xa5/0xc0 [spl]
kernel:  zfsdev_ioctl+0x57/0xf0 [zfs]
kernel:  __x64_sys_ioctl+0xa0/0xf0
kernel:  x64_sys_call+0xa68/0x24b0
kernel:  do_syscall_64+0x81/0x170
kernel:  ? syscall_exit_to_user_mode+0x89/0x260
kernel:  ? do_syscall_64+0x8d/0x170
kernel:  ? set_ptes.constprop.0+0x2b/0xb0
kernel:  ? next_uptodate_folio+0x8c/0x260
kernel:  ? filemap_map_pages+0x4b8/0x5b0
kernel:  ? do_fault+0x269/0x4c0
kernel:  ? __handle_mm_fault+0x887/0xed0
kernel:  ? __count_memcg_events+0x6f/0xe0
kernel:  ? count_memcg_events.constprop.0+0x2a/0x50
kernel:  ? handle_mm_fault+0xad/0x380
kernel:  ? do_user_addr_fault+0x337/0x650
kernel:  ? irqentry_exit_to_user_mode+0x7e/0x260
kernel:  ? irqentry_exit+0x43/0x50
kernel:  ? exc_page_fault+0x94/0x1b0
kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
kernel: RIP: 0033:0x7367d4109c5b
kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8>
kernel: RSP: 002b:00007ffc64228af0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 0000631cd58e6350 RCX: 00007367d4109c5b
kernel: RDX: 00007ffc64228b50 RSI: 0000000000005a3f RDI: 0000000000000003
kernel: RBP: 00007ffc6422c130 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc64228b50
kernel: R13: 0000631cd58db2c0 R14: 0000631cd4a58840 R15: 0000000000000007
kernel:  </TASK>
kernel: Modules linked in: tcp_diag inet_diag cmac nls_utf8 ksmbd(OE) crc32_generic libdes bluetooth ecdh_generic ecc macvlan nf_conntrack_netlink xt_nat xt_tcpudp xt_conntr>
kernel:  ee1004 input_leds pmt_telemetry intel_vsec pmt_class acpi_tad acpi_pad mei_me mac_hid mei vhost_net vhost vhost_iotlb tap nct6775 nct6775_core hwmon_vid coretemp vf>
kernel: ---[ end trace 0000000000000000 ]---
kernel: RIP: 0010:__kmalloc_node_track_caller+0x247/0x4a0
kernel: Code: 31 db c3 cc cc cc cc c1 e8 03 0f b6 90 40 54 e9 ae e9 3f fe ff ff 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0>
kernel: RSP: 0018:ffffa31fc464bd10 EFLAGS: 00010282
kernel: RAX: a2364752c60f01d5 RBX: 2e61902e595597d9 RCX: 0000000000000038
kernel: RDX: 000000000aabe007 RSI: 000000000003ada0 RDI: a2364752c60f01b5
kernel: RBP: ffffa31fc464bd60 R08: ffff8b3d5f051a00 R09: 0000000000000028
kernel: R10: ffffa31fc464bd78 R11: 0000000000000000 R12: ffff8b3d4004ad00
kernel: R13: 0000000000000cc0 R14: 00000000ffffffff R15: 0000000000000040
kernel: FS:  00007367d383d800(0000) GS:ffff8b5c3f780000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007367d405d9e0 CR3: 000000053ad24006 CR4: 0000000000f72ef0
kernel: PKRU: 55555554
pvestatd[2748]: zfs error: got signal 11
kernel: ksmbd: SMB2 data length 40 offset 88
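
A note on the "non-canonical address" wording in the traces above: on x86-64 with 48-bit virtual addresses, an address is canonical only when bits 63 through 47 are all equal. The Python sketch below checks the faulting address taken from the log. The helper name is_canonical and the 48-bit default are assumptions made for illustration and are not part of the original report.

```python
import re


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all 0 or all 1.

    va_bits=48 is an assumption (4-level paging); pass 57 for 5-level paging.
    """
    top = addr >> (va_bits - 1)               # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # the same bits, all set
    return top in (0, all_ones)


# Line copied from the first trace in this report.
LOG_LINE = ("kernel: general protection fault, probably for non-canonical "
            "address 0xa2364752c60f01d5: 0000 [#1] PREEMPT SMP NOPTI")

match = re.search(r"address (0x[0-9a-f]+)", LOG_LINE)
fault_addr = int(match.group(1), 16)

# The faulting address, then R12 from the same trace for comparison.
print(hex(fault_addr), is_canonical(fault_addr))
print(hex(0xffff8b3d4004ad00), is_canonical(0xffff8b3d4004ad00))
```

The faulting address fails the check, while the R12 value from the same trace passes it. That matches the kernel's own "probably for non-canonical address" diagnosis: the CPU raised a general protection fault because the value being dereferenced is not a valid pointer at all.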

Software versions

namjaejeon commented 2 weeks ago

Sorry for the late response. I will check the two issues that you reported.

namjaejeon commented 1 week ago

@cyfdecyf Is there any reason why you use ZFS? ZFS violates the kernel's license, and it is not in the Linux kernel mainline. Can you confirm that this problem does not happen with ext4 or XFS?

cyfdecyf commented 5 days ago

@namjaejeon I'm using Proxmox VE (PVE), and features like virtual machine disk replication in PVE require ZFS. I'm also "abusing" the host system to provide SMB sharing, so I ended up sharing a ZFS dataset with ksmbd.

I can confirm this does NOT happen with ext4; I tested ksmbd on ext4 in a virtual machine.

I understand ZFS's license issue with the Linux kernel. You can close this issue, as I guess it would be too much of a burden to keep supporting ZFS.

namjaejeon commented 5 days ago

@cyfdecyf Okay. Can you test ksmbd and ZFS after turning KASAN on? It seems the SLUB/slab allocator is corrupted.
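
One way to see why the allocator is suspected is to compare the register dumps of the two faults in the report. The Python sketch below parses a few register lines copied from the first trace ([#1], in the ksmbd-io worker) and from the [#3] trace (in runc) and lists the registers that hold identical values. The names parse_registers, TRACE_1 and TRACE_3 are assumptions made for illustration.

```python
import re

# Register lines copied from the [#1] trace (ksmbd-io worker).
TRACE_1 = """\
kernel: RAX: a2364752c60f01d5 RBX: 2e61902e595597d9 RCX: 0000000000000038
kernel: RDX: 000000000aabe007 RSI: 000000000003ada0 RDI: a2364752c60f01b5
kernel: R10: ffffa31fc464bd78 R11: 0000000000000000 R12: ffff8b3d4004ad00
"""

# Register lines copied from the [#3] trace (runc).
TRACE_3 = """\
kernel: RAX: a2364752c60f01d5 RBX: 2e61902e595597d9 RCX: 0000000000000000
kernel: RDX: 000000000aabe007 RSI: 000000000003ada0 RDI: a2364752c60f01b5
kernel: R10: ffffa31f8c817c10 R11: 0000000000000000 R12: ffff8b3d4004ad00
"""

# Matches pairs such as "RAX: a2364752c60f01d5".
REG = re.compile(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b")


def parse_registers(text):
    """Map register name to integer value for every register in text."""
    return {name: int(value, 16) for name, value in REG.findall(text)}


first = parse_registers(TRACE_1)
third = parse_registers(TRACE_3)

# Registers that hold the same value in both faults.
shared = sorted(name for name in first if third.get(name) == first[name])
print(shared)

# Distance between the dereferenced address (RAX) and RDI.
print(hex(first["RAX"] - first["RDI"]))
```

Two unrelated tasks fault inside kmalloc paths on the same bogus RAX value, with the same RDI and R12. That is consistent with both allocations walking the same damaged allocator state rather than with two independent bugs. It does not show which code wrote the bad value, which is what a KASAN run could help narrow down.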

cyfdecyf commented 4 days ago

@namjaejeon I will give it a try next week during the holiday.