cifsd-team / ksmbd

ksmbd kernel server (SMB/CIFS server)

Wondering about the status of SMB Direct with Windows clients #543

Open dz-cies opened 2 years ago

dz-cies commented 2 years ago

Hi, I've recently been trying to test SMB Direct with Windows clients (with Mellanox ConnectX-6 InfiniBand adapters) but have had no luck. I'm able to build ksmbd and mount a share on Windows Server 2016 clients, but RDMA does not seem to be enabled. I'm wondering whether this feature is already implemented. If it is, what configuration is needed to enable it (`server multi channel support = yes` and anything else?)
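
(For context, a minimal ksmbd smb.conf sketch with that parameter set would look roughly like the following; the share name and path are placeholders, and whether anything beyond this is required is exactly the question:)

[global]
        server multi channel support = yes

[testshare]
        path = /mnt/testshare
        writeable = yes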

The currently available information looks a bit confusing. The mail "[PATCH v8 00/13] ksmbd: introduce new SMB3 kernel server" describes SMB Direct as "Partially Supported. SMB3 Multi-channel is required to connect to Windows client", and SMB3 Multi-channel as also "partially supported". The same mail reads 'SMB Direct is only currently possible with ksmbd (among Linux servers)', so I guess Windows clients are not possible yet. However, in README.md, SMB Direct (RDMA) and Multi-channel are both listed under Features implemented, and there seem to be successful cases already reported in other issues (#538, #529).

I would appreciate it if you could clarify this and share any progress on this feature.

namjaejeon commented 2 years ago

@wqlxx I have a question. I can see RDMA Capable set to true for the SMB server in Windows 10 Pro (not Workstation). For the SMB client, RDMA Capable is set to false.

PS C:\Windows\system32> Get-SmbClientNetworkInterface

Interface Index RSS Capable RDMA Capable Speed   IpAddresses                              Friendly Name
--------------- ----------- ------------ -----   -----------                              -------------
17              True        False        1 Gbps  {fe80::64cf:2410:2250:d0fd, 172.30.1.20} 이더넷
14              True        False        25 Gbps {fe80::4878:95c7:946f:df13, 192.168.0.4} 이더넷 2
12              True        False        25 Gbps {fe80::acc5:a8c6:aa95:a13f, 192.168.0.5} 이더넷 3
PS C:\Windows\system32> Get-SmbServerNetworkInterface

Scope Name Interface Index RSS Capable RDMA Capable Speed   IpAddress
---------- --------------- ----------- ------------ -----   ---------
*          12              True        True         25 Gbps fe80::acc5:a8c6:aa95:a13f
*          14              True        True         25 Gbps fe80::4878:95c7:946f:df13
*          17              True        False        1 Gbps  fe80::64cf:2410:2250:d0fd
*          12              True        True         25 Gbps 192.168.0.5
*          14              True        True         25 Gbps 192.168.0.4
*          17              True        False        1 Gbps  172.30.1.20

If so, can RDMA work with the SMB server of Windows 10 Pro?

namjaejeon commented 2 years ago

@dz-cies Can you check the kernel oops issue when enabling RSS mode with the patch below?

https://github.com/cifsd-team/ksmbd/commit/cbcd1bab18903713f07b2e9fd5119a16371d24de

namjaejeon commented 2 years ago

The default value of ConnectionCountPerRssNetworkInterface (Get-SmbClientConfiguration) is 4, and I'm able to achieve 4x1.5GB/s with this configuration because 4 connections are established. But if I set it to a larger number, the throughput does not grow linearly because connections are not always successfully created. There is a high probability that a connection fails with 'authentication failed'. Debugging shows some connections fail the check if (memcmp(ntlmv2->ntlmv2_hash, ntlmv2_rsp, CIFS_HMAC_MD5_HASH_SIZE) != 0) in the function ksmbd_auth_ntlmv2 in auth.c, which I believe means a password mismatch, but of course the password is correct since all connections use the same one. As a result, I can't reliably get the desired number of connections and the throughput varies widely: from 1.5GB/s (1 connection) to 8.5GB/s (8 connections in the best case, though it seems not all connections are fully working or it would be 12GB/s).

@dz-cies How did you set it to a larger number? Can you get linearly growing performance with a Windows server? And what is your NIC (model name)?

wqlxx commented 2 years ago

@namjaejeon Yes, Windows 10 Pro can't be used as an SMB Direct client. Windows 10 Pro for Workstations can only be used as an SMB Direct client. Windows Server can be used not only as an SMB Direct client but also as an SMB Direct server.

namjaejeon commented 2 years ago

Windows 10 Pro for Workstations can only be used as an SMB Direct client

@wqlxx Is that true? Have you checked SMB Direct on both Windows 10 Pro for Workstations and Windows Server? I thought Windows 10 Pro for Workstations could be an SMB Direct server.

dz-cies commented 2 years ago

@dz-cies How did you set it to a larger number?

Use the Set-SmbClientConfiguration cmdlet, for example: Set-SmbClientConfiguration -ConnectionCountPerRssNetworkInterface 8
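
(As a side note, a quick way to check what was actually negotiated is the standard SMB client cmdlets, e.g.:)

PS C:\> Get-SmbClientConfiguration | Select-Object ConnectionCountPerRssNetworkInterface
PS C:\> Get-SmbMultichannelConnection   # lists the established channels per interface pair, including RDMA capability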

Can you get linearly growing performance with a Windows server?

Not exactly linear but I think it's close. I get 1.6GB/s with 4 connections (per connection it seems not as good as with ksmbd), 3.8GB/s with 8, and 5.4 GB/s with 16.

And what is your NIC (model name)?

MCX653105A-HDAT

@dz-cies Can you check the kernel oops issue when enabling RSS mode with the patch below?

cbcd1ba

The kernel oops is gone after applying this patch.

Instead of the kernel oops, it now reports 'authentication failed' and refuses to create more connections with high probability. So there are, say, 2 connections at the beginning and the throughput is 3GB/s. The throughput grows after waiting some time, as more connections are established. I don't know how often the Windows client retries the connection; it seems to keep retrying but only succeeds occasionally.

There are two cases: 1) the user has just logged in on the client and no SMB connection has been established before, and 2) the client does not log off, so the old connections are maintained.

In case 1, a typical 20-minute fio test started with 2 connections, which grew to 4 and later 5 (the configured number is 16). The throughput grew from 3GB/s to 5.2 to 6.5. The average throughput over the whole test was 4.8GB/s.

In case 2, I have 11-13 working connections (I believe it would finally reach the configured number of 16 if I waited long enough), and the throughput is 8.5~9.2GB/s (not linear but good).

It looks satisfactory: the kernel oops is resolved and the average throughput is good enough. The only issue left is that it starts slow, which is inconvenient in some cases, e.g., if the Windows clients are rebooted every week.

namjaejeon commented 2 years ago

@dz-cies Can you check "authentication failed" with the https://github.com/cifsd-team/ksmbd/commit/d1fb6f5ebf27ecf5fb4ad8c18289d84731877225 patch?

namjaejeon commented 2 years ago

It looks satisfactory: the kernel oops is resolved and the average throughput is good enough. The only issue left is that it starts slow, which is inconvenient in some cases, e.g., if the Windows clients are rebooted every week.

Could you explain more what "The only issue left is that it starts slow" means? And why do you reboot the Windows clients?

dz-cies commented 2 years ago

@dz-cies Can you check "authentication failed" with the d1fb6f5 patch?

It works great! 14 connections established (not sure why not 16) at the very beginning and I get a throughput near 9GB/s.

Could you explain more what "The only issue left is that it starts slow" means?

Sorry for my ambiguous phrasing. It was rough shorthand for the case 1 phenomenon mentioned before, and it's solved by the new patch.

And why do you reboot the Windows clients?

I just can't control the users' behavior. Some users prefer to reboot their Windows hosts periodically.

namjaejeon commented 2 years ago

@dz-cies Thanks for your check! Could you please give me your email and full name so I can add a Tested-by: tag to the patches?

e.g. Tested-by: Michael Haener michael.haener@siemens.com Signed-off-by: Henning Schild henning.schild@siemens.com

Thanks!

dz-cies commented 2 years ago

Ziwei Xie zw.xie@high-flyer.cn

dz-cies commented 2 years ago

I forgot to mention that I removed the "#endif" line in the second patch to make it apply to the master branch.

namjaejeon commented 2 years ago

@dz-cies Have you ever used the multichannel feature of Samba before?

KristijanL commented 2 years ago

With the latest patches I don't get the kernel panic anymore, but I still can't enable RDMA; I get an error with 'rdma=1':

kernel: ksmbd: unknown parameter 'rdma' ignored

namjaejeon commented 2 years ago

@KristijanL There is no rdma=1 option in the cifs client. You can just use the rdma option without "=1". What is your NIC (model name)?
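
(For reference, a Linux cifs mount using SMB Direct would look roughly like this; the address, share, and credentials are placeholders:)

mount -t cifs //192.168.0.5/share /mnt -o vers=3.1.1,rdma,username=user,password=pass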

dz-cies commented 2 years ago

@dz-cies Have you ever used the multichannel feature of Samba before?

Not really. I did some tests with the userland Samba server (smbd). RSS works, but the throughput does not grow. On the server side, although there are multiple threads processing the multiple connections, there seems to be one thread handling some of the work for all connections. So this thread becomes the bottleneck, and the throughput is limited by the computing power of a single CPU core.

I couldn't find an alternative way to take advantage of the multichannel feature with a Linux server before ksmbd came into sight, so I didn't use this feature in production.

namjaejeon commented 2 years ago

@dz-cies Okay. Could you please give us a performance comparison of multichannel between ksmbd and Samba?

dz-cies commented 2 years ago

@dz-cies Okay. Could you please give us a performance comparison of multichannel between ksmbd and Samba?

I get 1.5-1.7GB/s throughput with Samba, and 8.5-9.2GB/s with ksmbd.

The server and client used in the tests are the same, and I made sure 16 connections were established in both cases. I guess the Samba result might improve with a more powerful CPU. Also note that in the ksmbd case the backend storage (I use a CephFS mount point as the backend) may be a limit.

CPU used in this test: AMD EPYC 7H12 64-Core Processor
NIC used in this test: MCX653105A-HDAT

namjaejeon commented 2 years ago

What tools did you use to measure performance?

dz-cies commented 2 years ago

What tools did you use to measure performance?

I use fio with the following command line:

fio -time_based -runtime=1200 -name=test -rw=write -ioengine=windowsaio -direct=1 -iodepth=1 -numjobs=10 -bs=7M -size=10G

dz-cies commented 2 years ago

I forgot to test the read case. I just tested with the following command (only the -rw option is changed):

fio -time_based -runtime=1200 -name=test -rw=read -ioengine=windowsaio -direct=1 -iodepth=1 -numjobs=10 -bs=7M -size=10G

The result is 3GB/s in the Samba case and 8.7~9.1GB/s in the ksmbd case.

namjaejeon commented 2 years ago

@dz-cies Thanks a lot!!

hcbwiz commented 2 years ago

@dz-cies

I built a new server and am using the latest mlnx-ofed driver with a 5.10 kernel (I used the kernel built-in driver before).

I get the error "smb_direct: Can't create transport: -93". It seems that "Fast Registration Work Requests" cannot work with the mlnx-ofed driver.

I'm trying to figure it out.

Did you use the mlnx-ofed driver for SMB Direct? If yes, did you modify anything else?

dz-cies commented 2 years ago

Did you use the mlnx-ofed driver for SMB Direct?

Yes.

If yes, did you modify anything else?

No. But the only branch where I had RDMA working was the earliest one, the ksmbd-next-rdma branch (commit ID 6650f9d).

hcbwiz commented 2 years ago

Thanks, it may be that I didn't compile it with the mlnx-ofed headers.

Actually, I can get the same throughput as you measured by using multi-channel with 16 TCP connections.

Thus I'm interested in rdma+multi-channel.

I quickly tried the kernel built-in driver to test RDMA and multi-channel with the latest patches.

The client can create two RDMA connections.

Then I looked into the information from here: "When SMB is deployed with SMB Multichannel, SMB detects the RDMA capability of a network adapter and creates multiple RDMA connections for that single session, with two RDMA connections per interface." In my environment, 16 TCP connections have better throughput than 2 RDMA connections.

namjaejeon commented 2 years ago

I used a Mellanox CX5 (RoCEv2) with the Linux built-in mlx5 driver. The mlx5 driver needs a modification:

@hcbwiz Do you have a plan to contribute this patch to the mlx5 driver in the kernel mainline?

namjaejeon commented 2 years ago

And I'm wondering: is there a case where more than one RDMA NIC could be connected to one PC? Would only one be available due to a conflict? Because ksmbd must open a different port (445 or 5445) depending on the type of NIC (InfiniBand or iWARP).

hclee commented 2 years ago

@hcbwiz

I used a Mellanox CX5 (RoCEv2) with the Linux built-in mlx5 driver. The mlx5 driver needs a modification:

Could you test it with this patch: https://github.com/hclee/ksmbd/commit/e7336564af0ffb9c62901ae7852a6b616b960986? Without the modification of the Linux built-in mlx5 driver, it can set RDMA capable.

hcbwiz commented 2 years ago

@hclee

I use the latest master branch with your rdma patches: hclee@e733656 dma-latest-v0.51 rdma-latest-v0.1

Case 1: just load the module and unload the module; I got this error:

[  960.663271] ------------[ cut here ]------------
[  960.663280] refcount_t: underflow; use-after-free.
[  960.663297] WARNING: CPU: 11 PID: 10680 at lib/refcount.c:28 refcount_warn_saturate+0xe2/0xf0
[  960.663323] Modules linked in: ksmbd(O-) rdma_cm iw_cm ib_cm libdes xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass qat_c62x crct10dif_pclmul crc32_pclmul intel_qat ghash_clmulni_intel aesni_intel irdma crypto_simd cryptd mlx5_ib ice rapl i2c_i801 ib_uverbs wdat_wdt lpc_ich intel_cstate input_leds dh_generic ib_core authenc sg mfd_core i2c_smbus intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ioatdma acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc32c_intel mlx5_core mlxfw i40e igb ahci ptp libahci pps_core i2c_algo_bit dca libata dm_mirror dm_region_hash dm_log dm_mod
[  960.663478] CPU: 11 PID: 10680 Comm: rmmod Kdump: loaded Tainted: G          IO      5.15.10 #5
[  960.663486] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_1.02.1 09/03/2019
[  960.663490] RIP: 0010:refcount_warn_saturate+0xe2/0xf0
[  960.663502] Code: 48 c7 c7 10 7e 11 82 c6 05 39 c5 11 01 01 e8 1b c8 48 00 0f 0b 5d c3 48 c7 c7 b8 7d 11 82 c6 05 24 c5 11 01 01 e8 04 c8 48 00 <0f> 0b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 8b 07 3d 00 00 00 c0 74
[  960.663509] RSP: 0018:ffff88c0ce067e60 EFLAGS: 00010286
[  960.663515] RAX: 0000000000000026 RBX: ffffffffa06f2160 RCX: 0000000000000027
[  960.663520] RDX: 0000000000000027 RSI: 00000000ffffbfff RDI: ffff89003f85b438
[  960.663524] RBP: ffff88c0ce067e60 R08: ffff89003f85b430 R09: c0000000ffffbfff
[  960.663527] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffa06f2340
[  960.663531] R13: 00000000fffffe00 R14: 0000000000000000 R15: 0000000000000000
[  960.663535] FS:  00007f2875334740(0000) GS:ffff89003f840000(0000) knlGS:0000000000000000
[  960.663541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  960.663574] CR2: 00000000007d3c78 CR3: 000000407b833004 CR4: 00000000007706e0
[  960.663579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  960.663582] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  960.663585] PKRU: 55555554
[  960.663588] Call Trace:
[  960.663592]  <TASK>
[  960.663599]  ib_client_put+0x3a/0x40 [ib_core]
[  960.663667]  ib_unregister_client+0x27/0x160 [ib_core]
[  960.663718]  ? ksmbd_tcp_destroy+0x1c/0xb0 [ksmbd]
[  960.663754]  ksmbd_rdma_destroy+0x4d/0x60 [ksmbd]
[  960.663780]  ksmbd_conn_transport_destroy+0x2a/0xd0 [ksmbd]
[  960.663809]  ksmbd_server_exit+0x29/0x33c [ksmbd]
[  960.663837]  __x64_sys_delete_module+0x11f/0x200
[  960.663848]  do_syscall_64+0x35/0x80
[  960.663859]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Case 2: just load the module and use a Windows client to connect to it. The "RDMA capability" is okay, but I got this error:

[  299.476080] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  299.479148] #PF: supervisor read access in kernel mode
[  299.481218] #PF: error_code(0x0000) - not-present page
[  299.483299] PGD 0 P4D 0
[  299.484509] Oops: 0000 [#1] SMP
[  299.485903] CPU: 3 PID: 421 Comm: ksmbd:r445 Tainted: G           O      5.15.10-shiluvia #5
[  299.489255] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  299.492605] RIP: 0010:__ib_umem_release+0x74/0xa0 [ib_uverbs]
[  299.494084] Code: 48 89 ef e8 8e 3c 23 e1 ff c3 48 89 c5 41 39 5c 24 5c 77 ce 5b 5d 49 8d 7c 24 50 41 5c 41 5d e9 22 42 23 e1 41 bd 01 00 00 00 <48> 8b 3f 48 85 ff 74 a0 41 8b 54 24 5c 49 8b 74 24 50 45 31 c0 31
[  299.498576] RSP: 0018:ffff88811a4a3d88 EFLAGS: 00010246
[  299.499923] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8881070ae180
[  299.501420] RDX: 0000000000000001 RSI: ffff88810711f000 RDI: 0000000000000000
[  299.502670] RBP: ffff88810711f000 R08: 0000000000000000 R09: ffff88811a4a3db0
[  299.503920] R10: ffff8881099de9c0 R11: 0000000000000000 R12: ffff88810711f000
[  299.505168] R13: 0000000000000000 R14: ffff888103b3c600 R15: dead000000000100
[  299.506416] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[  299.507911] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  299.508947] CR2: 0000000000000000 CR3: 000000011a85a006 CR4: 0000000000370ee0
[  299.510219] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  299.511286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  299.512205] Call Trace:
[  299.512586]  <TASK>
[  299.512938]  ib_umem_release+0x25/0x80 [ib_uverbs]
[  299.513596]  mlx5_ib_dereg_mr+0x1f2/0x3c0 [mlx5_ib]
[  299.514280]  ib_dereg_mr_user+0x33/0x60 [ib_core]
[  299.514938]  ib_mr_pool_destroy+0x77/0xa0 [ib_core]
[  299.515619]  free_transport+0x9b/0x260 [ksmbd]
[  299.516244]  ? __cond_resched+0x11/0x40
[  299.516788]  ? smb_direct_disconnect+0x22/0xa0 [ksmbd]
[  299.517494]  ksmbd_conn_handler_loop+0x125/0x1e0 [ksmbd]
[  299.518215]  ? ksmbd_conn_alive+0x80/0x80 [ksmbd]
[  299.518876]  kthread+0x11f/0x140
[  299.519354]  ? set_kthread_struct+0x30/0x30
[  299.519939]  ret_from_fork+0x1f/0x30
[  299.520473]  </TASK>

hclee commented 2 years ago

@hcbwiz

The second error doesn't seem to be related to the patch https://github.com/hclee/ksmbd/commit/e7336564af0ffb9c62901ae7852a6b616b960986. Maybe your async_disconnect patch is needed.

I will try to reproduce the first error and solve these problems.

Your test is really helpful. Thank you!

hcbwiz commented 2 years ago

@hclee sorry for the wrong information.

You are right; the second error is related to "disconnecting".

RDMA connecting and transferring work well with your patch.

When I used an older kernel (Linux 5.10 ~ 5.14), my "async_disconnect" patch could force the RDMA connections to shut down.

I'm now using Linux kernel 5.15.10, and it seems my dirty patch does not work.

However, without the dirty patch, ksmbd cannot shut down because the Windows client is still connected to it.

hclee commented 2 years ago

@hcbwiz

Could you test the shutdown problem with the patch, https://github.com/hclee/ksmbd/commit/ddf47d7de03f79fe9df229ec0ee4819cb904f1c8?

And I have fixed the kernel oops related to the RDMA capability with this patch: https://github.com/hclee/ksmbd/commit/c047549af5f4cdf8a6515eecc8882b873989a51f

hcbwiz commented 2 years ago

@hclee

hclee@c047549 has fixed the issue. Thanks.

About the shutdown problem: I reverted my dirty patch and applied this patch: hclee@ddf47d7. It got a similar error:

[  603.285988] general protection fault, probably for non-canonical address 0x300802901000000: 0000 [#1] SMP
[  603.291711] CPU: 1 PID: 964 Comm: ksmbd:r445 Tainted: G           O      5.15.10-shiluvia #5
[  603.296664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  603.301670] RIP: 0010:__ib_umem_release+0x74/0xa0 [ib_uverbs]
[  603.303859] Code: 48 89 ef e8 8e 8c 22 e1 ff c3 48 89 c5 41 39 5c 24 5c 77 ce 5b 5d 49 8d 7c 24 50 41 5c 41 5d e9 22 92 22 e1 41 bd 01 00 00 00 <48> 8b 3f 48 85 ff 74 a0 41 8b 54 24 5c 49 8b 74 24 50 45 31 c0 31
[  603.309848] RSP: 0018:ffff888103e73d88 EFLAGS: 00010246
[  603.311201] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888103c54180
[  603.312991] RDX: 0000000000000001 RSI: ffff888119a51000 RDI: 0300802901000000
[  603.314781] RBP: ffff888119a51000 R08: 0000000000000000 R09: ffff888103e73db0
[  603.316577] R10: ffff88810af25f40 R11: 0000000000000000 R12: ffff888119a51000
[  603.318373] R13: 0000000000000000 R14: ffff8881035d0180 R15: dead000000000100
[  603.319981] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[  603.321686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  603.322875] CR2: 00007eff7caa4608 CR3: 0000000129faa002 CR4: 0000000000370ee0
[  603.324344] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  603.325788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  603.327265] Call Trace:
[  603.327857]  <TASK>
[  603.328424]  ib_umem_release+0x25/0x80 [ib_uverbs]
[  603.329462]  mlx5_ib_dereg_mr+0x1f2/0x3c0 [mlx5_ib]
[  603.330527]  ib_dereg_mr_user+0x33/0x60 [ib_core]
[  603.331568]  ib_mr_pool_destroy+0x77/0xa0 [ib_core]
[  603.332598]  free_transport+0x9b/0x260 [ksmbd]
[  603.333584]  ? __cond_resched+0x11/0x40
[  603.334439]  ? smb_direct_disconnect+0x22/0xa0 [ksmbd]
[  603.335524]  ksmbd_conn_handler_loop+0x125/0x1e0 [ksmbd]
[  603.336622]  ? ksmbd_conn_alive+0x80/0x80 [ksmbd]
[  603.337647]  kthread+0x11f/0x140
[  603.338393]  ? set_kthread_struct+0x30/0x30
[  603.339314]  ret_from_fork+0x1f/0x30
[  603.340120]  </TASK>
[  603.340694] Modules linked in: nls_utf8 ksmbd(O) rdma_cm iw_cm ib_cm oid_registry mlx5_ib ib_uverbs ib_core nvme_fabrics crc32c_intel nvme_core mlx5_core evdev [last unloaded: ksmbd]
[  603.343926] ---[ end trace 026ecc4b447d9704 ]---

One more question: the original code stops the listener (calling ksmbd_rdma_destroy()) first, before calling stop_sessions(). That avoids clients reconnecting again, as with the "tcp transport" of ksmbd.

Actually, I referenced the kernel nfs server to come out the "async connect" patch. For tcp connnection, it can just close the sockets and linux TCP/IP stack can handle it internally. For rdma connection, it seems that the ksmbd server needs some extra handing.

hcbwiz commented 2 years ago

I used a Mellanox CX5 (RoCEv2) with the Linux built-in mlx5 driver. The mlx5 driver needs a modification:

@hcbwiz Do you have a plan to contribute this patch to the mlx5 driver in the kernel mainline?

@namjaejeon This patch, hclee@c047549, can handle this problem better, even when using an out-of-tree driver.

hclee commented 2 years ago

@hcbwiz

One more question: the original code stops the listener (calling ksmbd_rdma_destroy()) first, before calling stop_sessions(). That avoids clients reconnecting again, as with the "tcp transport" of ksmbd.

Right, the original code aims to disconnect clients forcibly and avoid clients' reconnecting.

Actually, I referenced the kernel NFS server to come up with the "async disconnect" patch. For a TCP connection, ksmbd can just close the sockets and the Linux TCP/IP stack handles it internally. For an RDMA connection, it seems the ksmbd server needs some extra handling.

I found the patch set you mentioned. I need to go into more details. Thank you!

hcbwiz commented 2 years ago

@hclee

The disconnect problem is caused by a bug in the mlx5_ib driver: Kernel panic when called ib_dereg_mr

After applying that patch, the "async disconnect" patch works.

hclee commented 2 years ago

@hcbwiz

The disconnect problem is caused by a bug in the mlx5_ib driver: Kernel panic when called ib_dereg_mr

Great! So there is a problem with the mlx5 driver.

After applying that patch, the "async disconnect" patch works.

Currently I don't have an environment to test the "async disconnect" patch. Next week I will take a look at your patch. Thank you!

namjaejeon commented 2 years ago

@hcbwiz Where is the async disconnect patch?

hcbwiz commented 2 years ago

@namjaejeon

Please check it here: rdma: fix force-shutdown

Actually, it includes another fix: #538

hclee commented 2 years ago

@hcbwiz

I can't reproduce the kernel oops solved by the "async disconnect" patch. Could you explain how the problem can be reproduced?

hcbwiz commented 2 years ago

@hclee

The latest master branch with your rdma patch works!!!

I did some tests and tried to recall from memory:

[  859.855041] ksmbd: kill command received
[  888.749389] ksmbd: smb_direct: Recv completed. status='success (0)', opcode=128
[  888.752596] ksmbd: smb_direct: Recv completed. status='success (0)', opcode=128
[  893.755581] ksmbd: smb_direct: RDMA CM event. cm_id=00000000c8c8ce7c event=disconnected (10)
[  893.759175] ksmbd: smb_direct: RDMA CM event. cm_id=0000000085158708 event=disconnected (10)
[  893.759213] ksmbd: smb_direct: Disconnecting cm_id=00000000c8c8ce7c
[  893.763339] ksmbd: smb_direct: Disconnecting cm_id=0000000085158708
[  893.766239] ksmbd: smb_direct: wait for all send posted to IB to finish
[  893.768770] ksmbd: smb_direct: wait for all send posted to IB to finish
[  893.780793] ksmbd: smb_direct: drain the reassembly queue
[  893.781156] ksmbd: smb_direct: drain the reassembly queue

In the earlier version, I couldn't get the "disconnected" event after destroying the listener cm_id. (Hmm... there were also no "Recv completed" messages.) Thus ksmbd didn't do the disconnecting.

Then I came up with the "async disconnect" patch. This patch initiates the disconnect from the server side, and the ksmbd server can shut down quickly.

[ 2706.135918] ksmbd: kill command received
[ 2706.138517] ksmbd: smb_direct: Async Disconnecting cm_id=0000000072d0cddd
[ 2706.141640] ksmbd: smb_direct: Async Disconnecting cm_id=00000000957468b5
[ 2706.142480] ksmbd: smb_direct: RDMA CM event. cm_id=0000000072d0cddd event=disconnected (10)
[ 2706.149069] ksmbd: smb_direct: Disconnecting cm_id=0000000072d0cddd
[ 2706.149710] ksmbd: smb_direct: RDMA CM event. cm_id=00000000957468b5 event=disconnected (10)
[ 2706.152040] ksmbd: smb_direct: wait for all send posted to IB to finish
[ 2706.158473] ksmbd: smb_direct: Disconnecting cm_id=00000000957468b5
[ 2706.161058] ksmbd: smb_direct: wait for all send posted to IB to finish
[ 2706.163596] ksmbd: smb_direct: drain the reassembly queue
[ 2706.168557] ksmbd: smb_direct: drain the reassembly queue

namjaejeon commented 2 years ago

@hclee If you have any pending SMB Direct patches in your queue, please send them before the rc1 window opens.

hclee commented 2 years ago

@hclee If you have any pending SMB Direct patches in your queue, please send them before the rc1 window opens.

Okay, I will send them. But I am implementing handling for multiple buffer descriptors and don't know if I can complete it within the schedule. Without this patch, Windows clients cannot read/write files.

My rdma branch solves this temporarily by changing the maximum read/write size.

namjaejeon commented 2 years ago

It is important that RDMA works with Windows, even if the performance is a bit lower. Multiple buffer descriptors can be applied as a performance improvement patch after that.

namjaejeon commented 2 years ago

How much is performance degraded by adjusting the read/write size with the Linux cifs client?

hclee commented 2 years ago

How much is performance degraded by adjusting the read/write size with the Linux cifs client?

There is no difference in performance. If multiple buffer descriptors are implemented, the size can be increased beyond 1MB. I will also send the patch which limits the size.

namjaejeon commented 2 years ago

@hclee Just curious, is there any reason the read/write size patch is taking so long?

hclee commented 2 years ago

@hclee Just curious, is there any reason the read/write size patch is taking so long?

Some xfstests cases fail, and I am trying to find the cause. The new patch set does not seem to be related to the problem.

namjaejeon commented 2 years ago

When the read/write size is smaller than 1MB, xfstests fails, right?

The new patch set does not seem to be related to the problem.

What is the new patch?

hclee commented 2 years ago

When the read/write size is smaller than 1MB, xfstests fails, right?

No; regardless of the size, some xfstests cases fail.

The new patch set does not seem to be related to the problem.

What is the new patch?

The patch set includes limiting the read/write size and creating an MR pool.