cifsd-team / ksmbd

ksmbd kernel server (SMB/CIFS server)

Wondering about the status of SMB Direct with Windows clients #543

Open dz-cies opened 2 years ago

dz-cies commented 2 years ago

Hi, I've recently been trying to test SMB Direct with Windows clients (with Mellanox ConnectX-6 InfiniBand adapters) but have had no luck. I'm able to build ksmbd and mount a share on Windows Server 2016 clients, but RDMA does not seem to be enabled. I'm wondering whether this feature is already implemented. If it is, what configuration is needed to enable it (`server multi channel support = yes` and anything else)?

The currently available information looks a bit confusing. In the mail "[PATCH v8 00/13] ksmbd: introduce new SMB3 kernel server", SMB Direct is described as "Partially Supported. SMB3 Multi-channel is required to connect to Windows client", and SMB3 Multi-channel is also "partially supported". In the same mail, it reads "SMB Direct is only currently possible with ksmbd (among Linux servers)". So I guess Windows clients are not possible yet. However, in README.md, SMB Direct (RDMA) and Multi-channel are listed under "Features implemented", and there seem to be successful cases already reported in other issues (#538, #529).

I would appreciate it if you could clarify this and share any progress on this feature.

namjaejeon commented 2 years ago

@hclee What is the relation between the Windows RDMA connection and some of the xfstests failures?

hclee commented 2 years ago

@hclee What is the relation between the Windows RDMA connection and some of the xfstests failures?

There is no relation. I was going to send the patch set for Windows clients after figuring out the cause of the xfstests failures. I will send the patch set.

namjaejeon commented 2 years ago

Let me know some of the failed xfstests test numbers.

hclee commented 2 years ago

The failed xfstests test cases are generic/013 generic/109 generic/113 generic/465 generic/476 generic/551 generic/590.

I ran xfstests with the following command:

./check cifs/001 generic/001 generic/002 generic/005 generic/006 generic/007 generic/008 generic/011 generic/013 generic/014 generic/020 generic/023 generic/024 generic/028 generic/029 generic/030 generic/032 generic/033 generic/036 generic/037 generic/069 generic/070 generic/071 generic/074 generic/080 generic/084 generic/086 generic/095 generic/098 generic/100 generic/103 generic/109 generic/113 generic/117 generic/124 generic/125 generic/129 generic/130 generic/132 generic/133 generic/135 generic/141 generic/169 generic/198 generic/207 generic/208 generic/210 generic/211 generic/212 generic/214 generic/215 generic/221 generic/225 generic/228 generic/236 generic/239 generic/241 generic/245 generic/246 generic/247 generic/248 generic/249 generic/257 generic/258 generic/286 generic/308 generic/309 generic/310 generic/313 generic/315 generic/316 generic/337 generic/339 generic/340 generic/344 generic/345 generic/346 generic/349 generic/350 generic/354 generic/360 generic/377 generic/391 generic/393 generic/394 generic/406 generic/412 generic/420 generic/422 generic/432 generic/433 generic/436 generic/437 generic/438 generic/439 generic/443 generic/445 generic/446 generic/448 generic/451 generic/452 generic/454 generic/460 generic/464 generic/465 generic/476 generic/490 generic/504 generic/523 generic/524 generic/533 generic/539 generic/551 generic/567 generic/568 generic/590 generic/591

namjaejeon commented 2 years ago

@hcbwiz I have applied your async disconnect patch. Can you check that RDMA is fully working with these branches?

git clone https://github.com/namjaejeon/ksmbd
git clone https://github.com/namjaejeon/ksmbd-tools

And I will add your Signed-off-by to the async disconnect patch if you give me your email address and your name, like this: (Namjae Jeon linkinjeon@kernel.org)

If it works fine, I will apply the patches to cifsd-team master.

namjaejeon commented 2 years ago

@dz-cies Ziwei, can you also check that RDMA works, like multichannel, on your target? Performance too.

hcbwiz commented 2 years ago

@hcbwiz I have applied your async disconnect patch. Can you check that RDMA is fully working with these branches?

git clone https://github.com/namjaejeon/ksmbd
git clone https://github.com/namjaejeon/ksmbd-tools

It works well in my environment. Thanks.

And I will add your Signed-off-by to the async disconnect patch if you give me your email address and your name, like this: (Namjae Jeon linkinjeon@kernel.org)

Yufan Chen wiz.chen@gmail.com

If it works fine, I will apply the patches to cifsd-team master.

namjaejeon commented 2 years ago

@hcbwiz Done! Thanks for your work!

dz-cies commented 2 years ago

@dz-cies Ziwei, can you also check that RDMA works, like multichannel, on your target? Performance too.

I'm sorry, but we're currently undergoing a major deployment change and I don't have the test environment I had before, so I'm not able to help.

EchterAgo commented 2 years ago

I have gotten SMB Direct to work between an Ubuntu 22.04 (pre-release) server and a Windows 10 Pro for Workstations 21H2 client, both using Mellanox ConnectX-3 dual-port 40GbE cards. I had to disable the rdma_frwr_is_supported check in ksmbd_rdma_capable_netdev to get the server to advertise RDMA.

In Get-SmbMultichannelConnection, Client RSS Capable stays False but Client RDMA Capable is True. I suspect this is something on the client, because the server includes the RSS flag in FSCTL_QUERY_NETWORK_INTERFACE.
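For reference, the Capability bits that each NETWORK_INTERFACE_INFO entry of the FSCTL_QUERY_NETWORK_INTERFACE response can carry are defined by MS-SMB2 as follows; the identifiers below follow the naming commonly used in the Linux SMB code and are shown only to make the RSS/RDMA observation above concrete.

/* NETWORK_INTERFACE_INFO Capability flags (MS-SMB2) */
#define RSS_CAPABLE	0x00000001
#define RDMA_CAPABLE	0x00000002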

I also noticed that VLAN interfaces do not get advertised as supporting RDMA in FSCTL_QUERY_NETWORK_INTERFACE. Due to my network setup I have the main interfaces of my cards in a bridge, with only the bridge interface having an address assigned. I have a separate VLAN interface with an address for RDMA. This works when testing with RDMA tools like rping or qperf. When I check the ksmbd response in FSCTL_QUERY_NETWORK_INTERFACE, the main interface gets (correctly?) advertised as supporting RDMA but without an address, while the VLAN interface gets advertised as not supporting RDMA but with an address. I managed to fix this in fsctl_query_iface_info_ioctl by checking whether the interface is a VLAN device and getting the real device before determining capabilities, see https://github.com/EchterAgo/ksmbd/commit/01b34681fc2db5c6733cb7be80c35abe6653bf24
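As a reference for the approach described above (not the linked commit itself), a minimal sketch: resolve a VLAN netdev to its underlying real device with the helpers from <linux/if_vlan.h> before probing RDMA capability. ksmbd_rdma_capable_netdev() is the existing ksmbd helper mentioned earlier; the wrapper name is hypothetical.

#include <linux/if_vlan.h>
#include <linux/netdevice.h>

/* Hypothetical wrapper: for VLAN netdevs, walk down to the real lower
 * device so that capability detection sees the RDMA-capable adapter.
 */
static bool ksmbd_netdev_rdma_capable(struct net_device *netdev)
{
        if (is_vlan_dev(netdev))
                netdev = vlan_dev_real_dev(netdev);

        return ksmbd_rdma_capable_netdev(netdev);
}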

namjaejeon commented 2 years ago

@EchterAgo Okay. Can you send the patch to the mailing list (linux-cifs@vger.kernel.org)?

hclee commented 2 years ago

I have gotten SMB Direct to work between an Ubuntu 22.04 (pre-release) server and a Windows 10 Pro for Workstations 21H2 client, both using Mellanox ConnectX-3 dual-port 40GbE cards. I had to disable the rdma_frwr_is_supported check in ksmbd_rdma_capable_netdev to get the server to advertise RDMA.

Which device driver do you use? From the mainline kernel or from Mellanox?

EchterAgo commented 2 years ago

@namjaejeon I can do that, but I wonder if this is actually the correct place to fix this?

@hclee On Ubuntu I used the mlx4 mainline 5.15 driver, on Windows I used the Mellanox driver MLNX_VPI_WinOF-5_50_54000_All_win2019_x64.exe. I haven't tried Microsoft's included driver yet.

namjaejeon commented 2 years ago

I can do that, but I wonder if this is actually the correct place to fix this?

As you know, ksmbd has been merged into the Linux mainline. This is the out-of-tree ksmbd version for users on older kernels. Normally I port the patches from mainline ksmbd to here.

You can download the source code from https://kernel.org/ and create the patch against it.

namjaejeon commented 2 years ago

@hcbwiz I have applied SMB Direct multiple buffer descriptors to ksmbd now. Could you please check the performance gain compared to before? And I request that you check it on Intel X777 as well as Mellanox NICs. Thanks!

hcbwiz commented 2 years ago

@hcbwiz I have applied SMB Direct multiple buffer descriptors to ksmbd now. Could you please check the performance gain compared to before? And I request that you check it on Intel X777 as well as Mellanox NICs. Thanks!

Great! I will test it.

hcbwiz commented 2 years ago

@namjaejeon

I used the frametest tool: frametest.exe -w 4k -t 20 -n 2000

NIC: Mellanox ConnectX-5
Backend storage: NVMe RAID0 with XFS.

Without SMB Direct multiple buffer descriptors: image

With SMB Direct multiple buffer descriptors: image

With multi-channel (16 connections): image

hcbwiz commented 2 years ago

I request that you check it on Intel X777 as well as Mellanox NICs. Thanks!

@namjaejeon

I suppose you mean the Intel X722 (an iWARP device). This device only supports max_send_sges = 3. With SMB Direct multiple buffer descriptors, I still need to modify the code as in #533, right?

Besides, I used a Linux OS to test the Intel X722, and I might have gotten something wrong before. It seems that SMB Direct in the Linux cifs client cannot work with the Intel X722 for the same reason: max_send_sges = 3.

Currently, I have no Windows server with an Intel X722 NIC to test it.
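For context, the kind of adjustment hinted at here would be clamping the number of send SGEs the transport configures to what the RDMA device actually advertises; a rough sketch under that assumption, with illustrative names only loosely based on transport_rdma.c:

/* Sketch only: cap the configured send SGE count at the device limit
 * (3 on the X722 iWARP adapter) instead of assuming a larger fixed value.
 * 'max_send_sges' and 'SMB_DIRECT_MAX_SEND_SGES' are illustrative names.
 */
max_send_sges = SMB_DIRECT_MAX_SEND_SGES;
if (max_send_sges > device->attrs.max_send_sge)
        max_send_sges = device->attrs.max_send_sge;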

namjaejeon commented 2 years ago

@hcbwiz Thanks for checking! Can I confirm the data rates from the averaged details?

  1. w/o multiple descriptors: 1908 MB/s
  2. w/ multiple descriptors: 4586 MB/s

It seems to have improved by more than 2x. Do you think that with this implementation ksmbd SMB Direct is reaching almost the maximum RDMA HW speed of your NIC?

hcbwiz commented 2 years ago

It seems to have improved by more than 2x. Do you think that with this implementation ksmbd SMB Direct is reaching almost the maximum RDMA HW speed of your NIC?

ksmbd uses a thread-per-connection mechanism plus the synchronous VFS API, and Windows opens only 2 RDMA connections per IP address over SMB Direct.

I did a test.

diff --git a/vfs.c b/vfs.c
index b3f734e..5ad4222 100644
--- a/vfs.c
+++ b/vfs.c
@@ -560,7 +560,8 @@ int ksmbd_vfs_write(struct ksmbd_work *work, struct ksmbd_file *fp,
        /* Do we need to break any of a levelII oplock? */
        smb_break_all_levII_oplock(work, fp, 1);

-       err = kernel_write(filp, buf, count, pos);
+       //err = kernel_write(filp, buf, count, pos);
+       err = count;
        if (err < 0) {
                ksmbd_debug(VFS, "smb write failed, err = %d\n", err);
                goto out;

image

Note: I did the test several times, but the results were not stable. The results I observed were 13x fps ~ 16x fps. Maybe the extra memory allocation and memcpy between the "RDMA channel" and the data buffer impacts the throughput.

Maybe using an asynchronous API for file I/O to leverage multiple cores is an alternative. BTW, the Linux kernel's asynchronous file I/O API also supports bypassing the VFS page cache.
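As an illustration of that suggestion (not code from this thread): an asynchronous, page-cache-bypassing write can be issued from kernel context roughly as below, following the pattern used by drivers such as loop. The function name and the completion callback are hypothetical, alignment requirements for direct I/O and error/completion handling are glossed over, and the kiocb and bio_vec array must stay alive until the callback runs.

#include <linux/bvec.h>
#include <linux/fs.h>
#include <linux/uio.h>

/* Sketch: submit a direct, asynchronous write. A non-NULL ki_complete
 * makes the kiocb asynchronous; IOCB_DIRECT bypasses the page cache.
 */
static ssize_t async_direct_write(struct file *filp, struct kiocb *iocb,
                                  struct bio_vec *bvec, unsigned int nr_segs,
                                  size_t len, loff_t pos,
                                  void (*done)(struct kiocb *, long))
{
        struct iov_iter iter;

        init_sync_kiocb(iocb, filp);
        iocb->ki_pos = pos;
        iocb->ki_flags |= IOCB_DIRECT;
        iocb->ki_complete = done;

        iov_iter_bvec(&iter, WRITE, bvec, nr_segs, len);
        return call_write_iter(filp, iocb, &iter);
}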

hclee commented 2 years ago

@hcbwiz Thank you very much! If you have time, could you test it again after applying the following patch? The patch will increase the number of write requests which a Windows client can send per connection.

diff --git a/transport_rdma.c b/transport_rdma.c
index b5e453a..91d556d 100644
--- a/transport_rdma.c
+++ b/transport_rdma.c
@@ -1718,7 +1718,7 @@ static int smb_direct_init_params(struct smb_direct_transport *t,
         */
        t->max_rdma_rw_size = smb_direct_max_read_write_size;
        t->pages_per_rw_credit = smb_direct_get_max_fr_pages(t);
-       t->max_rw_credits = DIV_ROUND_UP(t->max_rdma_rw_size,
+       t->max_rw_credits = DIV_ROUND_UP(t->max_rdma_rw_size * 2,
                                         (t->pages_per_rw_credit - 1) *
                                         PAGE_SIZE);

hcbwiz commented 2 years ago

@hclee

After applying this patch, it improves throughput by another 1x%. image

Note: I did these tests several times; the average results were 10x fps ~ 11x fps.

hcbwiz commented 2 years ago

@hclee

With multiple descriptors: I tried to test read throughput, but I often got "cannot read file" errors like this: image

I dumped the RDMA debug info: debug.txt

hclee commented 2 years ago

@hcbwiz

@hclee

With multiple descriptors: I tried to test read throughput, but I often got "cannot read file" errors like this: image

I dumped the RDMA debug info: debug.txt

Okay, I will check this. I can't find any clues in the file "debug.txt".

consp commented 2 years ago

I tried it again, so here are some additional results. This fix doubles the write speed in some tests (good old printf delays):

diff --git a/smb2pdu.c b/smb2pdu.c
index 1caf0d4..7fa5b82 100644
--- a/smb2pdu.c
+++ b/smb2pdu.c
@@ -6596,7 +6596,7 @@ int smb2_write(struct ksmbd_work *work)
                /* read data from the client using rdma channel, and
                 * write the data.
                 */
-               pr_err("filename %pd, offset %lld, len %zu\n",
+               ksmbd_debug(RDMA, "filename %pd, offset %lld, len %zu\n",
                            fp->filp->f_path.dentry, offset, length);
                nbytes = smb2_write_rdma_channel(work, req, fp, offset,
                                                 le32_to_cpu(req->RemainingBytes),

It is not present in the kernel code, just in @namjaejeon's original repository and this one.

I also had to update the mlx WinOF-2 drivers since I forgot I had updated Windows ... it took me a while to figure that out.

I applied the *2 patch you mentioned above; it doesn't make any noticeable difference.

Tested on kernel 5.17 with this repo as the ksmbd source and the patched ConnectX-4 driver, with a ZFS array which does about 3+ GB/s on synchronous 4k transfers on the host, both read and write (read is a bit higher but close enough).

RDMA is quite good: near native (unbuffered) speed, about 20% above RSS, and with less load (as expected). It's a bit faster than the mainline kernel module.

The slowdown is that all writes are synchronous, which leaves them uncombined, as ZFS does not do any buffering like it normally would; I'm not sure whether this is due to the way cifs writes or due to how ksmbd does things, my knowledge is limited here. It's not sync/async: when putting the ZFS dataset in sync=none the results were almost exactly the same and I could see no disk access. In that case I would expect nearly double the speed (around 6 GB/s). It's also noticeable that random 4k access is slower now, while throughput is higher.

Crystal results; I've tried frametest and the results were similar (top is the kernel 5.17 mainline version, bottom is this repo with the *2 patch and the write printf removed). The 4/32 GB size makes no difference, just duration: image

@hclee

With multiple descriptors: I tried to test read throughput, but I often got "cannot read file" errors like this: image

I also have exactly the same issue with the frametest.exe read test. To verify it wasn't a Windows problem: this does not happen with smbd, it only happens with ksmbd. Another thing I noticed: ksmbd does not report the 'execute' ACL (chmod +x) restrictions and adds the archive flag everywhere as far as I can see; mounting the same directory with smbd gives different permissions. The directories use the same smb.conf section. Maybe this is related.

I did notice Windows sometimes dropping the second RDMA connection, which reduces the speed by about 40%. There were no error messages in the kernel logs about this.

All in all it looks good so far (nice work!). I've also tried copying some directories with many small files, which is now limited by the disks they are coming from and not the other way around.

namjaejeon commented 2 years ago

@consp Oops, my mistake. I have removed the error print in smb2_write().

hcbwiz commented 2 years ago

@consp Oops, my mistake. I have removed the error print in smb2_write().

@hclee It showed zero length when testing write cases over RDMA:

image

Maybe the error in the read case is related to it.

namjaejeon commented 2 years ago

@hcbwiz That shouldn't be a problem. For RDMA, the buffer length is taken from req->RemainingBytes.

namjaejeon commented 2 years ago

@hcbwiz Could you please share how you run frametest to reproduce the read failure issue? The command and options for both the read and the write runs.

hcbwiz commented 2 years ago

You can get the frametest tool from here: https://support.dvsus.com/hc/en-us/articles/212925466-How-to-use-frametest

Step 1: generate the data first: frametest -w 4k -t 20 -n 2000 "directory"

Step 2: do the read test: frametest -r 4k -t 20 -n 2000 "directory"

-w size  Perform write test with 'size' KB frames or "sd", "hd", "2k", "4k"
-r       Perform read test using existing frames (default)
-t num   Use multithreading I/O, with 'num' threads at one time
-n num   Number of frames to read or write (default = 1800)

I did more tests without the "RDMA read/write function". It sometimes hit the same error, but not very often.

I will try to narrow down the issue.

namjaejeon commented 2 years ago

@hcbwiz Thank you! I will try to fix it:)

hcbwiz commented 2 years ago

I used the latest master branch with TCP and still get the same error sometimes.

Then I figured out an issue: some files couldn't be listed in the Windows client. Note: I tried restarting the ksmbd server (and even re-loaded ksmbd.ko), but the issue still remains. image

namjaejeon commented 2 years ago

Ah, good info! I have run frametest 4 times, but can't reproduce it yet. Can you tell me what the local filesystem on your system is?

hcbwiz commented 2 years ago

Ah, good info! I have run frametest 4 times, but can't reproduce it yet. Can you tell me what the local filesystem on your system is?

It is XFS over RAID0:

/dev/md0 on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=1024,swidth=9216,noquota)
Personalities : [raid0]
md0 : active raid0 nvme8n1[8] nvme7n1[7] nvme6n1[6] nvme5n1[5] nvme4n1[4] nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      16877177856 blocks super 1.2 512k chunks
/dev/md0                  16T   98G   16T   1% /mnt

namjaejeon commented 2 years ago

Is the kernel version 5.18 or 5.17?

hcbwiz commented 2 years ago

Is the kernel version 5.18 or 5.17?

5.16.3

namjaejeon commented 2 years ago

@hcbwiz Could you please test it with an ext4 local fs?

consp commented 2 years ago

My tests so far: kernel 5.17.5, CX4 card, inbox driver.

FS of share:
tmpfs (tmpfs) -> same error
zfs (both 8*SSD SAS and 2x4 disk zraid) -> same error
ext4 (single disk) -> same error

On ext4 there were a lot more errors though, with errors coinciding with:

May 12 09:14:00 NAS kernel: [217540.665675] ksmbd: hash value diff

There were 4 of those errors, with 2 reported in the frametest application.

May 12 09:32:17 NAS kernel: [  142.536591] ksmbd: smb_direct: Recv completed. status='success (0)', opcode=128
May 12 09:32:17 NAS kernel: [  142.536629] ksmbd: smb_direct: returning rfc1002 length 393
May 12 09:32:17 NAS kernel: [  142.536638] ksmbd: RFC1002 header 393 bytes
May 12 09:32:17 NAS kernel: [  142.536642] ksmbd: smb_direct: returning to thread data_read=393 reassembly_data_length=0 first_entry_offset=0
May 12 09:32:17 NAS kernel: [  142.536653] ksmbd: smb_direct: wait_event on more data
May 12 09:32:17 NAS kernel: [  142.536999] ksmbd: SMB2 data length 24 offset 152
May 12 09:32:17 NAS kernel: [  142.537001] ksmbd: SMB2 len 176
May 12 09:32:17 NAS kernel: [  142.537005] ksmbd: converted name = frametest.exe
May 12 09:32:17 NAS kernel: [  142.537007] ksmbd: get query maximal access context
May 12 09:32:17 NAS kernel: [  142.537022] ksmbd: check permission using windows acl
May 12 09:32:17 NAS kernel: [  142.537041] ksmbd: hash value diff
May 12 09:32:17 NAS kernel: [  142.537054] ksmbd: credits: requested[1] granted[1] total_granted[73]
May 12 09:32:17 NAS kernel: [  142.537062] ksmbd: got SMB2 chained command
May 12 09:32:17 NAS kernel: [  142.537062] ksmbd: Compound req new_len = 184 rcv off = 176 rsp off = 184
May 12 09:32:17 NAS kernel: [  142.537064] ksmbd: SMB2 data length 0 offset 0
May 12 09:32:17 NAS kernel: [  142.537065] ksmbd: SMB2 len 105
May 12 09:32:17 NAS kernel: [  142.537066] ksmbd: GOT query info request
May 12 09:32:17 NAS kernel: [  142.537066] ksmbd: GOT SMB2_O_INFO_FILESYSTEM
May 12 09:32:17 NAS kernel: [  142.537073] ksmbd: credits: requested[1] granted[1] total_granted[73]
May 12 09:32:17 NAS kernel: [  142.537074] ksmbd: got SMB2 chained command
May 12 09:32:17 NAS kernel: [  142.537074] ksmbd: Compound req new_len = 104 rcv off = 288 rsp off = 288

I tried some more but got ksmbd into an endless "Stop session handler" loop when changing shares (ksmbd.control -s / ksmbd.mountd) (yes, I've set the RDMA active connection number to 16 in Windows): image

namjaejeon commented 2 years ago

@consp Please share your smb.conf file. And is it true that the same error happens on TCP (not RDMA) and on ext4?

namjaejeon commented 2 years ago

@consp You can paste smb.conf content here.

consp commented 2 years ago

Some parts are for smbd/ksmbd switching and are ignored by ksmbd. All tests had the same share config (except for the name/dir), and all had 777 permissions from root:

[global]
workgroup = local
server string = NAS
log level = 0
log file = /var/log/samba/klog.%m
max log size = 1000
logging = syslog
guest account = nobody
create mask = 0777
directory mask = 0777
map to guest = Bad User
follow symlinks = yes
server multi channel support = yes
server min version = SMB3_11
bind interfaces only = yes
interfaces = ens15

[tmp]
    path = /tmp
    guest ok = yes
    guest only = no
    read only = no
    browseable = yes
    inherit acls = no
    inherit permissions = no
    ea support = no
    store dos attributes = no
    printable = no
    create mask = 0664
    force create mode = 0664
    directory mask = 0775
    force directory mode = 0775
    hide special files = yes
    follow symlinks = yes
    hide dot files = yes
    valid users = 
    invalid users = 
    read list = 
    write list =

Can't test it with RSS now, no longer at home. Will do later today.

hcbwiz commented 2 years ago

@hcbwiz Could you please test it with an ext4 local fs?

In my local system (XFS), there are 2000 files, frame000000.tst ~ frame001999.tst, in the target directory.

I'm trying to debug it, but I'm not familiar with the "iterate dir" APIs.

In smb2_query_dir():

  d_info.out_buf_len =
                smb2_calc_max_out_buf_len(work, 8,
                                          le32_to_cpu(req->OutputBufferLength));

the output buffer length is 65536 bytes.

  process_query_dir_entries() => priv->d_info->num_entry = 480

In the Windows client, there are 478 files:

image

hcbwiz commented 2 years ago

@consp

Do you use one Windows client? If yes, how do you set up 16 RDMA connections? Do you use an additional alias IP address for ens15?

consp commented 2 years ago

@consp

Do you use one Windows client? If yes, how do you set up 16 RDMA connections? Do you use an additional alias IP address for ens15?

https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/file-server/

For future reference: increase (or create) the HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\ConnectionCountPerRdmaNetworkInterface DWORD in the registry and set it to 16. Above 4 you will get diminishing returns in throughput, though it helps with many files.

namjaejeon commented 2 years ago

In the Windows client, there are 478 files:

@hcbwiz Ah, strange... Please keep the problem situation as it is. I will share a ksmbd source tree with added debug print logs. First, could you please share a wireshark dump taken while running the ls command on the Windows client?

namjaejeon commented 2 years ago

@hcbwiz Please share the print log while running the ls command.

git clone --branch=ksmbd-debug https://github.com/namjaejeon/ksmbd

hcbwiz commented 2 years ago

In the Windows client, there are 478 files:

@hcbwiz Ah, strange... Please keep the problem situation as it is. I will share a ksmbd source tree with added debug print logs. First, could you please share a wireshark dump taken while running the ls command on the Windows client?

@namjaejeon, here is the wireshark dump: ksmbd.zip

hcbwiz commented 2 years ago

@hcbwiz Please share the print log while running the ls command.

git clone --branch=ksmbd-debug https://github.com/namjaejeon/ksmbd

Here is the log: ksmbd_log.txt

namjaejeon commented 2 years ago

@hcbwiz Thanks for your help! Please check whether the problem is improved with the branch below. If the problem is still there, please share the log.

git clone --branch=ksmbd-debug https://github.com/namjaejeon/ksmbd

hcbwiz commented 2 years ago

@hcbwiz Thanks for your help! Please check whether the problem is improved with the branch below. If the problem is still there, please share the log.

git clone --branch=ksmbd-debug https://github.com/namjaejeon/ksmbd

@namjaejeon The good news is that your patch fixes the issue for TCP. The frametest read test works without problems.

When testing RDMA, I can list all files in the Windows client, but it still hits the original problem. I will continue to narrow it down.