cifsd-team / ksmbd

ksmbd kernel server (SMB/CIFS server)

SMB Direct with a Windows client #600

Closed: farnoy closed this issue 10 months ago

farnoy commented 10 months ago

I'm using ConnectX-3s between a Linux ksmbd host and a Windows 10 client. Uploading large files to a share works well, with typical dmesg output:

[ 7244.331580] ksmbd: smb_direct: returning to thread data_read=88 reassembly_data_length=0 first_entry_offset=0
[ 7244.331587] ksmbd: smb_direct: Sending smb (RDMA): smb_len=124
[ 7244.331589] ksmbd: smb_direct: credits_requested=255 credits_granted=0 data_offset=24 data_length=124 remaining_data_length=0
[ 7244.331592] ksmbd: smb_direct: wait_event on more data
[ 7244.331597] ksmbd: smb_direct: Send completed. status='success (0)', opcode=0

But when I try to download something from the share, it shows up as this kind of output in dmesg:

[ 7132.582439] ksmbd: smb_direct: read/write error. opcode = 0, status = WR flushed(5)
[ 7132.582447] ksmbd: smb_direct: Send completed. status='WR flushed (5)', opcode=-29703
[ 7132.582450] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=-29703
[ 7132.582725] ksmbd: Failed to send message: -107
[ 7132.582727] ksmbd: Failed to send message: -107
[ 7132.582737] ksmbd: Failed to send message: -107
[ 7132.582743] ksmbd: Failed to send message: -107
[ 7132.582747] ksmbd: Failed to send message: -107
[ 7132.582751] ksmbd: Failed to send message: -107
[ 7132.582755] ksmbd: Failed to send message: -107
[ 7132.582758] ksmbd: Failed to send message: -107
[ 7132.582762] ksmbd: Failed to send message: -107
[ 7132.582766] ksmbd: Failed to send message: -107
[ 7132.582770] ksmbd: Failed to send message: -107
[ 7132.582773] ksmbd: Failed to send message: -107
[ 7132.583417] ksmbd: smb_direct: RDMA CM event. cm_id=00000000d8cfe30e event=disconnected (10)

And Windows reports this as:

A network connection was disconnected.

Instance name: \Device\LanmanRedirector
Server name: \192.168.100.1
Server address: 192.168.100.1:445
Connection type: Rdma
InterfaceId: 14

Guidance:
This indicates that the client's connection to the server was disconnected.

Frequent, unexpected disconnects when using an RDMA over Converged Ethernet (RoCE) adapter may indicate a network misconfiguration. RoCE requires Priority Flow Control (PFC) to be configured for every host, switch and router on the RoCE network. Failure to properly configure PFC will cause packet loss, frequent disconnects and poor performance.
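As a concrete illustration of that guidance (not from this thread): on recent Linux systems, per-priority PFC can be inspected and toggled with the iproute2 dcb tool. The device name and priority below are placeholders, and the same priority has to carry the RoCE traffic on every hop.

```shell
# Sketch only: enable PFC for priority 3 on a hypothetical device eth0.
# RoCE traffic must be mapped to the same priority end to end
# (host NICs, switches, routers), or packet loss will persist.
dcb pfc set dev eth0 prio-pfc all:off 3:on
dcb pfc show dev eth0    # verify the per-priority PFC state
```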

I believe I've set the MTUs properly on both hosts, and RDMA does seem to work in at least one direction: all the symptoms, like the perfmon counters, confirm it's working. I'm new to RDMA, so I could easily have made a mistake in the setup. One thing that stood out to me is the disparity between the send and receive sizes, but I don't know if it's relevant:

[ 7132.599687] ksmbd: smb_direct: MinVersion: 256, MaxVersion: 256, CreditRequested: 255, MaxSendSize: 1364, MaxRecvSize: 8192, MaxFragmentedSize: 1048576

farnoy commented 10 months ago

I captured traffic on the network; 192.168.100.1 is the ksmbd host, and it is the one initiating the disconnect, which causes a stall of a couple of seconds that is visible in the File Explorer copy dialog. After that, the client sends a Tree Disconnect followed by more Read Requests. Given enough time the file does get transferred, and as far as I can tell it never gets downgraded to traditional IP traffic; it keeps attempting RDMA until the connection breaks again.

Something is off with the timestamps in this capture; I don't think it's possible for the Tree Connect Response to arrive in the same microsecond the request was sent.

[screenshot: packet capture]

namjaejeon commented 10 months ago

What kernel version did you use? When I test uploading/downloading large files, there is no problem. And how big is a large file? I have tested with a 10GB file.

farnoy commented 10 months ago

6.4.12. I've tested uploads with small and large files (5MB-20GB) and they work consistently well. For downloads, small files (<10MB) seem to complete instantly, within the initial burst and before the connection breaks off. Anything larger, like a 200MB file I tested with, disconnects in this way and the download stalls until the connection is re-established and resumes from there.

namjaejeon commented 10 months ago

smbd max io size (G)
              Maximum read/write size of SMB-Direct.  Number suffixes are allowed.

              Default: smbd max io size = 8MB

Can you check after decreasing smbd max io size to 1M or 512K?

[global]
smbd max io size = 512K

or

smbd max io size = 1M
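A quick way to try this out (a sketch: /etc/ksmbd/ksmbd.conf and the service name are assumptions about a typical ksmbd.tools install, so the edit is demonstrated against a scratch file):

```shell
# Demo against a scratch copy; point sed at the real config on your system.
printf '[global]\n' > /tmp/ksmbd-demo.conf
# GNU sed: append the option right after the [global] section header
sed -i '/^\[global\]/a smbd max io size = 512K' /tmp/ksmbd-demo.conf
cat /tmp/ksmbd-demo.conf
# then restart the server, e.g.: sudo systemctl restart ksmbd
```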

farnoy commented 10 months ago

I tried 512K but it did not change anything. Still able to upload just fine, downloads get interrupted and stalled.

But I did try something else. My ConnectX3 adapter on the Windows client side is bottlenecked by being plugged into a PCIe x4 slot: my PCI subsystem tops out at ~26Gbps, well before the 40GbE link can be saturated. To test whether that has an impact, I downgraded the link to 10GbE, and both downloads and uploads now work reliably.
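The rough arithmetic behind that ceiling (a back-of-the-envelope sketch, assuming a PCIe 3.0 x4 slot):

```shell
# PCIe 3.0 runs each lane at 8 GT/s with 128b/130b encoding; four lanes
# give the raw figure below. TLP/DLLP framing overhead pushes the usable
# rate down toward the ~26 Gbps observed, well under the 40GbE line rate.
awk 'BEGIN { printf "%.1f Gbps\n", 8 * 4 * 128 / 130 }'
```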

Could the PCI bottleneck be affecting the behavior in some way?

farnoy commented 10 months ago

In the meantime, I'll try to add a second link between the nodes, run both at 10GbE, and see if I can get multichannel and RDMA working together to reach at least 2GB/s transfers.
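For anyone following along: ksmbd's multichannel support is experimental and, assuming a ksmbd.tools build that includes it, is enabled with a Samba-style global option (a sketch; verify the exact option name against your ksmbd.tools version):

```
[global]
        server multi channel support = yes
```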

namjaejeon commented 10 months ago

Could the PCI bottleneck be affecting the behavior in some way?

I don't know how to handle this in the SMB protocol (i.e. in ksmbd). It seems the Linux RDMA driver should handle it. Basically, I think the HW setup needs to be configured according to the specifications...

farnoy commented 10 months ago

I'll look into it; I'm probably missing some form of congestion control on this RoCEv1 network. It makes sense that the issue shows up when a faster producer sends more than the consumer can handle, while the other direction works fine.

Thanks for the help so far!

farnoy commented 10 months ago

OK, I believe I fixed it by configuring global flow control on both sides. That is: ethtool -A $dev rx on tx on on Linux, and this property on Windows, which is probably Mellanox-specific: Get-NetAdapterAdvancedProperty -Name "Ethernet 7" -RegistryKeyword "*FlowControl".
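Spelled out (the Windows write uses Set-NetAdapterAdvancedProperty rather than the Get shown above; registry value 3 corresponds to "Rx & Tx Enabled" for the standardized *FlowControl keyword, but that value is an assumption here, so check your adapter's documentation):

```shell
# Linux: enable link-level (global) pause frames in both directions.
ethtool -A eth0 rx on tx on    # eth0 is a placeholder device name
ethtool -a eth0                # verify the pause parameters took effect

# Windows (PowerShell), shown as comments since this is a shell block:
#   Set-NetAdapterAdvancedProperty -Name "Ethernet 7" `
#     -RegistryKeyword "*FlowControl" -RegistryValue 3   # 3 = Rx & Tx Enabled
```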

I have no idea how this works, because I don't see any pause frames or other congestion notifications, but maybe that happens deeper in the IB stack.

Thanks again, ksmbd seems to work flawlessly and all the issues I've had so far were caused by something else.