Open consp opened 11 months ago
Did some digging but I'm now in over my head:
Error -EINVAL originates here:
static int smb_direct_rdma_xmit(struct smb_direct_transport *t,
void *buf, int buf_len,
struct smb2_buffer_desc_v1 *desc,
unsigned int desc_len,
bool is_read)
{
...
some code
....
for (i = 0; i < desc_len / sizeof(*desc); i++) {
desc_buf_len = le32_to_cpu(desc[i].length);
credits_needed += calc_rw_credits(t, desc_buf, desc_buf_len);
desc_buf += desc_buf_len;
total_length += desc_buf_len;
if (desc_buf_len == 0 || total_length > buf_len ||
total_length > t->max_rdma_rw_size)
{
ksmbd_debug(RDMA, "TEST RDMA %08X %08X %08X %08X", desc_buf_len, total_length, buf_len, t->max_rdma_rw_size);
return -EINVAL;
}
}
Logging is my addition.
Is where it errors, seems total_length > buf_len
triggers the exit. (hex since I do now know the value representation)
ksmbd: smb_direct: TEST RDMA 00020000 00020000 00000400 01000000
Thanks for your report! I will try to reproduce it.
I tried to understand it a bit and this seems to work. I'm not completely sure it is correct but transfering a large sum of files with many different small filesizes seems to work now, it looks like the read descriptor being set to 128k is the issue: (update to match pr)
diff --git a/smb2pdu.c b/smb2pdu.c
index 591d899..e9749fc 100644
--- a/smb2pdu.c
+++ b/smb2pdu.c
@@ -6792,7 +6792,13 @@ static ssize_t smb2_read_rdma_channel(struct ksmbd_work *work,
size_t length)
{
int err;
+ struct smb2_buffer_desc_v1 *desc;
+ /* set descriptor lenght to nbytes if less than request size */
+ if (length < req->Length) {
+ desc = (struct smb2_buffer_desc_v1 *)((char *)req + le16_to_cpu(req->ReadChannelInfoOffset));
+ desc->length = cpu_to_le32(length);
+ }
err = ksmbd_conn_rdma_write(work->conn, data_buf, length,
(struct smb2_buffer_desc_v1 *)
((char *)req + le16_to_cpu(req->ReadChannelInfoOffset)),
Tested with (all seems to do fine):
#! /bin/bash
for n in {1..1000}; do
dd if=/dev/zero of=file$( printf %03d "$n" ).bin bs=1024 count=$(( 128 + RANDOM / 1024 ))
done
Result master branch:
Result fix I made later (better place to do it):
@consp Hm.. I can not reproduce it with windows 10 workstation. In my case, multichannel -> RDMA is changed in the middle of copying while a lot of data is being copied. It seems that Windows cannot transfer such small files with RDMA. How did you get RDMA to transfer even for small files?
@consp I have added the patch for this issue? Can you review and verify it(https://github.com/namjaejeon/ksmbd/commit/e255ebe60db934832d034e06bd165d6c74cca083) ?
git clone https://github.com/namjaejeon/ksmbd --branch=rdma
@consp ping ?
Hi, I have the same issue. I can't clone your branch though, it says it doesn't exist. Did you delete it, or did you merge the fix into the master branch?
I don't know if it's related but ksmbd
is also spamming these lines into the kernel log non-stop:
...
[27547.422677] ksmbd: Invalid Buffer Size Requested
[27547.429593] ksmbd: Invalid Buffer Size Requested
[27547.435637] ksmbd: Invalid Buffer Size Requested
...
I'm using Windows 10 and Linux 6.6.0 ksmbd module over RDMA (ConnectX-3)
I've tried with the master
branch of https://github.com/namjaejeon/ksmbd, I still get these errors. Copying it from Windows onto Linux/ksmbd worked, but moving or renaming the file still produces "file too large" errors:
@nitroxis Can you check the below branch if problem is still happening ?
git clone https://github.com/consp/ksmbd --branch=reduce-descriptor-length
@nitroxis ping?
Hi, sorry for the late response. I'm unable to compile that branch against kernel 6.6.0 due to various errors:
make -C /lib/modules/6.6.0-arch1-1-custom/build M=/home/nitroxis/src/ksmbd2 modules
make[1]: Entering directory '/usr/lib/modules/6.6.0-arch1-1-custom/build'
CC [M] /home/nitroxis/src/ksmbd2/vfs.o
/home/nitroxis/src/ksmbd2/vfs.c: In function ‘ksmbd_vfs_fill_dentry_attrs’:
/home/nitroxis/src/ksmbd2/vfs.c:3000:33: warning: passing argument 2 of ‘generic_fillattr’ makes integer from pointer without a cast [-Wint-conversion]
3000 | generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
| ^~~~~~~~~~~~~~~
| |
| struct inode *
In file included from /home/nitroxis/src/ksmbd2/vfs.c:8:
./include/linux/fs.h:3016:43: note: expected ‘u32’ {aka ‘unsigned int’} but argument is of type ‘struct inode *’
3016 | void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
| ^~~
/home/nitroxis/src/ksmbd2/vfs.c:3000:61: error: passing argument 3 of ‘generic_fillattr’ from incompatible pointer type [-Werror=incompatible-pointer-types]
3000 | generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
| ~~~~~~~~~~~^~~~~~~
| |
| struct kstat *
./include/linux/fs.h:3016:48: note: expected ‘struct inode *’ but argument is of type ‘struct kstat *’
3016 | void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
| ^~~~~~~~~~~~~~
/home/nitroxis/src/ksmbd2/vfs.c:3000:9: error: too few arguments to function ‘generic_fillattr’
3000 | generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
| ^~~~~~~~~~~~~~~~
./include/linux/fs.h:3016:6: note: declared here
3016 | void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
| ^~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:243: /home/nitroxis/src/ksmbd2/vfs.o] Error 1
make[2]: *** [/usr/lib/modules/6.6.0-arch1-1-custom/build/Makefile:1913: /home/nitroxis/src/ksmbd2] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/usr/lib/modules/6.6.0-arch1-1-custom/build'
make: *** [Makefile:47: all] Error 2
Nevermind, I merged with cifsd-team/ksmbd
master, that compiled successfully. However, loading the new module still leads to the same Windows error message as before.
I am not sure if it's related to this issue, but opening files with some programs also leads to errors. For example, opening a small 1kB file in HxD
gives me the following error:
Opening text files with VSCode simply shows an empty file. I guess it too encounters an error somewhere, but doesn't propagate it properly in the UI.
Playing a video file with
mpv
or opening an image with XnView
on the other hand works fine, so I'm guessing it's only a specific function/request that fails, which is only triggered by some programs but not by others.
This also happens on master
branch, not just the reduce-descriptor-length
branch
Now VSCode actually displayed the following error:
@nitroxis Can you give me wireshark dump(or tcpdump) on problem situation ? Acutally I didn't reproduce it with windows client before. Once, want to check packets that you will give on your reproduction environment.
@nitroxis please give also me debug prints after enabling RDMA log(sudo ksmbd.control -d rdma).
It has been a while since I used rdma since I switched to windows 11, trying to get it working again.
Issue
See title, files with offsets other than blocks of 128k will not transfer properly.
Setup
Version of kmod: git master Version of tools: git master kernel: 6.4.7-xanmod with patch for mellanox adapter to be recognised as rdma device Changes from default: 8 rdma connection instead of 2 Adapter: mellanox ConnectX-4 EN 100gbit, winof2 on windows, inbox on linux Client OS: Windows 11 for workstations Server OS: Debian bookworm
Expected behaviour
Files copy normally
To reproduce
copy files from ksmbd location (linux, zfs) to local storage (windows, ntfs)
Result
Also tried from ext4 partition: same problem to avoid zfs being the issue.
Afterwards I created a 256k file since my testfile of 8gb copied fine:
Which worked. (not included in logs)
It look like there is some issue with non 128k sized files (offset issue?). The part which stands out:
Attached are the logs (
ksmbd.control -d all
), somewhat sanitizedAdditional notes: Sometimes windows acts like it is transfering the file twice. result.log
Note: verified it's actually RDMA, RSS works fine.