Open AkhilSrinivasSolidigm opened 3 days ago
Is this a workstation or desktop system? If so, this may be a bit trickier.
I wonder whether GPUDirect is supported on the platform. If it is not, BaM won't work.
This is a workstation, and I have previously run GPUDirect on it, but on a different OS. Are GDS and MLNX with NVMe requirements for BaM support?
No. GDS will work without GPUDirect in compatibility mode. BaM has no notion of compatibility mode.
Are there any other logs or setup details I can provide to debug this issue? I've tried RHEL 9.2 as well, with the same result, and different drives too. Do you have any recommendations or feedback regarding the setup?
Unsure, as this is getting into a system-specific feature that we never accounted for.
We only know what it takes based on what we described here - https://github.com/ZaidQureshi/bam?tab=readme-ov-file#hardwaresystem-requirements
We cannot support every possible system configuration. Your team member Wayne has already brought up BaM, and we would recommend working closely with him.
Describe the bug
Running nvm-block-bench fails with:

[ioctl_map] Page mapping kernel request failed (ptr=0x7fcef2820000, n_pages=1): Invalid argument

The command used was:

sudo ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=false
To Reproduce
After following all steps to install BaM with no errors, load the drive with make load:
root@xxxx:~/GIDS/bam/build/module# ls -l /dev | grep libnvm
crwxrwxrwx 1 root root 506, 0 Oct 22 10:21 libnvm0

Drive loaded successfully.

Command: dmesg | tail -10
Output:
[ 456.310326] Adding controller device: c8:00.0
[ 456.310550] Character device /dev/libnvm0 created (506.0)
[ 456.310610] libnvm helper loaded
Now running:

sudo ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=false

Gives:

[ioctl_map] Page mapping kernel request failed (ptr=0x7fcef2820000, n_pages=1): Invalid argument
Unexpected error: Failed to map device memory: Invalid argument
dmesg after failure:

Oct 22 10:31:15 kernel: [ 1028.550861] Unknown ioctl command from process 2389: 1075347458
Oct 22 10:31:54 kernel: [ 1067.493923] Unknown ioctl command from process 2400: 1075347458
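The command number in the dmesg lines can be unpacked using the generic Linux ioctl bit layout (direction, size, type, number), which may help narrow down which request the module rejected. The sketch below assumes the x86_64 layout of 8 number bits, 8 type bits, 14 size bits and 2 direction bits; the helper name decode_ioctl is only illustrative and not part of BaM.

```python
# Decode a Linux ioctl request number, assuming the generic x86_64 layout:
# bits 0-7 = command number, 8-15 = type ("magic"), 16-29 = argument size,
# bits 30-31 = direction. decode_ioctl is a hypothetical helper name.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS

# Direction is named from user space's point of view (_IOW = user writes).
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Split an ioctl request number into its four encoded fields."""
    return {
        "dir": DIRECTIONS[(cmd >> IOC_DIRSHIFT) & 0x3],
        "type": (cmd >> IOC_TYPESHIFT) & ((1 << IOC_TYPEBITS) - 1),
        "nr": (cmd >> IOC_NRSHIFT) & ((1 << IOC_NRBITS) - 1),
        "size": (cmd >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1),
    }


if __name__ == "__main__":
    cmd = 1075347458  # value reported in the dmesg output
    print(hex(cmd), decode_ioctl(cmd))
```

Under that assumption, 1075347458 is 0x40188002: a write-direction request with type 0x80, command number 2, and a 24-byte argument. Which request this corresponds to in the BaM kernel module is not stated in this thread and should be checked against the module's ioctl header. Since the user-space error is "Failed to map device memory", one possibility worth ruling out (an assumption, not confirmed here) is that the loaded module was built without support for the device-memory mapping request.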
Expected behavior
The benchmark runs to completion without failures.
Machine Setup (please complete the following information):
Additional context
Ensured nvcc and CUDA are the same version:

root@xxxx:~/GIDS/bam/build# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
uname -a:
Linux 5.8.0-050800-generic #202008022230 SMP Sun Aug 2 22:33:21 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux