ZaidQureshi / bam


[ioctl_map] Page mapping kernel request failed (ptr=0x7fcef2820000, n_pages=1): Invalid argument, when running nvm-block-bench #42

Open AkhilSrinivasSolidigm opened 3 days ago

AkhilSrinivasSolidigm commented 3 days ago

Describe the bug

[ioctl_map] Page mapping kernel request failed (ptr=0x7fcef2820000, n_pages=1): Invalid argument, when running sudo ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=false

To Reproduce

After successfully following all steps to install BaM with no errors, loading drive with make load:

root@xxxx:~/GIDS/bam/build/module# ls -l /dev | grep libnvm
crwxrwxrwx 1 root root 506, 0 Oct 22 10:21 libnvm0

Drive loaded successfully.

Command: dmesg | tail -10

Output:
[ 456.310326] Adding controller device: c8:00.0
[ 456.310550] Character device /dev/libnvm0 created (506.0)
[ 456.310610] libnvm helper loaded

Now running sudo ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=1 --pages=262144 --queue_depth=1024 --page_size=512 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=false

Gives:
[ioctl_map] Page mapping kernel request failed (ptr=0x7fcef2820000, n_pages=1): Invalid argument
Unexpected error: Failed to map device memory: Invalid argument

dmesg after failure:
Oct 22 10:31:15 kernel: [ 1028.550861] Unknown ioctl command from process 2389: 1075347458
Oct 22 10:31:54 kernel: [ 1067.493923] Unknown ioctl command from process 2400: 1075347458
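The request number in the dmesg lines can be unpacked to see which ioctl the module rejected. Below is a minimal sketch, assuming the standard Linux asm-generic ioctl encoding used on x86_64 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The helper name decode_ioctl is hypothetical and not part of BaM.

```python
# Decode a Linux ioctl request number using the asm-generic bit layout
# (assumed here because the machine is x86_64):
#   bits 0-7   request number (nr)
#   bits 8-15  type ("magic")
#   bits 16-29 size of the argument struct in bytes
#   bits 30-31 direction (0 none, 1 write, 2 read, 3 read/write)
IOC_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl request number into its four encoded fields."""
    return {
        "nr": cmd & 0xFF,
        "type": (cmd >> 8) & 0xFF,
        "size": (cmd >> 16) & 0x3FFF,
        "dir": IOC_DIRECTIONS[(cmd >> 30) & 0x3],
    }


if __name__ == "__main__":
    cmd = 1075347458  # value reported in the dmesg output above
    print(hex(cmd), decode_ioctl(cmd))
    # prints: 0x40188002 {'nr': 2, 'type': 128, 'size': 24, 'dir': 'write'}
```

Under that assumption the rejected request is a write-direction ioctl with type 0x80, number 2, and a 24-byte argument. Comparing these fields with the request definitions in the kernel module's ioctl header would show whether the loaded module was built with a handler for the request the userspace library is sending.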

Expected behavior

The benchmark should run successfully without failures.

Machine Setup (please complete the following information):

Additional context

Ensured nvcc and CUDA are the same version:
root@xxxx:~/GIDS/bam/build# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

uname -a :

Linux 5.8.0-050800-generic #202008022230 SMP Sun Aug 2 22:33:21 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

msharmavikram commented 3 days ago

Is this a workstation or desktop system? If yes, this may be a bit more tricky.

I wonder if GPUDirect is supported on the platform. If it's not, BaM won't work.

AkhilSrinivasSolidigm commented 3 days ago

This is a workstation. I have previously run GPUDirect, but on a different OS. Are GDS and MLNX with NVMe a requirement for BaM support?

msharmavikram commented 2 days ago

No. GDS will work without GPUDirect in compatibility mode. BaM has no notion of a compatibility mode.

AkhilSrinivasSolidigm commented 2 days ago

Are there any other logs or setup information I can provide to debug this issue? I've tried RHEL 9.2 as well, with the same result, and different drives too. Do you have any recommendations or feedback regarding the setup?

msharmavikram commented 2 days ago

Unsure, as this is getting into a system-specific feature that we never accounted for.

We only know what it takes based on what we described here - https://github.com/ZaidQureshi/bam?tab=readme-ov-file#hardwaresystem-requirements

We cannot support every possible system configuration. Your team member Wayne has already brought up BaM, and I would recommend working closely with him.