NVIDIA / gds-nvidia-fs

NVIDIA GPUDirect Storage Driver

NVMe unsupported #4

Open dearsxx0918 opened 2 years ago

dearsxx0918 commented 2 years ago

Hi, I'm setting up GDS on my machine, but I get the output below:

 NVMe               : Unsupported
 NVMeOF             : Unsupported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported

I can't use GDS with NVMe right now. Can you give me some advice?

dearsxx0918 commented 2 years ago

OK, I know what's needed now: I installed only one NVMe card, but GDS needs at least RAID 0, which requires at least two NVMe cards. Another question: does the P100 support GDS? I saw something in the GDS library suggesting the P100 supports GDS, and I want to confirm that.
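
For anyone trying the RAID 0 route (though later comments suggest a single NVMe drive can also work), a minimal mdadm sketch; the device paths /dev/nvme0n1 and /dev/nvme1n1 and the mount point are hypothetical, so adjust to your system:

$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
$ sudo mkfs.ext4 /dev/md0                        # GDS wants plain ext4/xfs, no LVM
$ sudo mkdir -p /mnt/gds
$ sudo mount -o data=ordered /dev/md0 /mnt/gds   # ext4 in data=ordered journaling mode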

tunglambk commented 2 years ago

Hi @dearsxx0918, how did you resolve the issue? I have two NVMe cards, but I didn't configure RAID 0 before installing the MELLANOX OFED driver and CUDA. This is my result when checking GDS:

 NVMe               : Unsupported
 NVMeOF             : Unsupported
 SCSI               : Unsupported

After that, I saw your post and configured RAID 0, but it still shows Unsupported. Do I need to reinstall the MELLANOX OFED driver and CUDA?

Thank you

9prady9 commented 2 years ago

Even with a RAID 0 configuration, I have been seeing only Unsupported in the gdscheck driver configuration output. I tried all the troubleshooting steps in the NVIDIA documentation.

dearsxx0918 commented 1 year ago

I still don't know what's going on there.

zhouaoe commented 1 year ago

I'm hitting the same problem. I have read a lot of documents, but still have no idea how to resolve it.

Pedrexus commented 1 year ago

Any update?

UTKRISHTPATESARIA commented 1 year ago

Maybe I can help; I'm using only 1 out of 2 NVMe cards for GDS.

Have you mounted the NVMe in data=ordered mode?

https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#mount-local-fs
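
To check the current journaling mode without remounting, something like the following works (/mnt/nvme is a hypothetical mount point; note that data=ordered is ext4's default even when it is not listed explicitly):

$ findmnt -no FSTYPE,OPTIONS /mnt/nvme
ext4   rw,relatime,data=ordered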

Pedrexus commented 1 year ago

Yes, it is in data=ordered mode.

I only have one NVMe device, where the system and everything else is installed. I did everything the guide asks and even got a "Supported" flag, yet it still doesn't work.

My conclusion for now is that I need a separate NVMe device in RAID 0, or else it won't work. Is that right?

ExtremeViscent commented 1 year ago

Maybe try installing OFED with the NVMe options? That helped in my case.
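
For reference, NVIDIA's GDS installation guide passes NVMe/NVMe-oF options to the MLNX_OFED installer; the exact flags vary by OFED version, so treat this as a sketch:

$ sudo ./mlnxofedinstall --with-nvmf --enable-gds --add-kernel-support
$ sudo reboot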

wakaba-best commented 1 year ago

Can my environment be helpful to you?

$ gdscheck -p
 GDS release version: 1.0.1.3
 nvidia_fs version:  2.7 libcufile version: 2.4
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Supported
 NVMeOF             : Supported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Unsupported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Enabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Not configured
 --rdma_device_status  : Up: 0 Down: 0
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 16384
 properties.max_device_cache_size_kb : 131072
 properties.max_device_pinned_mem_size_kb : 33554432
 properties.posix_pool_slab_size_kb : 4 1024 16384
 properties.posix_pool_slab_count : 128 64 32
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 0
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 0
 miscellaneous.api_check_aggressive : false
 =========
 GPU INFO:
 =========
 GPU index 0 Tesla V100-PCIE-32GB bar:1 bar size (MiB):32768 supports GDS
 GPU index 1 Tesla V100-PCIE-32GB bar:1 bar size (MiB):32768 supports GDS
 GPU index 2 Tesla V100-PCIE-32GB bar:1 bar size (MiB):32768 supports GDS
 GPU index 3 Tesla V100-PCIE-32GB bar:1 bar size (MiB):32768 supports GDS
 ==============
 PLATFORM INFO:
 ==============
 IOMMU: disabled
 Platform verification succeeded

[1] OS Version

  • OS : Ubuntu Server 20.04.5 LTS
  • Kernel : 5.4.0-144-generic
  • OFED : MLNX_OFED_LINUX-5.8-2.0.3.0

[2] CUDA & GDS packages (deb files):

$ dpkg -l | grep cuda-tools
ii  cuda-tools-11-4                       11.4.1-1                                amd64        CUDA Tools meta-package
$ dpkg -l | grep gds
ii  gds-tools-11-4                        1.0.1.3-1                               amd64        Tools for GPU Direct Storage
$ dpkg -l | grep nvidia-fs
ii  nvidia-fs                             2.7.50-1                                amd64        NVIDIA filesystem for GPUDirect Storage
ii  nvidia-fs-dkms                        2.7.50-1                                amd64        NVIDIA filesystem DKMS package

[3] check1 : Loaded Kernel modules

$ lsmod | grep nvidia_fs
nvidia_fs             245760  0
ib_core               348160  10 rdma_cm,ib_ipoib,nvme_rdma,iw_cm,nvidia_fs,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
$ lsmod | grep nvme_core
nvme_core             110592  3 nvme,nvme_rdma,nvme_fabrics
mlx_compat             65536  16 rdma_cm,ib_ipoib,mlxdevm,nvme,nvme_rdma,iw_cm,nvme_core,auxiliary,nvme_fabrics,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core

[4] check2 : IOMMU is disabled

$ dmesg | grep -i iommu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-144-generic root=UUID=318c92d2-8567-4d37-acba-4050de3146d9 ro intel_iommu=off
[    1.355073] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-144-generic root=UUID=318c92d2-8567-4d37-acba-4050de3146d9 ro intel_iommu=off
[    1.355163] DMAR: IOMMU disabled
[    2.328217] DMAR-IR: IOAPIC id 12 under DRHD base  0xc5ffc000 IOMMU 6
[    2.328219] DMAR-IR: IOAPIC id 11 under DRHD base  0xb87fc000 IOMMU 5
[    2.328221] DMAR-IR: IOAPIC id 10 under DRHD base  0xaaffc000 IOMMU 4
[    2.328223] DMAR-IR: IOAPIC id 18 under DRHD base  0xfbffc000 IOMMU 3
[    2.328225] DMAR-IR: IOAPIC id 17 under DRHD base  0xee7fc000 IOMMU 2
[    2.328227] DMAR-IR: IOAPIC id 16 under DRHD base  0xe0ffc000 IOMMU 1
[    2.328229] DMAR-IR: IOAPIC id 15 under DRHD base  0xd37fc000 IOMMU 0
[    2.328232] DMAR-IR: IOAPIC id 8 under DRHD base  0x9d7fc000 IOMMU 7
[    2.328234] DMAR-IR: IOAPIC id 9 under DRHD base  0x9d7fc000 IOMMU 7
[    3.468127] iommu: Default domain type: Translated
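
If your dmesg shows the IOMMU enabled instead, it is typically disabled from the kernel command line; a sketch assuming GRUB on an Intel platform (AMD platforms use amd_iommu=off):

$ sudo vi /etc/default/grub        # append intel_iommu=off to GRUB_CMDLINE_LINUX
$ sudo update-grub
$ sudo reboot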

[5] check3 : PCIe Topology => GPU and NVMe devices on the same PLX switch

$ lspci -tv | grep NVMe -A 3 | grep -v Intel
 +-[0000:3a]-+-00.0-[3b-41]----00.0-[3c-41]--+-04.0-[3d]----00.0  Toshiba Corporation NVMe SSD Controller Cx5
 |           |                               +-08.0-[3e]--
 |           |                               +-0c.0-[3f]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
 |           |                               +-10.0-[40]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
 |           |                               \-14.0-[41]----00.0  Toshiba Corporation NVMe SSD Controller Cx5
--
 |           |                               +-08.0-[1b]----00.0  Toshiba Corporation NVMe SSD Controller Cx5
 |           |                               +-0c.0-[1c]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
 |           |                               +-10.0-[1d]----00.0  NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
 |           |                               \-14.0-[1e]----00.0  Toshiba Corporation NVMe SSD Controller Cx5

[6] check4 : ACS is disabled => you should see "ACSViol-"

$ sudo lspci -s 1D:00.0 -vvvv | grep -i acs
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
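
Note the UESta/UEMsk/UESvrt lines above only cover ACS violation reporting; to see whether ACS itself is enabled on a switch port, the ACSCtl capability is more direct (the BDF here is hypothetical; all '-' flags mean ACS is disabled):

$ sudo lspci -s 3b:00.0 -vvv | grep -i acsctl
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-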

[7] check5 : The NVMe drive is formatted as ext4 or xfs, and LVM is not used
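
One way to verify both halves of check5, assuming the drive is /dev/nvme0n1: FSTYPE should read ext4 or xfs, and a TYPE of "lvm" anywhere in the tree means LVM is in use.

$ lsblk -o NAME,FSTYPE,TYPE /dev/nvme0n1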

Pedrexus commented 1 year ago

Hello all,

Thank you for your comments and especially for the checklist. Here are my current results:

[1] OS Version

➜  ~ lsb_release -a              
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:    22.04
Codename:   jammy

➜  ~ uname -r
5.19.0-38-generic

I couldn't find my OFED version, but I'm sure it's installed. If there is a command for it, please tell me.
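
If MLNX_OFED is installed, its version can usually be printed with ofed_info (example output shown; yours will differ):

➜  ~ ofed_info -s
MLNX_OFED_LINUX-5.8-2.0.3.0: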

[2] CUDA & GDS packages (deb files):

➜  ~ dpkg -l | grep cuda-tools
ii  cuda-tools-12-1                            12.1.1-1                                 amd64        CUDA Tools meta-package
➜  ~ dpkg -l | grep gds
ii  gds-tools-12-1                             1.6.1.9-1                                amd64        Tools for GPU Direct Storage
ii  nvidia-gds                                 12.1.1-1                                 amd64        GPU Direct Storage meta-package
ii  nvidia-gds-12-1                            12.1.1-1                                 amd64        GPU Direct Storage 12.1 meta-package
➜  ~ dpkg -l | grep nvidia-fs
ii  nvidia-fs                                  2.15.3-1                                 amd64        NVIDIA filesystem for GPUDirect Storage
ii  nvidia-fs-dkms                             2.15.3-1                                 amd64        NVIDIA filesystem DKMS package

[3] check1 : Loaded Kernel modules

➜  ~ lsmod | grep nvidia_fs
nvidia_fs             262144  0
➜  ~ lsmod | grep nvme_core
nvme_core             147456  7 nvme,nvme_fabrics
mlx_compat             20480  14 rdma_cm,ib_ipoib,mlxdevm,nvme,iw_cm,nvme_core,nvme_fabrics,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core

[4] check2 : IOMMU is disabled

➜  ~ sudo dmesg | grep -i iommu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.19.0-38-generic root=UUID=a8c08d82-da23-4ec4-be78-3fa59ddedb73 ro intel_iommu=off quiet splash split_lock_detect=off vt.handoff=7
[    0.082366] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.19.0-38-generic root=UUID=a8c08d82-da23-4ec4-be78-3fa59ddedb73 ro intel_iommu=off quiet splash split_lock_detect=off vt.handoff=7
[    0.082389] DMAR: IOMMU disabled
[    0.175791] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 0
[    0.429342] iommu: Default domain type: Translated 
[    0.429342] iommu: DMA domain TLB invalidation policy: lazy mode

[5] check3 : PCIe Topology => GPU and NVMe devices on the same PLX switch

lspci -tv | grep Samsung -A 3 | grep -v Intel
           +-1b.4-[04]----00.0  Samsung Electronics Co Ltd Device a80c
           +-1c.0-[05]--
           +-1d.0-[07]--

It seems my NVMe is not on the best slot for GPU communication. Is this a requirement? I will try to fix it soon.

[6] check4 : ACS is disabled => you should see "ACSViol-"

OK, so this command has an empty result. I'm not sure what to do in this case.

[7] check5 : The NVMe drive is formatted as ext4 or xfs, and LVM is not used

➜  ~ lsblk -f | grep nvme
nvme0n1                                                                              
├─nvme0n1p1 vfat     FAT32       BA30-5B86                              86.5M     7% /boot/efi
├─nvme0n1p2 ext4     1.0         a8c08d82-da23-4ec4-be78-3fa59ddedb73   57.2G    32% /var/snap/firefox/common/host-hunspell
├─nvme0n1p3 swap     1           8d915ce4-8a76-414d-8416-15a8c82cf34f                [SWAP]
└─nvme0n1p4 ext4     1.0         645675cd-945d-4bd8-a1a1-8e7568294cdf  911.7G    41% /home

My intention is to use nvme0n1p4. I hope this configuration is not the problem.

Thank you very much for the help.

wakaba-best commented 1 year ago

Hi, @Pedrexus

[1] Can I see your environment?

$ gdscheck -p

[2] How to disable ACS (Access Control Services): check the BIOS settings. However, each manufacturer has its own way of setting this up.

Ex: Supermicro
https://www.supermicro.com/support/faqs/faq.cfm?faq=22226
https://www.supermicro.com/support/faqs/faq.cfm?faq=31883
https://www.supermicro.com/support/faqs/faq.cfm?faq=20732

[3] Check using the gdsio_verify and gdsio commands:
https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#gds-data-verif-tests
https://docs.nvidia.com/gpudirect-storage/configuration-guide/index.html#gdsio

(1) Mount NVMe

~$ sudo mkdir -p /mnt/gds/ext4
~$ sudo mount -o data=ordered /dev/nvme0n1 /mnt/gds/ext4
~$ mount | grep ext4
/dev/sda2 on / type ext4 (rw,relatime)
/dev/nvme0n1 on /mnt/gds/ext4 type ext4 (rw,relatime,data=ordered)

(2) Create test file for gdsio_verify

~$ sudo dd if=/dev/random of=/mnt/gds/ext4/test-fs-1G bs=1024k count=1000

(3) Check gdsio_verify on GPUID0

~$ sudo /usr/local/cuda/gds/tools/gdsio_verify -d 0 -f /mnt/gds/ext4/test-fs-1G -n 1 -s 1G
gpu index :0,file :/mnt/gds/ext4/test-fs-1G, gpu buffer alignment :0, gpu buffer offset :0, gpu devptr offset :0, file offset :0, io_requested :1073741824, io_chunk_size :1073741824, bufregister :true, sync :1, nr ios :1,
fsync :0,
Data Verification Success

(4) Create directory for gdsio

~$ sudo mkdir /mnt/gds/ext4/gds_dir

(5) Write test on CPU only mode

~$ sudo /usr/local/cuda/gds/tools/gdsio -x 1 -D /mnt/gds/ext4/gds_dir -w 8 -s 1G -i 1M -I 1
IoType: WRITE XferType: CPUONLY Threads: 8 DataSetSize: 8381440/8388608(KiB) IOSize: 1024(KiB) Throughput: 2.460552 GiB/sec, Avg_Latency: 3174.274946 usecs ops: 8185 total_time 3.248525 secs

-x (xfer_type) option:
 0 - Storage->GPU (GDS)
 1 - Storage->CPU
 2 - Storage->CPU->GPU
 3 - Storage->CPU->GPU_ASYNC
 4 - Storage->PAGE_CACHE->CPU->GPU
 5 - Storage->GPU_ASYNC

(6) Write test on GDS of GPUID 0

~$ sudo /usr/local/cuda/gds/tools/gdsio -x 0 -d 0 -D /mnt/gds/ext4/gds_dir -w 8 -s 1G -i 1M -I 1
IoType: WRITE XferType: GPUD Threads: 8 DataSetSize: 8381440/8388608(KiB) IOSize: 1024(KiB) Throughput: 2.441271 GiB/sec, Avg_Latency: 3199.571372 usecs ops: 8185 total_time 3.274181 secs

➜ If GDS is not working, this command displays an error instead.

(7) Write test on CPU -> GPUID 0

~$ sudo /usr/local/cuda/gds/tools/gdsio -x 2 -d 0 -D /mnt/gds/ext4/gds_dir -w 8 -s 1G -i 1M -I 1
IoType: WRITE XferType: CPU_GPU Threads: 8 DataSetSize: 8381440/8388608(KiB) IOSize: 1024(KiB) Throughput: 2.474772 GiB/sec, Avg_Latency: 3156.123118 usecs ops: 8185 total_time 3.229859 secs

(Ex) Read test on GDS of GPUID 0

$ sudo /usr/local/cuda/gds/tools/gdsio -x 0 -d 0 -D /mnt/gds/ext4/gds_dir -w 8 -s 1G -i 1M -I 0
IoType: READ XferType: GPUD Threads: 8 DataSetSize: 7141376/8388608(KiB) IOSize: 1024(KiB) Throughput: 3.163243 GiB/sec, Avg_Latency: 2517.087344 usecs ops: 6974 total_time 2.153027 secs

Note:
 read test (-I 0) with verify option (-V) should be used with files written (-I 1) with the -V option
 read test (-I 2) with verify option (-V) should be used with files written (-I 3) with the -V option, using the same random seed (-k), number of threads (-w), offset (-o), and data size (-s)
 write test (-I 1/3) with verify option (-V) will perform writes followed by reads
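
Putting that note into practice, a sketch of a verified sequential write-then-read pair on GPU 0, keeping -w, -s, and -i identical on both sides:

~$ sudo /usr/local/cuda/gds/tools/gdsio -x 0 -d 0 -D /mnt/gds/ext4/gds_dir -w 8 -s 1G -i 1M -I 1 -V
~$ sudo /usr/local/cuda/gds/tools/gdsio -x 0 -d 0 -D /mnt/gds/ext4/gds_dir -w 8 -s 1G -i 1M -I 0 -V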

karanveersingh5623 commented 8 months ago

@wakaba-best I am trying to run the command below, and it is failing. I want to reach around 6~13 GB/s read for NVMe devices on a Lustre filesystem. Please let me know where to make the changes.

Before Changes

[root@node002 ~]# /usr/local/cuda-11.7/gds/tools/gdscheck.py -p
 GDS release version: 1.3.1.18
 nvidia_fs version:  2.17 libcufile version: 2.12
 Platform: x86_64
 ============
 ENVIRONMENT:
 ============
 =====================
 DRIVER CONFIGURATION:
 =====================
 NVMe               : Unsupported
 NVMeOF             : Unsupported
 SCSI               : Unsupported
 ScaleFlux CSD      : Unsupported
 NVMesh             : Unsupported
 DDN EXAScaler      : Supported
 IBM Spectrum Scale : Unsupported
 NFS                : Unsupported
 BeeGFS             : Unsupported
 WekaFS             : Unsupported
 Userspace RDMA     : Unsupported
 --Mellanox PeerDirect : Disabled
 --rdma library        : Not Loaded (libcufile_rdma.so)
 --rdma devices        : Configured
 --rdma_device_status  : Up: 0 Down: 1
 =====================
 CUFILE CONFIGURATION:
 =====================
 properties.use_compat_mode : true
 properties.force_compat_mode : false
 properties.gds_rdma_write_support : true
 properties.use_poll_mode : false
 properties.poll_mode_max_size_kb : 4
 properties.max_batch_io_size : 128
 properties.max_batch_io_timeout_msecs : 5
 properties.max_direct_io_size_kb : 16384
 properties.max_device_cache_size_kb : 512000
 properties.max_device_pinned_mem_size_kb : 33554432
 properties.posix_pool_slab_size_kb : 4 1024 16384
 properties.posix_pool_slab_count : 128 64 32
 properties.rdma_peer_affinity_policy : RoundRobin
 properties.rdma_dynamic_routing : 1
 properties.rdma_dynamic_routing_order : GPU_MEM_NVLINKS GPU_MEM SYS_MEM P2P
 fs.generic.posix_unaligned_writes : false
 fs.lustre.posix_gds_min_kb: 0
 fs.lustre.rdma_dev_addr_list : 192.168.61.92
 fs.beegfs.posix_gds_min_kb: 0
 fs.weka.rdma_write_support: false
 profile.nvtx : false
 profile.cufile_stats : 3
 miscellaneous.api_check_aggressive : false
 =========
 GPU INFO:
 =========
 GPU index 0 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
 GPU index 1 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
 GPU index 2 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
 GPU index 3 NVIDIA A100 80GB PCIe bar:1 bar size (MiB):131072 supports GDS, IOMMU State: Disabled
 ==============
 PLATFORM INFO:
 ==============
 IOMMU: disabled
 Platform verification succeeded

After Changes to cufile.json

"profile": {
                            // nvtx profiling on/off
                            "nvtx": false,
                            // cufile stats level(0-3)
                            "cufile_stats": 3,
                            **"io_batchsize": 512**
            },

            "properties": {
                            // max IO chunk size (parameter should be multiples of 64K) used by cuFileRead/Write internally per IO request
                            **"max_direct_io_size_kb" : 524288,**
                            // device memory size (parameter should be 4K aligned) for reserving bounce buffers for the entire GPU
                            **"max_device_cache_size_kb" : 512000,**
                            // limit on maximum device memory size (parameter should be 4K aligned) that can be pinned for a given process
                            "max_device_pinned_mem_size_kb" : 33554432,
                            // true or false (true will enable asynchronous io submission to nvidia-fs driver)
                            // Note : currently the overall IO will still be synchronous
                            "use_poll_mode" : false,
                            // maximum IO request size (parameter should be 4K aligned) within or equal to which library will use polling for IO completion
                            "poll_mode_max_size_kb": 4,
                            // allow compat mode, this will enable use of cuFile posix read/writes
                            "allow_compat_mode": true,
                            // enable GDS write support for RDMA based storage
                            "gds_rdma_write_support": true,
                            // GDS batch size
                            **"io_batchsize": 512,**
                            // enable io priority w.r.t compute streams
                            // valid options are "default", "low", "med", "high"
                            "io_priority": "default",
[root@node002 ~]# /usr/local/cuda-11.7/gds/tools/gdscheck.py -p
 invalid directIO size (KB) specified: 512 min: 1 max: 256
 error reading config properties.io_batchsize
 failed to load config: /etc/cufile.json Invalid argument
 cuFile configuration load error
 cuFile initialization failed
 Platform verification error :
Invalid argument
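
It is not obvious from the message which key tripped the 512-vs-256 check, but properties.io_batchsize clearly fails to load (and it is also duplicated under "profile", where it does not belong). A minimal sketch, assuming io_batchsize must stay at or below the cap; 128 is the default max_batch_io_size the gdscheck output above reports:

"properties": {
                // hypothetical retreat to the library default shown by gdscheck
                "io_batchsize": 128
},
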
karanveersingh5623 commented 8 months ago

@wakaba-best I am trying to run the command below and am getting some errors related to cuFile buffer registration:

[root@node002 ~]# for i in 0 1 2 3; do if [ $i -eq 0 ] || [ $i -eq 1 ]; then /usr/local/cuda-11.7/gds/tools/gdsio -D /mnt/lustre/gds -d $i -n 0 -w 128 -s 1G -i 500M -x 0 -I 3 & else /usr/local/cuda-11.7/gds/tools/gdsio -D /mnt/lustre/gds -d $i -n 1 -w 128 -s 1G -i 500M -x 0 -I 3 & fi; done
[1] 1442728
[2] 1442729
[3] 1442730
[4] 1442731
[root@node002 ~]#
[root@node002 ~]#
[root@node002 ~]#
[root@node002 ~]# cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
cuFile buffer register failed :internal error
karanveersingh5623 commented 8 months ago

How do I change the directIO size (KB) to 512M?
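
For reference, the setting in question is properties.max_direct_io_size_kb in /etc/cufile.json, and it is specified in KiB: 512M is 524288, which is exactly what the config above set before the loader rejected it. A minimal sketch, assuming the 256 cap in the error message is in MiB, so 262144 KiB would be the largest value that loads:

"properties": {
                // hypothetical ceiling: 256 MiB expressed in KiB; 524288 (512 MiB) was rejected above
                "max_direct_io_size_kb": 262144
}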

karanveersingh5623 commented 8 months ago

@wakaba-best, please check the cufile.log lines below for the failing command above with a 500M IO size.

11-10-2023 15:49:50:24 [pid=1502621 tid=1502621] NOTICE  cufio-rdma:175 nvidia_peermem.ko is not loaded. Disabling UserSpace RDMA access.
 11-10-2023 15:49:50:31 [pid=1502619 tid=1502619] NOTICE  cufio-rdma:175 nvidia_peermem.ko is not loaded. Disabling UserSpace RDMA access.
 11-10-2023 15:49:50:34 [pid=1502621 tid=1502621] ERROR  cufio-dr:229 No matching pair for network device to closest GPU found in the platform
 11-10-2023 15:49:50:36 [pid=1502618 tid=1502618] NOTICE  cufio-rdma:175 nvidia_peermem.ko is not loaded. Disabling UserSpace RDMA access.
 11-10-2023 15:49:50:38 [pid=1502620 tid=1502620] NOTICE  cufio-rdma:175 nvidia_peermem.ko is not loaded. Disabling UserSpace RDMA access.
 11-10-2023 15:49:50:40 [pid=1502619 tid=1502619] ERROR  cufio-dr:229 No matching pair for network device to closest GPU found in the platform
 11-10-2023 15:49:50:44 [pid=1502618 tid=1502618] ERROR  cufio-dr:229 No matching pair for network device to closest GPU found in the platform
 11-10-2023 15:49:50:46 [pid=1502620 tid=1502620] ERROR  cufio-dr:229 No matching pair for network device to closest GPU found in the platform
 11-10-2023 15:49:51:316 [pid=1502618 tid=1503329] ERROR  0:1072 Inc-bar-usage failed: size 524288000 remaining bytes 281018368
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503329] ERROR  0:410 update bar usage failed
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503329] ERROR  cufio-obj:101 error allocating nvfs handle, size: 524288000
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503329] ERROR  cufio:1185 cuFileBufRegister error, object allocation failed
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503329] ERROR  cufio:1236 cuFileBufRegister error internal error
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503320] ERROR  0:1072 Inc-bar-usage failed: size 524288000 remaining bytes 281018368
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503320] ERROR  0:410 update bar usage failed
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503320] ERROR  cufio-obj:101 error allocating nvfs handle, size: 524288000
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503320] ERROR  cufio:1185 cuFileBufRegister error, object allocation failed
 11-10-2023 15:49:51:317 [pid=1502618 tid=1503320] ERROR  cufio:1236 cuFileBufRegister error internal error
 11-10-2023 15:49:51:318 [pid=1502618 tid=1502854] ERROR  0:1072 Inc-bar-usage failed: size 524288000 remaining bytes 281018368
 11-10-2023 15:49:51:318 [pid=1502618 tid=1502854] ERROR  0:410 update bar usage failed
 11-10-2023 15:49:51:318 [pid=1502618 tid=1502854] ERROR  cufio-obj:101 error allocating nvfs handle, size: 524288000
 11-10-2023 15:49:51:318 [pid=1502618 tid=1502854] ERROR  cufio:1185 cuFileBufRegister error, object allocation failed
 11-10-2023 15:49:51:318 [pid=1502618 tid=1502854] ERROR  cufio:1236 cuFileBufRegister error internal error
 11-10-2023 15:49:51:323 [pid=1502618 tid=1502856] ERROR  0:1072 Inc-bar-usage failed: size 524288000 remaining bytes 281018368
 11-10-2023 15:49:51:323 [pid=1502618 tid=1502856] ERROR  0:410 update bar usage failed
 11-10-2023 15:49:51:323 [pid=1502618 tid=1502856] ERROR  cufio-obj:101 error allocating nvfs handle, size: 524288000
 11-10-2023 15:49:51:323 [pid=1502618 tid=1502856] ERROR  cufio:1185 cuFileBufRegister error, object allocation failed
 11-10-2023 15:49:51:323 [pid=1502618 tid=1502856] ERROR  cufio:1236 cuFileBufRegister error internal error
 11-10-2023 15:49:51:325 [pid=1502618 tid=1503338] ERROR  0:1072 Inc-bar-usage failed: size 524288000 remaining bytes 281018368
 11-10-2023 15:49:51:325 [pid=1502618 tid=1503338] ERROR  0:410 update bar usage failed
 11-10-2023 15:49:51:325 [pid=1502618 tid=1503338] ERROR  cufio-obj:101 error allocating nvfs handle, size: 524288000
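
Reading the log: the Inc-bar-usage lines show each cuFileBufRegister call trying to pin a 500 MiB buffer (524288000 bytes) when only about 268 MiB (281018368 bytes) of the GPU's BAR1 budget remains, which matches the buffer-register failures above; a smaller -i size or fewer -w threads would shrink the footprint. Current BAR1 usage can be inspected with:

$ nvidia-smi -q | grep -A 3 "BAR1 Memory Usage"
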
karanveersingh5623 commented 8 months ago

@wakaba-best, anything on the above that you can share?

Murphy-AI commented 3 months ago

(quotes @wakaba-best's environment report above in full)

Recently I have been struggling with getting NVMe supported. Could you please show me your installation process? Thanks.

Sabiha1225 commented 3 months ago

@Murphy-AI Can you tell me the steps for how you are installing GDS? I am following the documentation, but it is not working.