NVIDIA / gds-nvidia-fs

NVIDIA GPUDirect Storage Driver

Version checks to determine whether O_DIRECT is required, or not, on the current system. #38

Open kaigai opened 7 months ago

kaigai commented 7 months ago

The O_DIRECT Requirements Guide says:

O_DIRECT is the only supported mode before CUDA toolkit 12.2 (GDS version 1.7). CUDA 12.2 (GDS version 1.7) introduces support for non O_DIRECT file descriptors as well. The rest of this guide is still relevant for applications depending upon GDS benefits by expressing intent to use O_DIRECT file mode.

This is a very helpful improvement, and I would like to drop the O_DIRECT flag from my application when it runs on a system with GDS version 1.7 or newer installed.

On the other hand, our environment, which is based on CUDA 12.2, reports CUfileDrvProps.nvfs.major_version = 2 and CUfileDrvProps.nvfs.minor_version = 17. This looks like a different versioning series. (It is probably the nvidia-fs kernel module version, because /proc/driver/nvidia-fs/version shows the same version number.)

Where can I get the version number that determines whether O_DIRECT is necessary?

Best regards,

$ ./ssd2gpu_test /opt/400GB
[sync] GPU0: NVIDIA A100-PCIE-40GB (0000:41:00.0)
file: /opt/400GB, size: 409.06GB, buffer: 32MB x 6
CUfileDrvProps {
  nvfs.major_version = 2
  nvfs.minor_version = 17
  nvfs.poll_thresh_size = 64
  nvfs.max_direct_io_size = 16384
  nvfs.dstatusflags = 18
  nvfs.dcontrolflags = 3
  fflags = 1
  max_device_cache_size = 131072
  per_buffer_cache_size = 0
  max_device_pinned_mem_size = 33554432
  max_batch_io_size = 128
  max_batch_io_timeout_msecs = 5
}
read: 409.06GB, nr_submit: 13091, time: 23.40sec, throughput: 17.49GB/s
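
Since the nvfs version fields above (2.17) do not map onto the "GDS version 1.7" named in the guide, one possible workaround is to gate the flag on the CUDA version instead. The sketch below assumes the installed cuFile library matches the CUDA toolkit version, which the guide's wording "CUDA 12.2 (GDS version 1.7)" suggests but this issue does not confirm. The helper names are hypothetical; the integer encoding (1000 * major + 10 * minor, so 12.2 is 12020) is the one used by CUDART_VERSION and cudaRuntimeGetVersion().

```c
/* CUDA encodes versions as 1000 * major + 10 * minor; 12.2 becomes 12020. */
#define CUDA_VERSION_NON_O_DIRECT 12020

/* Hypothetical helper: major part of an encoded CUDA version. */
int cuda_version_major(int cuda_version)
{
    return cuda_version / 1000;
}

/* Hypothetical helper: minor part of an encoded CUDA version. */
int cuda_version_minor(int cuda_version)
{
    return (cuda_version % 1000) / 10;
}

/*
 * Hypothetical helper: returns 1 if O_DIRECT should be kept, 0 if it may
 * be dropped. ASSUMPTION: the cuFile library in use was shipped with the
 * CUDA toolkit whose encoded version is passed in; per the quoted guide,
 * non O_DIRECT file descriptors are supported from CUDA 12.2 onward.
 */
int o_direct_required(int cuda_version)
{
    return cuda_version < CUDA_VERSION_NON_O_DIRECT;
}
```

An application could pass the value obtained from cudaRuntimeGetVersion() and add O_DIRECT to its open(2) flags only when o_direct_required() returns nonzero. This does not answer whether the nvidia-fs kernel module version also matters, so it is a stopgap rather than an authoritative check.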