Mellanox / nv_peer_memory


modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument #116

Open nnurlan008 opened 11 months ago

nnurlan008 commented 11 months ago

Hello,

I am trying to install the nv_peer_memory module on my machine, which has the following specifications:

OS: Ubuntu 22.04.1
GPU: Nvidia Tesla K40c
Nvidia Driver Version: 470.199.02
MLNX Driver Version: MLNX_OFED_LINUX-23.07-0.5.1.2
RNIC: Mellanox ConnectX-4

I get the following error when I run sudo dpkg -i nvidia-peer-memory-dkms_1.2-0_all.deb:

depmod...
modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
dpkg: error processing package nvidia-peer-memory-dkms (--install):
 installed nvidia-peer-memory-dkms package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 nvidia-peer-memory-dkms

Output of ls -l /lib/modules:

total 12
drwxr-xr-x 2 root root 4096 Nov 1 21:30 5.17.0-1035-oem
drwxr-xr-x 5 root root 4096 Nov 1 21:30 6.2.0-26-generic
drwxr-xr-x 6 root root 4096 Nov 3 21:48 6.2.0-36-generic

Output of ls -l /usr/src/ofa_kernel/:

total 4
lrwxrwxrwx 1 root root 16 Nov 3 21:48 default -> 6.2.0-36-generic
drwxr-xr-x 3 root root 4096 Nov 3 17:38 x86_64

Can you please help me solve this issue?

Thanks and regards

nelsonsilva94 commented 6 months ago

Hi,

Does anyone have any suggestions for this? I am facing the same problem.

nnurlan008 commented 6 months ago

I solved this issue by installing Ubuntu 20.04 and NVIDIA driver 470.

javo9205 commented 3 months ago

Hi, I am running into the same issue. My machine has the following specifications:

Property Value
OS Ubuntu 22.04.2
Kernel 6.5.0-41-generic
GPU NVIDIA GeForce GTX 1660
Driver NVIDIA UNIX Open Kernel Module for x86_64 555.42.02
MLNX MLNX_OFED_LINUX-23.10-2.1.3.1

I get the same modprobe error. dmesg spits out:

[ 4973.941875] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[ 4973.941879] nv_peer_mem: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[ 4973.941895] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
[ 4973.941896] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)
[ 4973.941905] nv_peer_mem: disagrees about version of symbol nvidia_p2p_put_pages
[ 4973.941907] nv_peer_mem: Unknown symbol nvidia_p2p_put_pages (err -22)
[ 4973.941930] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_map_pages
[ 4973.941931] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[ 4973.941940] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[ 4973.941941] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)
[ 4973.941949] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_page_table
[ 4973.941950] nv_peer_mem: Unknown symbol nvidia_p2p_free_page_table (err -22)

Any assistance would be appreciated!
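For readers triaging a log like the one above: the kernel prints "disagrees about version of symbol" when the version checksum recorded in the module being loaded does not match the one exported by the running kernel or by another loaded module, and err -22 is EINVAL, which modprobe reports as "Invalid argument". Here the nvidia_p2p_* symbols are exported by the NVIDIA driver, so the likely reading is that nv_peer_mem was built against a different driver build than the one currently loaded. The following is a minimal sketch, not part of nv_peer_memory, that collects the affected symbols from dmesg text; the function name `mismatched_symbols` is hypothetical.

```python
import re

# Matches kernel log lines such as:
#   [ 4973.941895] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
_MISMATCH = re.compile(
    r"(?P<module>[\w-]+): disagrees about version of symbol (?P<symbol>\w+)"
)


def mismatched_symbols(dmesg_text):
    """Return {module: [symbols]} for every version-mismatch line, in log order.

    Hypothetical helper for illustration; it only parses text and does not
    query the kernel itself.
    """
    result = {}
    for line in dmesg_text.splitlines():
        match = _MISMATCH.search(line)
        if match is None:
            continue  # e.g. the paired "Unknown symbol ... (err -22)" lines
        symbols = result.setdefault(match.group("module"), [])
        if match.group("symbol") not in symbols:
            symbols.append(match.group("symbol"))
    return result
```

Feeding it the captured output of dmesg (for example, text saved to a file and read with `open(path).read()`) gives one list per module, which makes it easier to see that every failing symbol belongs to the same nvidia_p2p_* family.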

nnurlan008 commented 3 months ago

Hi,

There is a module called nvidia-peermem, which provides the same functionality as nv_peer_mem and ships with the proprietary NVIDIA drivers from version 470 onward. Use sudo modprobe nvidia-peermem to load the module manually.
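After running the modprobe command above, it can help to confirm which of the two modules actually ended up loaded. The sketch below is an illustration only: it parses the text of /proc/modules, where module names appear with underscores (modprobe treats "-" and "_" in module names as equivalent). The helper names `is_loaded` and `loaded_peer_modules` are hypothetical.

```python
def is_loaded(name, proc_modules_text):
    """Return True if the module appears in the given /proc/modules text."""
    # /proc/modules lists names with underscores, so normalize hyphens first.
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False


def loaded_peer_modules(proc_modules_text,
                        candidates=("nvidia-peermem", "nv_peer_mem")):
    """Return the candidate peer-memory modules that are currently loaded."""
    return [name for name in candidates if is_loaded(name, proc_modules_text)]
```

On a live system the input would come from `open("/proc/modules").read()`. An empty result means neither module is loaded, in which case dmesg is the next place to look.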

But if you specifically want to use nv_peer_mem, I think you will need to downgrade to NVIDIA driver 470 and Ubuntu 20.04, which worked in my case.

Hope this is helpful.