Closed by MatthiasDING 6 years ago
Adding @alaahl. @MatthiasDING, can you please post the dmesg errors?
dmesg errors
nv_peer_mem: disagrees about version of symbol ib_register_peer_memory_client
nv_peer_mem: Unknown symbol ib_register_peer_memory_client (err -22)
@ferasd @alaahl
Hi @MatthiasDING, this looks like an issue with the MLNX_OFED Module.symvers file; probably the wrong file was used.
Which MLNX_OFED version is installed on the system? Also, please provide the output of these two commands: ls -l /lib/modules and ls -l /usr/src/ofa_kernel/
MLNX_OFED version: MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu16.04-x86_64
Input: ls -l /lib/modules
total 8
drwxr-xr-x 6 root root 4096 Jan 7 19:27 4.4.0-104-generic
drwxr-xr-x 6 root root 4096 Dec 27 06:55 4.4.0-21-generic
Input: ls -l /usr/src/ofa_kernel/
drwxr-xr-x 7 root root 4096 Dec 27 14:36 4.4.0-104-generic
drwxr-xr-x 7 root root 4096 Dec 27 06:53 4.4.0-21-generic
lrwxrwxrwx 1 root root 16 Dec 27 06:53 default -> 4.4.0-21-generic
@alaahl
Thanks @MatthiasDING
This confirms what I suspected: the build used the Module.symvers from /usr/src/ofa_kernel/default, which points to headers built for 4.4.0-21-generic, but you are compiling against 4.4.0-104-generic.
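To illustrate the diagnosis: "disagrees about version of symbol" generally means the CRC recorded for an exported symbol at build time differs from the CRC the running kernel modules export. A minimal sketch for comparing that CRC between two symvers files follows. The function names symvers_crc and compare_symvers are hypothetical helpers, and the sketch assumes the usual whitespace-separated layout where the CRC is the first column and the symbol name is the second.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of MLNX_OFED or nv_peer_mem.

# Print the CRC recorded for one exported symbol in a symvers-style file.
# Assumed line layout: <crc> <symbol> <module> <export type>
symvers_crc() {
    # $1 = symbol name, $2 = path to the symvers file
    awk -v sym="$1" '$2 == sym { print $1; exit }' "$2"
}

# Report whether two symvers files agree on the CRC of a symbol.
# Prints "same", "differ: <a> vs <b>", or "missing".
compare_symvers() {
    sym="$1"
    crc_a="$(symvers_crc "$sym" "$2")"
    crc_b="$(symvers_crc "$sym" "$3")"
    if [ -z "$crc_a" ] || [ -z "$crc_b" ]; then
        echo "missing"
        return 2
    fi
    if [ "$crc_a" = "$crc_b" ]; then
        echo "same"
        return 0
    fi
    echo "differ: $crc_a vs $crc_b"
    return 1
}
```

For this thread, one might call compare_symvers ib_register_peer_memory_client with the symvers file from each of the two directories under /usr/src/ofa_kernel; the exact location of the file inside those directories is an assumption and should be checked on the system.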
I will fix the Makefile to use the correct file. For now, you can work around it by pointing the "default" link at the newer kernel. Run as root:
cd /usr/src/ofa_kernel
ln -snf 4.4.0-104-generic default
Now try to install nvidia-peer-memory-dkms again. This time it should use the correct symvers file, and the module should load.
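Before reinstalling, it may help to confirm that the "default" link really matches the kernel being built against. The following is a rough sketch, not part of the MLNX_OFED tooling: check_ofa_default is a hypothetical helper, and it assumes the layout shown in the listing above, where "default" is a symlink to a directory named after the kernel release.

```shell
#!/bin/sh
# Sketch only: check_ofa_default is a hypothetical helper name.
# Checks whether <ofa_dir>/default points at the directory for a kernel release.
# Usage: check_ofa_default [ofa_dir] [kernel_release]
check_ofa_default() {
    ofa_dir="${1:-/usr/src/ofa_kernel}"   # assumed location, as in the thread
    kver="${2:-$(uname -r)}"              # defaults to the running kernel
    if [ ! -L "$ofa_dir/default" ]; then
        echo "no-default-link"
        return 2
    fi
    # Resolve one level of the link and keep only the directory name.
    target="$(basename "$(readlink "$ofa_dir/default")")"
    if [ "$target" = "$kver" ]; then
        echo "match"
        return 0
    fi
    echo "mismatch: default -> $target, expected $kver"
    return 1
}
```

Called with no arguments, it compares /usr/src/ofa_kernel/default against the output of uname -r. A "mismatch" result corresponds to the situation described in this thread, where the link still pointed at 4.4.0-21-generic.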
Solved! Thanks @alaahl.
My system settings:
System: Ubuntu 16.04
CUDA Version: 9.0
GPU Driver Version: 387.26
I'm trying to install this module for GPUDirect RDMA, but the error occurs when I run:
sudo dpkg -i nvidia-peer-memory-dkms_1.0-5_all.deb