Mellanox / nv_peer_memory


nvidia_peer_memory-1.0-8 modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument #63

Closed yug0slav closed 4 years ago

yug0slav commented 4 years ago

CentOS Linux release 7.7.1908 (Core)

uname -r

3.10.0-1062.9.1.el7.x86_64

lspci |grep mellanox -i

5e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
5e:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

ofed_info -s

MLNX_OFED_LINUX-4.7-1.0.0.1:

nvidia-smi

Sat Dec 7 00:07:38 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
...

./build_module.sh

Building source rpm for nvidia_peer_memory...

Built: /tmp/nvidia_peer_memory-1.0-8.src.rpm

To install run on RPM based OS:

rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-8.src.rpm

# rpm -ivh <path to generated binary rpm file>

[root@bmlp-c08006:/tmp/nv_peer_memory]# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-8.src.rpm
Installing /tmp/nvidia_peer_memory-1.0-8.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.SBGi1I

yum install /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm

Loaded plugins: enabled_repos_upload, fastestmirror, langpacks, nvidia, package_upload, product-id, search-disabled-repos, subscription-manager
Examining /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm: nvidia_peer_memory-1.0-8.x86_64
Marking /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-8.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package nvidia_peer_memory.x86_64 0:1.0-8 will be installed
--> Finished Dependency Resolution
...
Dependencies Resolved
Package Arch Version Repository Size
Installing:
nvidia_peer_memory x86_64 1.0-8 /nvidia_peer_memory-1.0-8.x86_64 291 k

Transaction Summary

Install 1 Package

Total size: 291 k
Installed size: 291 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : nvidia_peer_memory-1.0-8.x86_64 1/1
modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument

/etc/init.d/nv_peer_mem restart

stopping... OK
starting... modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
Failed to load nv_peer_mem

dmesg

...
[ 2072.534744] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[ 2072.534750] nv_peer_mem: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[ 2072.534767] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
[ 2072.534768] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)
[ 2072.534779] nv_peer_mem: disagrees about version of symbol nvidia_p2p_put_pages
[ 2072.534780] nv_peer_mem: Unknown symbol nvidia_p2p_put_pages (err -22)
[ 2072.534801] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_map_pages
[ 2072.534802] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[ 2072.534810] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[ 2072.534811] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)
[ 2072.534819] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_page_table
[ 2072.534820] nv_peer_mem: Unknown symbol nvidia_p2p_free_page_table (err -22)
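For anyone triaging the same failure: "disagrees about version of symbol" generally means the module was built against symbol CRCs that differ from those exported by the running kernel or by the loaded module that provides the symbol (here the nvidia_p2p_* exports of the NVIDIA driver). Err -22 is EINVAL, which is why modprobe reports "Invalid argument". The following is only a rough sketch for pulling the affected symbols out of a dmesg capture. It assumes the log format shown above, and the function name find_symbol_version_mismatches is made up for illustration, not part of any tool.

```python
import re

# Matches kernel log lines of the form seen in this issue, e.g.
# "[ 2072.534767] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages"
MISMATCH_RE = re.compile(
    r"\]\s*(?P<module>[\w-]+): disagrees about version of symbol (?P<symbol>\w+)"
)


def find_symbol_version_mismatches(log_text):
    """Hypothetical helper: map each module to the symbols it disagrees about.

    Symbols are kept in order of first appearance, without duplicates.
    """
    mismatches = {}
    for match in MISMATCH_RE.finditer(log_text):
        symbols = mismatches.setdefault(match.group("module"), [])
        if match.group("symbol") not in symbols:
            symbols.append(match.group("symbol"))
    return mismatches


# Two lines copied from the dmesg output in this issue.
SAMPLE = (
    "[ 2072.534767] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages\n"
    "[ 2072.534768] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)\n"
)

for module, symbols in find_symbol_version_mismatches(SAMPLE).items():
    print(module + " has stale symbol versions for: " + ", ".join(symbols))
```

If every listed symbol comes from the same provider, that suggests the module needs to be rebuilt against the symbol versions of the driver that is actually installed.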

yug0slav commented 4 years ago

#60 might be a fix. Testing...

TimoDritschler commented 4 years ago

Can confirm that #60 fixed this problem for me. CentOS 8 / RHEL8 (4.18.0-80.11.2.el8_0.x86_64), Nvidia Driver 440.33.01

ferasd commented 4 years ago

Fixed by #60; closing.