Mellanox / nv_peer_memory


Errors when install nvidia_peer_memory-1.0-8.x86_64.rpm #58

Closed lytofd closed 4 years ago

lytofd commented 4 years ago

Hi, I hit an error when installing nv_peer_mem (nvidia_peer_memory-1.0-8.x86_64.rpm) on a node:

[centos@gpu x86_64]$ sudo rpm -ivh nvidia_peer_memory-1.0-8.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:nvidia_peer_memory-1.0-8 ################################# [100%]
depmod: ERROR: fstatat(4, nvidia-uvm.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia.ko.xz): No such file or directory
depmod: ERROR: fstatat(4, nvidia-modeset.ko.xz): No such file or directory

This is the newest version of nv_peer_mem, and my kernel version is 3.10.0-957.27.2.el7.x86_64. I then tried an older version of nv_peer_mem on another node and it succeeded; that node's kernel version is 3.10.0-957.12.2.el7.x86_64. Both nodes have CUDA 10.1 installed.
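For anyone hitting the same depmod messages: one possible cause (an assumption, not something confirmed in this thread) is a dangling symlink to an NVIDIA module somewhere under the kernel's module tree, for example in a weak-updates directory. A minimal sketch to look for such links follows; the function name `find_dangling_nvidia_links` is made up for illustration, and the default directory assumes the usual /lib/modules layout.

```shell
#!/bin/sh
# Hypothetical helper: list broken symlinks named nvidia*.ko* under a module tree.
# Defaults to the running kernel's module directory (assumed layout).
find_dangling_nvidia_links() {
    dir="${1:-/lib/modules/$(uname -r)}"
    # With -L, find follows symlinks, so "-type l" only matches links
    # whose target cannot be resolved (that is, dangling links).
    find -L "$dir" -type l -name 'nvidia*.ko*' 2>/dev/null | sort
}
```

Calling `find_dangling_nvidia_links` with no argument checks the running kernel; passing a path such as /lib/modules/3.10.0-957.27.2.el7.x86_64 checks a specific one. Empty output means no dangling NVIDIA links were found, in which case the cause lies elsewhere.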

LudovicEnault commented 4 years ago

Hi, I guess you tried to recompile "version 1.0.8" from git sources.

I have an issue as well on 3.10.0-957.el7.x86_64 with CUDA 10.1 / driver 418.67.

I "reverted" the changes in https://github.com/Mellanox/nv_peer_memory/commit/25774c3f8e1b9306672de48042d9d132d19383d9

# grep modules_pat= create_nv.symvers.sh
modules_pat="__crc_nvidia_p2p_|T nvidia_p2p_"
modules_pat="__crc_nvidia_p2p_"
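To make the difference between the two `modules_pat` values concrete, here is a small sketch that applies each one as an extended regular expression to a symbol listing. The listing is hypothetical sample data (the addresses and the `nvidia_internal_helper` name are invented), and how create_nv.symvers.sh actually consumes the pattern is an assumption here.

```shell
#!/bin/sh
# Hypothetical nm-style symbol listing; addresses are made up for illustration.
sample_symbols='0000000012345678 A __crc_nvidia_p2p_get_pages
0000000000001000 T nvidia_p2p_get_pages
0000000000002000 T nvidia_p2p_put_pages
0000000000003000 t nvidia_internal_helper'

# The two pattern values shown in the grep output above.
broad_pat="__crc_nvidia_p2p_|T nvidia_p2p_"
narrow_pat="__crc_nvidia_p2p_"

# Count how many sample lines a given pattern selects.
count_matches() {
    printf '%s\n' "$sample_symbols" | grep -cE "$1"
}

echo "broad:  $(count_matches "$broad_pat")"
echo "narrow: $(count_matches "$narrow_pat")"
```

On this sample the broader pattern also picks up the exported text (`T`) symbols, while the narrower one only matches the `__crc_` entries, so the choice of pattern changes which symbols the script would see.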

ferasd commented 4 years ago

Pull request #60 should fix the issue. Please give it a try.

ferasd commented 4 years ago

Fixed by #60; closing.