drossetti closed this 6 years ago
The reason 1.1 is shown in the log above is that we are still using the fork at drossetti/nv_peer_memory.
@alaahl, could you have a look? I think we need to update:
- the Depends: line in the debian control file
- dkms.conf's BUILD_DEPENDS line.

Hi @haggaie, you are right, they should be added to the Depends tag.
Regarding the kernel upgrade: I faced the same issue in the past. DKMS seems to build everything in parallel. Back then I disabled AUTOINSTALL for the packages that needed ofa-kernel, and used POST_INSTALL in the ofa-kernel dkms.conf to run a script that builds and installs the dependent modules against the new kernel and ofa-kernel.
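A minimal sketch of what such a POST_INSTALL helper could look like. The script name, the DRY_RUN switch, and the module name and version passed at the bottom are illustrative assumptions, not the script that was actually used; passing $kernelver as an argument from dkms.conf is also an assumption, so the sketch falls back to the running kernel.

```shell
#!/bin/sh
# Hypothetical helper for the ofa-kernel dkms.conf, e.g. (assumed line):
#   POST_INSTALL="rebuild_dependents.sh $kernelver"
# Dependent modules would leave AUTOINSTALL unset in their own dkms.conf.

# Print (DRY_RUN=1, the default in this sketch) or run the dkms commands
# that build and install one dependent module for the given kernel.
rebuild_dependent() {
    module=$1
    version=$2
    kver=$3
    for action in build install; do
        cmd="dkms $action -m $module -v $version -k $kver"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

# Module name and version are taken from the log in this thread.
rebuild_dependent nvidia-peer-memory 1.1 "${1:-$(uname -r)}"
```

With DRY_RUN left at its default the script only prints the two dkms commands, which makes it easy to check the arguments before letting it run for real with DRY_RUN=0.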
I never used BUILD_DEPENDS; we can try it. However, I see that it's not supported on Ubuntu 14.04 (and probably on older versions too), so I'm not sure that it is the way to go.
@alaahl, is there any harm in adding BUILD_DEPENDS just for 16.04? Will it break 14.04 if it is there?
@haggaie, AFAIK the DKMS tools read only the variables they support from dkms.conf, so BUILD_DEPENDS will be silently ignored when it's not supported by the installed DKMS version (e.g. on 14.04).
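For reference, a sketch of how the dkms.conf fragment discussed here might look. The package name and version come from the make.log path in this thread, and mlnx-ofed-kernel is assumed to be the DKMS name of the OFED kernel package because that is the directory name under /var/lib/dkms; the real file contains more directives than shown.

```shell
# Fragment of a dkms.conf for nvidia-peer-memory (dkms.conf is sourced as shell).
PACKAGE_NAME="nvidia-peer-memory"
PACKAGE_VERSION="1.1"

# Ask DKMS to build mlnx-ofed-kernel before this module.
# Assumption: DKMS versions without BUILD_DEPENDS support simply ignore
# this line, as described above for Ubuntu 14.04.
BUILD_DEPENDS="mlnx-ofed-kernel"
```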
any progress on this?
I'll handle this.
Why is there a dependence on cuda? The Nvidia DGX-1 series installs nvidia-peer-memory, but does not have cuda installed, so I fear this will break our build... I think the dependency is on nvidia (the driver) not cuda.
I thought that the cuda (CUDA meta-package) has to be installed always. I will revert https://github.com/Mellanox/nv_peer_memory/commit/2e28f47364d3850e4c59f3f1001f61d2b2d9f79a
@ferasd please review https://github.com/Mellanox/nv_peer_memory/pull/30
Hello,
I've got the same (or a similar-looking) problem on CentOS 7.4:
tar xf nvidia-peer-memory_1.0.5.tar.gz
cd nvidia-peer-memory-1.0
./build_module.sh
rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-5.src.rpm
got
// ...
rpmbuild --rebuild /tmp/nvidia_peer_memory-1.0-5.src.rpm
+ cd nvidia_peer_memory-1.0
+ export KVER=3.10.0-693.el7.x86_64
+ KVER=3.10.0-693.el7.x86_64
+ make KVER=3.10.0-693.el7.x86_64 all
/root/rpmbuild/BUILD/nvidia_peer_memory-1.0/create_nv.symvers.sh 3.10.0-693.el7.x86_64
Getting symbol versions from /lib/modules/3.10.0-693.el7.x86_64/extra/nvidia.ko ...
Created: nv.symvers
Found /usr/src/nvidia-387.26/nvidia/nv-p2p.h
/bin/cp -f /usr/src/nvidia-387.26/nvidia/nv-p2p.h /root/rpmbuild/BUILD/nvidia_peer_memory-1.0/nv-p2p.h
cp -rf /Module.symvers .
cp: cannot stat '/Module.symvers': No such file or directory
make: *** [all] Error 1
Adding BUILD_DEPENDS="ofa_kernel nvidia" to the dkms.conf doesn't help.
Also, I can't find ofa_kernel (lsmod | grep ofa_kernel).
I installed CentOS with @infiniband in the anaconda kickstart file, so I suppose I don't have all the required packages.
Hi @Suhoy95, you should install MLNX_OFED; this module does not support the inbox drivers. From https://github.com/Mellanox/nv_peer_memory/blob/master/README.md:
Pre-requisites:
- NVIDIA compatible driver is installed and up.
- MLNX_OFED 2.1 is installed and up.
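A quick way to check both prerequisites before building might look like the sketch below. It assumes that MLNX_OFED ships the ofed_info tool and that its short version string starts with MLNX_OFED, and that a loaded NVIDIA driver exposes /proc/driver/nvidia/version; the function name is made up for illustration.

```shell
#!/bin/sh
# Heuristic prerequisite check (sketch, not part of nv_peer_memory).
check_prereqs() {
    status=0

    # MLNX_OFED: look for ofed_info and inspect its short version string.
    if command -v ofed_info >/dev/null 2>&1; then
        ver=$(ofed_info -s 2>/dev/null)
        case "$ver" in
            MLNX_OFED*) echo "MLNX_OFED: found ($ver)" ;;
            *)          echo "MLNX_OFED: other OFED stack ($ver)"; status=1 ;;
        esac
    else
        echo "MLNX_OFED: not found (no ofed_info)"
        status=1
    fi

    # NVIDIA driver: this proc file is only present while the driver is loaded.
    if [ -r /proc/driver/nvidia/version ]; then
        echo "NVIDIA driver: loaded"
    else
        echo "NVIDIA driver: not loaded"
        status=1
    fi

    return "$status"
}

check_prereqs || echo "Install the missing prerequisites before building nv_peer_mem."
```

On a system installed with only the inbox @infiniband group, the first check would be expected to fail, which matches the situation described above.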
any solution? thanks
The problem we are seeing on Ubuntu is that after a kernel + MLNX OFED upgrade, DKMS could try to build nv_peer_mem before ofa_kernel, so the /var/lib/dkms/mlnx-ofed-kernel/3.4/build/Module.symvers file is not present yet:
$ cat /var/lib/dkms/nvidia-peer-memory/1.1/build/make.log
DKMS make.log for nvidia-peer-memory-1.1 for kernel 4.2.0-27-generic (x86_64)
Tue Nov 22 10:47:32 PST 2016
cp -rf /Module.symvers .
cp: cannot stat ‘/Module.symvers’: No such file or directory
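The path /Module.symvers in the log suggests that the directory prefix expanded to an empty string, presumably because the OFED kernel build tree was not there yet. A guard along these lines would turn that into a clearer error; this is a sketch under that assumption, not the project's Makefile logic, and copy_symvers and OFA_DIR are hypothetical names.

```shell
#!/bin/sh
# Copy Module.symvers from the OFED kernel build tree into the current
# directory, failing with a readable message instead of "cp /Module.symvers".
copy_symvers() {
    src_dir=$1
    if [ -z "$src_dir" ]; then
        echo "error: OFED kernel build directory is empty or unset" >&2
        return 1
    fi
    if [ ! -f "$src_dir/Module.symvers" ]; then
        echo "error: $src_dir/Module.symvers not found (ofa_kernel not built yet?)" >&2
        return 1
    fi
    cp -f "$src_dir/Module.symvers" .
}

# OFA_DIR is a placeholder for wherever the OFED kernel tree was built.
copy_symvers "${OFA_DIR:-}" || echo "Module.symvers unavailable, not building yet."
```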