Mellanox / nv_peer_memory


Weak modules / DKMS support on RHEL #109

Open bodgerer opened 1 year ago


Hi,

I was hoping someone could help me. I'm having difficulty generating the RHEL 8.6 RPM for nv_peer_memory 1.3 in a way that survives kernel upgrades. There are two main ways to handle this, and neither of them seems to work:

1) DKMS

There is no spec file that can create a DKMS-enabled RPM for RHEL, so it's unclear how the DKMS support in the nv_peer_memory code base is actually meant to be used. Could such a spec file be added, please?

2) Weak modules

The accepted method of building the software is:

git clone https://github.com/Mellanox/nv_peer_memory.git
cd nv_peer_memory
./build_module.sh
rpmbuild --rebuild /tmp/nvidia_peer_memory-1.3-0.src.rpm

This produces an RPM containing a kernel module built against the running kernel. Ideally, RHEL's weak-modules mechanism would then link the module into each newly installed kernel, provided that kernel is compatible.

However, this does not work if the kernel that nv_peer_memory was built against is not itself installed. An automated install typically installs only the latest kernel, so weak-modules never picks the module up.

For example, I have an nv_peer_memory RPM built against 4.18.0-372.19.1.el8_6.ppc64le. A fresh install of a node installs a later kernel, 4.18.0-372.26.1.el8_6.ppc64le, together with this RPM, but the nv_peer_memory kernel module does not appear under /lib/modules/4.18.0-372.26.1.el8_6.ppc64le/weak-updates/.
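To see which installed kernels are affected, a small diagnostic can walk /lib/modules and report every kernel directory that has no nv_peer_mem.ko under either extra/ (where the RPM installs it) or weak-updates/ (where weak-modules links it). This is a minimal sketch, not part of nv_peer_memory; the function name `kernels_missing_module` is hypothetical, and it assumes the module file name and directory layout described above.

```python
from pathlib import Path

# Module file name taken from the issue; directory layout assumed to be
# /lib/modules/<kernel-version>/{extra,weak-updates}/...
MODULE_NAME = "nv_peer_mem.ko"


def kernels_missing_module(modules_root="/lib/modules", module_name=MODULE_NAME):
    """Hypothetical helper: return kernel versions under modules_root that
    have module_name in neither extra/ nor weak-updates/ (searched recursively)."""
    missing = []
    for kdir in sorted(p for p in Path(modules_root).iterdir() if p.is_dir()):
        found = False
        for sub in ("extra", "weak-updates"):
            d = kdir / sub
            # rglob matches by name, so symlinks created by weak-modules count too.
            if d.is_dir() and any(d.rglob(module_name)):
                found = True
                break
        if not found:
            missing.append(kdir.name)
    return missing


if __name__ == "__main__":
    for kver in kernels_missing_module():
        print(f"{kver}: {MODULE_NAME} not found in extra/ or weak-updates/")
```

On the node described above, this would be expected to list 4.18.0-372.26.1.el8_6.ppc64le.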

I can prod weak-modules into adding it to the running kernel as a one-off by running:

rpm -q -l nvidia_peer_memory | egrep '^/lib/modules/.*/extra/nv_peer_mem.ko$' | weak-modules --add-modules
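The one-off workaround is a three-stage pipeline: list the package's files, keep only the nv_peer_mem.ko path(s) under extra/, and feed those paths to weak-modules on stdin. A minimal Python sketch of the same steps follows; `module_paths` and `add_weak_modules` are hypothetical names, the regex escapes the dot in `.ko` (slightly stricter than the egrep pattern), and `add_weak_modules` assumes a RHEL host with `rpm` and `weak-modules` on the PATH and root privileges.

```python
import re
import subprocess

# Same pattern as the egrep stage, with the dot before "ko" escaped.
MODULE_PATH_RE = re.compile(r"^/lib/modules/.*/extra/nv_peer_mem\.ko$")


def module_paths(file_list):
    """Hypothetical helper: pick the nv_peer_mem.ko paths out of an `rpm -q -l` listing."""
    return [line for line in file_list.splitlines() if MODULE_PATH_RE.match(line)]


def add_weak_modules(package="nvidia_peer_memory"):
    """Hypothetical helper: rpm -q -l | filter | weak-modules --add-modules (needs root)."""
    listing = subprocess.run(
        ["rpm", "-q", "-l", package], capture_output=True, text=True, check=True
    ).stdout
    paths = module_paths(listing)
    if paths:
        # As in the shell pipeline, weak-modules reads module paths from stdin.
        subprocess.run(
            ["weak-modules", "--add-modules"],
            input="\n".join(paths) + "\n",
            text=True,
            check=True,
        )
    return paths
```

Keeping the filtering in a pure function means it can be checked against a saved `rpm -q -l` listing without touching the system.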

Could the spec file be changed so that all of this is handled automatically, as other kernel module RPMs do, please?