Mellanox / nv_peer_memory


create_nv.symvers.sh failed because kernel module name ends with .ko.xz instead of .ko #40

Closed davidjkcho closed 6 years ago

davidjkcho commented 6 years ago

"nm -o $nvidia_mod" in create_nv.symvers.sh expects a .ko file, but kernel module names on CentOS 7 end with .ko.xz, so it fails to get the symbol names.

Below is the change I made to work around it.

--- create_nv.symvers.sh.new 2018-05-09 10:38:40.033345119 -0700
+++ create_nv.symvers.sh.old 2018-05-09 10:38:08.114218425 -0700
@@ -77,9 +77,6 @@
 if [ ! -e "$nvidia_mod" ]; then
 continue
 fi
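The hunk above is cut off, so the exact change is not visible here. Going by the rest of the thread (decompress a copy of the module in a local folder, then run nm on that copy), the idea might look roughly like the sketch below. The helper names `decompressed_name` and `module_symbols` are hypothetical and are not taken from create_nv.symvers.sh; the sketch also assumes `xz`, `nm` and `mktemp` are installed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from create_nv.symvers.sh.

# Print the file name a module would have once decompressed
# (nvidia.ko.xz -> nvidia.ko, nvidia.ko -> nvidia.ko).
decompressed_name() {
    basename "${1%.xz}"
}

# Print the module's symbols via "nm -o" without modifying the installed file.
module_symbols() {
    mod=$1
    case "$mod" in
        *.ko.xz)
            tmpdir=$(mktemp -d) || return 1
            tmpmod="$tmpdir/$(decompressed_name "$mod")"
            # Decompress to a private copy; the original stays compressed.
            xz -dc "$mod" > "$tmpmod" && nm -o "$tmpmod"
            rc=$?
            rm -rf "$tmpdir"
            return "$rc"
            ;;
        *)
            # Uncompressed module: nm can read it directly.
            nm -o "$mod"
            ;;
    esac
}
```

Usage would be along the lines of `module_symbols "$nvidia_mod"`. Note that `nm -o` prefixes each line with the file it read, so for a compressed module the prefix is the path of the temporary copy, not the path under /lib/modules.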

iAbadia commented 6 years ago

Love this. I personally unpacked the xz in place, but great job! Tested on CentOS 7.5 (3.10.0-862.3.2.el7.x86_64).

paklui commented 6 years ago

Also seen on CentOS 7.4 with the latest Mellanox nvidia-peer-memory_1.0-7.tar.gz. This seems to be the right workaround; otherwise the build runs into the error below:

# /home/pak/nvidia-peer-memory-1.0.7 
# make all
/home/pak/nvidia-peer-memory-1.0.7/create_nv.symvers.sh 3.10.0-693.21.1.el7.x86_64
nm: /lib/modules/3.10.0-693.21.1.el7.x86_64/extra/nvidia.ko.xz: File format not recognized
-W- Could not get list of nvidia symbols.
...

If it's the right fix, can someone please commit the workaround?

alaahl commented 6 years ago

No, this isn't the right workaround: it changes files on the system (decompressing the modules). We'll change the script to also accept .ko.xz modules.

alaahl commented 6 years ago

I missed the part where you copied the module to a local folder; this actually looks like a good approach.

alaahl commented 6 years ago

fixed by https://github.com/Mellanox/nv_peer_memory/pull/44

jack-inv commented 5 years ago

fixed by #44

Hi, a similar issue happens on Power9. Log:

[user@localhost nvidia-peer-memory-1.0]$ sudo yum localinstall ~/rpmbuild/RPMS/ppc64le/nvidia_peer_memory-1.0-7.ppc64le.rpm
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is registered with an entitlement server, but is not receiving updates. You can use subscription-manager to assign subscriptions.
Examining /home/user/rpmbuild/RPMS/ppc64le/nvidia_peer_memory-1.0-7.ppc64le.rpm: nvidia_peer_memory-1.0-7.ppc64le
Marking /home/user/rpmbuild/RPMS/ppc64le/nvidia_peer_memory-1.0-7.ppc64le.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package nvidia_peer_memory.ppc64le 0:1.0-7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================
 Package               Arch      Version   Repository                          Size
Installing:
 nvidia_peer_memory    ppc64le   1.0-7     /nvidia_peer_memory-1.0-7.ppc64le   310 k

Transaction Summary
Install 1 Package

Total size: 310 k
Installed size: 310 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : nvidia_peer_memory-1.0-7.ppc64le 1/1
modprobe: ERROR: could not insert 'nv_peer_mem': Unknown symbol in module, or unknown parameter (see dmesg)
  Verifying : nvidia_peer_memory-1.0-7.ppc64le 1/1

Installed:
  nvidia_peer_memory.ppc64le 0:1.0-7

Complete!
[user@localhost nvidia-peer-memory-1.0]$ ls
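
The modprobe error above points at dmesg for the names of the unresolved symbols. A minimal sketch for pulling them out of the kernel log follows. It assumes the relevant log lines contain the text "Unknown symbol" followed by the symbol name; `unknown_symbols` is a hypothetical helper, not part of nv_peer_memory.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of nv_peer_memory.
# Reads kernel log lines on stdin and prints each distinct symbol name
# that follows the text "Unknown symbol" (assumed log format).
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Usage (reading the kernel log may require root):
#   dmesg | grep nv_peer_mem | unknown_symbols
```

If the listed names are symbols exported by the nvidia module, that would suggest the symbol list was not picked up at build time, which is the same failure mode discussed earlier in this thread.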