Mellanox / nv_peer_memory


Unknown symbol error in `dpkg -i /tmp/nvidia-peer-memory-dkms_1.0-8_all.deb` #57

Closed: czkkkkkk closed this issue 4 years ago.

czkkkkkk commented 4 years ago

Environment

System: Ubuntu 16.04
CUDA version: 10.0
Mellanox OFED: 4.6-1.0.1

$ uname  -r
4.4.0-131-generic
$ ls -l /lib/modules
total 16
drwxr-xr-x 7 root root 4096 Nov 26 20:02 4.4.0-131-generic
drwxr-xr-x 3 root root 4096 Jul 31 22:32 4.4.0-21-generic
drwxr-xr-x 3 root root 4096 Jul 31 22:32 4.4.0-64-generic
drwxr-xr-x 3 root root 4096 Jul 31 22:33 4.4.0-66-generic
$ ls -l /usr/src/ofa_kernel/
total 4
drwxr-xr-x 7 root root 4096 Aug  2 02:46 4.4.0-131-generic
lrwxrwxrwx 1 root root   17 Aug  2 02:46 default -> 4.4.0-131-generic
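The listings above can be checked mechanically before building. This is a minimal sketch that assumes the layout shown here: one directory per kernel release under /lib/modules and /usr/src/ofa_kernel, plus a `default` symlink. The helper name `check_kernel_dirs` is hypothetical. It only confirms that the running kernel has matching module and OFED source directories. It says nothing about NVIDIA driver symbols.

```python
import os
from pathlib import Path
from typing import List


def check_kernel_dirs(release: str,
                      modules_root: Path = Path("/lib/modules"),
                      ofa_root: Path = Path("/usr/src/ofa_kernel")) -> List[str]:
    """Return a list of problems; an empty list means the layout looks consistent."""
    problems = []
    # A directory for the running kernel must exist in both trees.
    if not (modules_root / release).is_dir():
        problems.append(f"missing {modules_root / release}")
    if not (ofa_root / release).is_dir():
        problems.append(f"missing {ofa_root / release}")
    # If OFED provides a 'default' symlink, it should point at the running kernel.
    default = ofa_root / "default"
    if default.is_symlink():
        target = os.readlink(default)
        if Path(target).name != release:
            problems.append(f"{default} -> {target}, expected {release}")
    return problems


if __name__ == "__main__":
    # os.uname().release is the same string that `uname -r` prints.
    print(check_kernel_dirs(os.uname().release) or "kernel directories look consistent")
```

With the directories listed in this issue, the check should report no problems for 4.4.0-131-generic.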

Description

Hi. I tried to install nv_peer_memory. I ran the following commands:

./build_module.sh
cd /tmp
tar xzf /tmp/nvidia-peer-memory_1.0.orig.tar.gz
cd nvidia-peer-memory-1.0
dpkg-buildpackage -us -uc
dpkg -i /tmp/nvidia-peer-memory_1.0-8_all.deb
dpkg -i /tmp/nvidia-peer-memory-dkms_1.0-8_all.deb
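The steps above run in a fixed order, and the sequence breaks at the last one. The following sketch mirrors them with fail-fast behaviour. It assumes the file names follow the `1.0` / `-8` pattern seen in this issue, and that `build_module.sh` is run from the nv_peer_memory checkout. The helpers `install_commands` and `run` are hypothetical. A real (non-dry) run needs root for `dpkg -i`.

```python
import shlex
import subprocess
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (working directory, argv)


def install_commands(version: str = "1.0", revision: str = "8") -> List[Step]:
    """Mirror the manual install steps; file names follow the pattern seen in the issue."""
    return [
        (".", ["./build_module.sh"]),  # run from the nv_peer_memory checkout
        ("/tmp", ["tar", "xzf", f"/tmp/nvidia-peer-memory_{version}.orig.tar.gz"]),
        (f"/tmp/nvidia-peer-memory-{version}", ["dpkg-buildpackage", "-us", "-uc"]),
        ("/tmp", ["dpkg", "-i", f"/tmp/nvidia-peer-memory_{version}-{revision}_all.deb"]),
        ("/tmp", ["dpkg", "-i", f"/tmp/nvidia-peer-memory-dkms_{version}-{revision}_all.deb"]),
    ]


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, execute it and stop at the first failure."""
    for cwd, argv in steps:
        print(f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})")
        if not dry_run:
            # check=True raises CalledProcessError, so a failing postinst halts the sequence.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run(install_commands())  # dry run: prints the commands only
```

Keeping `dry_run=True` as the default makes it possible to review the exact commands before anything touches the package database.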

It failed when I tried to install the DKMS .deb package. The full output is:

$ dpkg -i /tmp/nvidia-peer-memory-dkms_1.0-8_all.deb
(Reading database ... 133469 files and directories currently installed.)
Preparing to unpack .../nvidia-peer-memory-dkms_1.0-8_all.deb ...

------------------------------
Deleting module version: 1.0
completely from the DKMS tree.
------------------------------
Done.
Unpacking nvidia-peer-memory-dkms (1.0-8) over (1.0-8) ...
Setting up nvidia-peer-memory-dkms (1.0-8) ...
Loading new nvidia-peer-memory-1.0 DKMS files...
Building only for 4.4.0-131-generic
Building initial module for 4.4.0-131-generic
Secure Boot not enabled on this system.
Done.

nv_peer_mem:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.4.0-131-generic/updates/dkms/

depmod....

DKMS: install completed.
modprobe: ERROR: could not insert 'nv_peer_mem': Invalid argument
dpkg: error processing package nvidia-peer-memory-dkms (--install):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 nvidia-peer-memory-dkms
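The `modprobe` message and the dmesg lines in this issue appear to report the same failure. `Invalid argument` is the message for errno 22 (`EINVAL`), which the kernel logs as a negative value, `err -22`. The following small check demonstrates that mapping. The helper name `describe_kernel_error` is hypothetical.

```python
import errno
import os


def describe_kernel_error(err: int) -> str:
    """Map a kernel return code such as -22 to its errno name and message."""
    code = abs(err)  # the kernel logs negative errno values, e.g. "err -22"
    return f"{errno.errorcode[code]} ({os.strerror(code)})"


print(describe_kernel_error(-22))  # on Linux: EINVAL (Invalid argument)
```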

The dmesg errors are:

$ dmesg | grep nv_peer_mem
[1624474.366292] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[1624474.366314] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[1624474.366316] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)
[1624474.366338] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_page_table
[1624474.366340] nv_peer_mem: Unknown symbol nvidia_p2p_free_page_table (err -22)
[1633847.270244] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_unmap_pages
[1633847.270249] nv_peer_mem: Unknown symbol nvidia_p2p_dma_unmap_pages (err -22)
[1633847.270275] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
[1633847.270277] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)
[1633847.270296] nv_peer_mem: disagrees about version of symbol nvidia_p2p_put_pages
[1633847.270298] nv_peer_mem: Unknown symbol nvidia_p2p_put_pages (err -22)
[1633847.270347] nv_peer_mem: disagrees about version of symbol nvidia_p2p_dma_map_pages
[1633847.270349] nv_peer_mem: Unknown symbol nvidia_p2p_dma_map_pages (err -22)
[1633847.270367] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_dma_mapping
[1633847.270369] nv_peer_mem: Unknown symbol nvidia_p2p_free_dma_mapping (err -22)
[1633847.270386] nv_peer_mem: disagrees about version of symbol nvidia_p2p_free_page_table
[1633847.270388] nv_peer_mem: Unknown symbol nvidia_p2p_free_page_table (err -22)
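On kernels built with `CONFIG_MODVERSIONS`, "disagrees about version of symbol" means that the checksum (CRC) recorded for that symbol when `nv_peer_mem` was compiled does not match the one exported by the loaded `nvidia` module. The loader then rejects the symbol, which shows up as `Unknown symbol ... (err -22)`. When every `nvidia_p2p_*` symbol fails this way, it suggests that `nv_peer_mem` was built against a different NVIDIA driver build than the one running. The following sketch condenses such a log into the affected symbols. It assumes the message formats shown above. The function name `summarize_symbol_errors` is hypothetical.

```python
import re
from typing import List, Tuple

# Patterns for the two kinds of module-loader messages in the dmesg output above.
DISAGREE_RE = re.compile(r"(\w+): disagrees about version of symbol (\w+)")
UNKNOWN_RE = re.compile(r"(\w+): Unknown symbol (\w+) \(err (-?\d+)\)")


def summarize_symbol_errors(dmesg_text: str,
                            module: str = "nv_peer_mem") -> Tuple[List[str], List[str]]:
    """Return (symbols with a version/CRC mismatch, symbols reported unknown) for one module."""
    mismatched, unknown = set(), set()
    for line in dmesg_text.splitlines():
        m = DISAGREE_RE.search(line)
        if m and m.group(1) == module:
            mismatched.add(m.group(2))
            continue
        m = UNKNOWN_RE.search(line)
        if m and m.group(1) == module:
            unknown.add(m.group(2))
    return sorted(mismatched), sorted(unknown)


SAMPLE = """\
[1633847.270275] nv_peer_mem: disagrees about version of symbol nvidia_p2p_get_pages
[1633847.270277] nv_peer_mem: Unknown symbol nvidia_p2p_get_pages (err -22)
[1633847.270296] nv_peer_mem: disagrees about version of symbol nvidia_p2p_put_pages
[1633847.270298] nv_peer_mem: Unknown symbol nvidia_p2p_put_pages (err -22)
"""

print(summarize_symbol_errors(SAMPLE))
# (['nvidia_p2p_get_pages', 'nvidia_p2p_put_pages'], ['nvidia_p2p_get_pages', 'nvidia_p2p_put_pages'])
```

If both lists match, every unresolved symbol was rejected on a version check rather than being absent altogether. In that case, rebuilding against the headers and `Module.symvers` of the driver that is actually loaded is the usual direction to investigate.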

I checked similar issues, but they did not appear to be the source of the problem.

e-ago commented 4 years ago

Something similar was reported in https://github.com/Mellanox/nv_peer_memory/issues/55.

czkkkkkk commented 4 years ago

I solved this problem by changing to the release branch.