Mellanox / nv_peer_memory

ERROR: "nvidia_p2p_*" undefined! #52

Closed pakmarkthub closed 4 years ago

pakmarkthub commented 4 years ago

On Debian, we got the error shown below when trying to install with sudo dpkg -i nvidia-peer-memory-dkms_1.0-8_all.deb:

DKMS make.log for nvidia-peer-memory-1.0 for kernel 5.2.0-050200-generic (x86_64)
Thu Oct  3 07:30:38 HKT 2019
/var/lib/dkms/nvidia-peer-memory/1.0/build/create_nv.symvers.sh 5.2.0-050200-generic
-W- Could not get list of nvidia symbols.
Found /usr/src/nvidia-<omitted>//nvidia/nv-p2p.h
/bin/cp -f /usr/src/nvidia-<omitted>//nvidia/nv-p2p.h /var/lib/dkms/nvidia-peer-memory/1.0/build/nv-p2p.h
cp -rf /usr/src/ofa_kernel/5.2.0-050200-generic/Module.symvers .
cat nv.symvers >> Module.symvers
make -C /lib/modules/5.2.0-050200-generic/build  M=/var/lib/dkms/nvidia-peer-memory/1.0/build modules
make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
make[1]: Entering directory '/usr/src/linux-headers-5.2.0-050200-generic'
  CC [M]  /var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.o
/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.c:80:9: note: #pragma message: Enable nvidia_p2p_dma_map_pages support
 #pragma message("Enable nvidia_p2p_dma_map_pages support")
         ^~~~~~~
  Building modules, stage 2.
  MODPOST 1 modules
ERROR: "nvidia_p2p_dma_map_pages" [/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.ko] undefined!
ERROR: "nvidia_p2p_dma_unmap_pages" [/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.ko] undefined!
ERROR: "nvidia_p2p_free_page_table" [/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.ko] undefined!
ERROR: "nvidia_p2p_free_dma_mapping" [/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.ko] undefined!
ERROR: "nvidia_p2p_get_pages" [/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.ko] undefined!
ERROR: "nvidia_p2p_put_pages" [/var/lib/dkms/nvidia-peer-memory/1.0/build/nv_peer_mem.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[2]: *** [__modpost] Error 1
Makefile:1604: recipe for target 'modules' failed
make[1]: *** [modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.2.0-050200-generic'
Makefile:56: recipe for target 'all' failed
make: *** [all] Error 2

In Linux 4.*, these are shown as WARNING, but they have been upgraded to ERROR in Linux 5.*.

Further investigation shows that ./create_nv.symvers.sh prints "-W- Could not get list of nvidia symbols." on Ubuntu. The script fails the check at line 90, if ! (nm -o $nvidia_mod | grep -q "__crc_nvidia_p2p_"); then, because nvidia.ko does not contain any __crc_nvidia_p2p_* symbols.
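
To see which case a given nvidia.ko falls into, the nm output can be classified along the lines of the sketch below. This is not part of create_nv.symvers.sh: the function name classify_p2p_symbols is made up for illustration, and it assumes that nm -o prints lines of the form "<file>:<value> <type> <symbol>" and that the p2p functions show up as T/t text symbols when no CRC entries exist.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): classify `nm -o` output
# read from stdin.
#   "crc"   -> __crc_nvidia_p2p_* entries exist (what the script greps for)
#   "plain" -> only nvidia_p2p_* text symbols exist (the failing case here)
#   "none"  -> no nvidia_p2p_* symbols at all
classify_p2p_symbols() {
    awk '
        /__crc_nvidia_p2p_/   { crc = 1 }
        / [Tt] nvidia_p2p_/   { plain = 1 }
        END {
            if (crc)        print "crc"
            else if (plain) print "plain"
            else            print "none"
        }
    '
}
```

Usage would be something like nm -o "$nvidia_mod" | classify_p2p_symbols, where $nvidia_mod is the path to nvidia.ko on your system. A result of "plain" would match the situation described above.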

I believe that this issue occurs when installing the NVIDIA driver with DKMS support. This issue is not observed on RHEL.

For the NVIDIA driver, I tried version 418.39 and newer. I believe that you can use any 418.* to reproduce this bug. The OS I used was Ubuntu 18.04 with Linux 4.15 (got warning) and Linux 5.2 (got error).

JohnSpillerNvidia commented 4 years ago

It appears there are two problems:

  1. The create_nv.symvers.sh script is looking for symbols named __crc_nvidia_p2p_*, which don't exist in the modules, so it finds no symbols.
  2. With newer kernels, undefined symbols are errors rather than warnings.

There may be a third, in that it is looking for the nvidia source in /usr/src but not in /var/lib/dkms, but that is another story...

tzafrir-mellanox commented 4 years ago

Updated the pull request. The script will now add both original (crc) and new symbols. It will not attempt a rebuild in the case of the new symbols (I just left that part as-is).

Do you want me to just fix the script itself instead of patch it in the deb packaging?
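
For readers following along, handling both kinds of symbols might look roughly like the sketch below. This is not the code from the pull request. The function name gen_p2p_symvers is hypothetical, and the sketch assumes the tab-separated Module.symvers layout (CRC, symbol, module, export type), that a zero CRC is acceptable when no __crc_ entry exists, that the symbols are exported with EXPORT_SYMBOL, and that a T/t entry in nm output is a good enough proxy for an exported function.

```shell
#!/bin/sh
# Hypothetical sketch: read `nm -o` output on stdin and print
# Module.symvers-style lines for nvidia_p2p_* symbols.
# $1 is the module name to record in the third column (an assumption).
gen_p2p_symvers() {
    module_name=$1
    awk -v mod="$module_name" '
        {
            # nm -o lines look like "<file>:<value> <type> <symbol>"
            n = split($1, parts, ":")
            value = parts[n]
            sym = $3
        }
        sym ~ /^__crc_nvidia_p2p_/ {
            name = substr(sym, 7)                  # strip the "__crc_" prefix
            crc[name] = "0x" substr(value, length(value) - 7)  # last 8 hex digits
            seen[name] = 1
            next
        }
        sym ~ /^nvidia_p2p_/ && $2 ~ /^[Tt]$/ {
            seen[sym] = 1                          # plain symbol, CRC may be absent
        }
        END {
            for (name in seen) {
                c = (name in crc) ? crc[name] : "0x00000000"
                printf "%s\t%s\t%s\tEXPORT_SYMBOL\n", c, name, mod
            }
        }
    ' | sort -k2
}
```

Something like nm -o "$nvidia_mod" | gen_p2p_symvers nvidia > nv.symvers would then produce entries whether or not the module carries CRCs. Whether a zero CRC is accepted depends on the kernel configuration, so treat this as an illustration of the idea only.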

jwspiller commented 4 years ago

It seems cleaner to me to fix the script, but if patching is the normal approach, I'm ok with it...

ferasd commented 4 years ago

Pull request #54, "Deb dkms missing symbols", was merged.