Closed NHZlX closed 3 years ago
Running the following command prints nothing:
sudo cat /proc/kallsyms | grep ib_register_peer_memory_client
I fixed it by reinstalling the Mellanox driver.
@NHZlX Hi, I ran into the same problem as you, but when I executed the command:
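For anyone repeating this check, here is a minimal sketch of the same test as a reusable function. It assumes the input is in `/proc/kallsyms` format (address, type, name, optional module); `has_symbol` is a hypothetical helper name, not part of any driver package. It only tells you whether the symbol is present, not whether its version matches what `nvidia_peermem` expects.

```shell
# Hypothetical helper: succeeds (exit 0) if kallsyms-format text on stdin
# contains a text-section entry (type T or t) named exactly like the symbol.
# Entries such as "<symbol>.cold" or "__ksymtab_<symbol>" do not count.
has_symbol() {
    awk -v sym="$1" '
        $3 == sym && ($2 == "T" || $2 == "t") { found = 1 }
        END { exit !found }
    '
}
```

On a live system it could be used as `sudo cat /proc/kallsyms | has_symbol ib_register_peer_memory_client && echo present || echo missing`.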
sudo cat /proc/kallsyms | grep ib_register_peer_memory_client
I got the following output:
ffffffffa0de25ac r __kstrtab_ib_register_peer_memory_client [ib_core]
ffffffffa0de25cb r __kstrtabns_ib_register_peer_memory_client [ib_core]
ffffffffa0ddc54c r __ksymtab_ib_register_peer_memory_client [ib_core]
ffffffffa0ddb668 t ib_register_peer_memory_client.cold [ib_core]
ffffffffa0dd8c60 T ib_register_peer_memory_client [ib_core]
So I don't think I need to reinstall the Mellanox driver. Could you give me some help? More information:
5.12.0-xrp-vhost-blk+
Wed May 17 06:33:53 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01 Driver Version: 515.86.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100 80G... Off | 00000000:31:00.0 Off | 0 |
| N/A 44C P0 68W / 300W | 1459MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A800 80G... Off | 00000000:4B:00.0 Off | 0 |
| N/A 49C P0 75W / 300W | 971MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3747496 C python 1457MiB |
| 1 N/A N/A 3747496 C python 969MiB |
+-----------------------------------------------------------------------------+
OFED-internal-5.8-1.1.2:
total 44
drwxr-xr-x 4 root root 4096 May 5 03:44 5.12.0-xrp+
drwxr-xr-x 4 root root 4096 May 5 03:44 5.12.0-xrp-vhost-blk+
drwxr-xr-x 3 root root 4096 Feb 10 06:44 5.4.0-132-generic
drwxr-xr-x 2 root root 4096 Jan 7 06:17 5.4.0-135-generic
drwxr-xr-x 2 root root 4096 Jan 13 06:14 5.4.0-136-generic
drwxr-xr-x 5 root root 4096 Jan 13 06:13 5.4.0-137-generic
drwxr-xr-x 5 root root 4096 Feb 10 06:43 5.4.0-139-generic
drwxr-xr-x 3 root root 4096 May 5 03:43 5.4.0-146-generic
drwxr-xr-x 2 root root 4096 May 5 02:24 5.4.0-148-generic
drwxr-xr-x 3 root root 4096 May 5 03:43 5.4.0-rc8+
drwxr-xr-x 4 root root 4096 May 5 02:27 6.1.0-KVM_EXIT_EFAULT_from_lwn_for_hyperdisk_dev-ga530af7b1987
total 4
lrwxrwxrwx 1 root root 36 Mar 22 10:39 default -> /etc/alternatives/ofa_kernel_headers
drwxr-xr-x 8 root root 4096 May 5 03:39 x86_64
[610939.210200] nvidia_peermem: disagrees about version of symbol ib_register_peer_memory_client
[610939.210206] nvidia_peermem: Unknown symbol ib_register_peer_memory_client (err -22)
[610985.271522] nvidia_peermem: disagrees about version of symbol ib_register_peer_memory_client
[610985.271530] nvidia_peermem: Unknown symbol ib_register_peer_memory_client (err -22)
[612950.903683] nvidia_peermem: disagrees about version of symbol ib_register_peer_memory_client
[612950.903694] nvidia_peermem: Unknown symbol ib_register_peer_memory_client (err -22)
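A note on reading this log: "disagrees about version of symbol" is what the kernel prints when symbol versioning is enabled and the checksum recorded in the module for that symbol differs from the one the running kernel exports. So the symbol showing up in /proc/kallsyms is not enough for the module to load. My guess, and it is only a guess for this setup, is that nvidia_peermem was built against a different ib_core than the one currently loaded. The sketch below pulls the affected module/symbol pairs out of the log; `version_mismatches` is a hypothetical helper name, and it assumes the usual dmesg line layout shown above.

```shell
# Hypothetical helper: reads dmesg-format text on stdin and prints each
# unique "module symbol" pair that failed the symbol version check.
version_mismatches() {
    awk '{
        for (i = 2; i <= NF; i++) {
            # Match the phrase "<module>: disagrees about version of symbol <name>"
            if ($i == "disagrees" && $(i + 1) == "about") {
                mod = $(i - 1)
                sub(/:$/, "", mod)   # strip the trailing colon from the module name
                print mod, $(i + 5)  # field i+5 is the symbol name
            }
        }
    }' | sort -u
}
```

On a live system it could be used as `sudo dmesg | version_mismatches`, which gives a short list of what to rebuild or reinstall instead of scrolling through repeated errors.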
I have the same problem as you. Did you find a way to fix it?
I have the same problem with OFED driver version 5.8.0.
Hi, I ran into the same problem as https://github.com/Mellanox/nv_peer_memory/issues/28
More Information:
Help welcome!