Closed: yug0slav closed this issue 2 years ago.
Same as #95. Please note the new requirement:
Please note that to build correctly, an MLNX_OFED release carrying the Peer-direct fix for the bug "Peer-direct patch may cause deadlock due to lock inversion" (tracked as Internal Ref. #2696789) is required, for example MLNX_OFED 5.3-1.0.0.1.43.
I'm not following. Was the bug fixed in 5.3-1.0.0.1.43? I am on 5.4-1.0.3.0 and trying to build and install nvidia_peer_memory-1.2.
Resolved in MLNX_OFED_LINUX-5.4-3.0.3.0.
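The two releases named so far suggest a per-branch check that could run before `rpmbuild`: 5.3-1.0.0.1.43 on the 5.3 branch and 5.4-3.0.3.0 on the 5.4 branch. Below is a minimal Python sketch under stated assumptions, not an official tool. It assumes the version string looks like the ones quoted in this thread (an optional `MLNX_OFED_LINUX-` prefix and an optional trailing colon, which is what `ofed_info -s` is assumed to print). It also assumes later builds on the same branch keep the fix, and it reports branches not named here as unknown. The helper names `parse` and `has_peer_direct_fix` are hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

# Per-branch minimum versions carrying the Peer-direct deadlock fix, taken only
# from this thread (assumption: later builds on the same branch keep the fix).
MIN_FIXED = {
    "5.3": "5.3-1.0.0.1.43",
    "5.4": "5.4-3.0.3.0",
}

# Assumed format: optional "MLNX_OFED_LINUX-" prefix, "<major>.<minor>-<dotted build>",
# optional trailing colon.
_VERSION_RE = re.compile(r"(?:MLNX_OFED_LINUX-)?(\d+\.\d+)-(\d+(?:\.\d+)*):?")


def parse(version: str) -> tuple[str, tuple[int, ...]]:
    """Split 'MLNX_OFED_LINUX-5.4-1.0.3.0' into ('5.4', (1, 0, 3, 0))."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized MLNX_OFED version: {version!r}")
    branch, build = match.groups()
    return branch, tuple(int(part) for part in build.split("."))


def has_peer_direct_fix(version: str) -> Optional[bool]:
    """True or False for branches named in the thread, None for any other branch."""
    branch, build = parse(version)
    minimum = MIN_FIXED.get(branch)
    if minimum is None:
        return None  # not covered by the thread; do not guess
    _, min_build = parse(minimum)
    # Tuple comparison orders builds numerically, field by field.
    return build >= min_build
```

Treating 5.4-3.0.3.0 as the 5.4 minimum is deliberately conservative. The thread only says the fix is in that release and missing from 5.4-1.0.3.0, so any 5.4 build in between is reported as unfixed even if it might carry the fix.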
So nv_peer_memory can't be used with ConnectX-3 cards (even though the hardware supports it)?
Note: MLNX_OFED 4.9-x LTS should be used by customers who want to use any of the following:
NVIDIA ConnectX-3 Pro
NVIDIA ConnectX-3
NVIDIA Connect-IB
RDMA experimental verbs library (mlnx_lib)
OSs based on a kernel version lower than 3.10
Note: None of the above is available in the MLNX_OFED 5.x branch.
Note: MLNX_OFED 5.4-x LTS should be used by customers who want to use NVIDIA ConnectX-4 and later adapter cards, stay on a stable 5.4-x deployment, and receive:
Critical bug fixes
Support for new major OSs
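The release-note excerpt above is effectively a small decision rule. A minimal Python sketch that encodes only what it states might look like the following. The function name, the string matching of adapter model names, and the `needs_mlnx_lib` flag are illustrative assumptions. The result names the LTS branch the notes point to, not every release that might work.

```python
from __future__ import annotations

import re
from typing import Optional

# Adapters listed in the quoted notes as available only on MLNX_OFED 4.9-x LTS.
LEGACY_ONLY_ADAPTERS = {"ConnectX-3 Pro", "ConnectX-3", "Connect-IB"}


def recommended_lts_branch(
    adapter: str,
    kernel: tuple[int, int],
    needs_mlnx_lib: bool = False,
) -> Optional[str]:
    """Return the LTS branch the quoted notes point to, or None if they are silent.

    adapter: model name, with or without the "NVIDIA " prefix, e.g. "ConnectX-6 Dx".
    kernel: (major, minor) of the running kernel, e.g. (3, 10).
    needs_mlnx_lib: whether the RDMA experimental verbs library is required.
    """
    if adapter.startswith("NVIDIA "):
        adapter = adapter[len("NVIDIA "):]
    # Any one of these puts the host on 4.9-x LTS; none is available on 5.x.
    if adapter in LEGACY_ONLY_ADAPTERS or needs_mlnx_lib or kernel < (3, 10):
        return "4.9-x LTS"
    # "ConnectX-4 onwards": take the generation number after "ConnectX-".
    match = re.match(r"ConnectX-(\d+)\b", adapter)
    if match and int(match.group(1)) >= 4:
        return "5.4-x LTS"
    return None  # e.g. adapter families the notes do not mention
```

For the ConnectX-3 question above, `recommended_lts_branch("ConnectX-3", (3, 10))` returns `"4.9-x LTS"`, because the notes list ConnectX-3 among the items not available on the 5.x branch.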
OS: CentOS Linux release 7.9.2009 (Core)
Kernel: 3.10.0-1160.42.2.el7.x86_64
NVIDIA driver: 470.57.02 (NVIDIA-SMI 470.57.02, CUDA Version: 11.4)
Mellanox driver: MLNX_OFED_LINUX-5.4-1.0.3.0
Steps:

    Building source rpm for nvidia_peer_memory...
    Built: /tmp/nvidia_peer_memory-1.2-0.src.rpm

    To install run on RPM based OS:
    rpmbuild --rebuild /tmp/nvidia_peer_memory-1.2-0.src.rpm

    rpmbuild --rebuild /tmp/nvidia_peer_memory-1.2-0.src.rpm
    Installing /tmp/nvidia_peer_memory-1.2-0.src.rpm
    Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.kaBNqq
    # get OFED symbols when building with MLNX_OFED
    /bin/cp -f /usr/src/ofa_kernel/default/Module.symvers my.symvers
    cat nv.symvers >> my.symvers
    make -C /lib/modules/3.10.0-1160.42.2.el7.x86_64/build M=/root/rpmbuild/BUILD/nvidia_peer_memory-1.2 KBUILD_EXTRA_SYMBOLS="/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/my.symvers" modules
    make[1]: Entering directory `/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64'
    INFO: Building with MLNX_OFED from: /usr/src/ofa_kernel/default
    awk: fatal: cannot open file `nvidia_peer_memory.spec' for reading (No such file or directory)
      CC [M]  /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.o
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:94:9: note: #pragma message: Enable nvidia_p2p_dma_map_pages support
    #pragma message("Enable nvidia_p2p_dma_map_pages support")
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: variable 'nv_mem_client_ex' has initializer but incomplete type
    static struct peer_memory_client_ex nv_mem_client_ex = { .client = {
                  ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: unknown field 'client' specified in initializer
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: extra brace group at end of initializer
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: (near initialization for 'nv_mem_client_ex')
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:476:1: warning: excess elements in struct initializer [enabled by default]
    }};
    ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:476:1: warning: (near initialization for 'nv_mem_client_ex') [enabled by default]
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c: In function 'nv_mem_client_init':
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:483:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
     strcpy(nv_mem_client_ex.client.name, DRV_NAME);
     ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:488:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
     strcpy(nv_mem_client_ex.client.version, DRV_VERSION);
     ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:492:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
     nv_mem_client_ex.client.version[IB_PEER_MEMORY_VER_MAX-1] = 1;
     ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:493:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
     nv_mem_client_ex.ex_size = sizeof(struct peer_memory_client_ex);
     ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:493:36: error: invalid application of 'sizeof' to incomplete type 'struct peer_memory_client_ex'
     nv_mem_client_ex.ex_size = sizeof(struct peer_memory_client_ex);
                                       ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:499:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
     nv_mem_client_ex.flags = PEER_MEM_INVALIDATE_UNMAPS;
     ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:499:27: error: 'PEER_MEM_INVALIDATE_UNMAPS' undeclared (first use in this function)
     nv_mem_client_ex.flags = PEER_MEM_INVALIDATE_UNMAPS;
                              ^
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:499:27: note: each undeclared identifier is reported only once for each function it appears in
    /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:501:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
     reg_handle = ib_register_peer_memory_client(&nv_mem_client_ex.client,
     ^
    make[2]: *** [/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.o] Error 1
    make[1]: *** [_module_/root/rpmbuild/BUILD/nvidia_peer_memory-1.2] Error 2
    make[1]: Leaving directory `/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64'
    make: *** [all] Error 2
    error: Bad exit status from /var/tmp/rpm-tmp.3dxLME (%build)

    RPM build errors:
        Bad exit status from /var/tmp/rpm-tmp.3dxLME (%build)
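The compiler errors in this log trace back to two names that the MLNX_OFED headers used for the build do not provide. These are `struct peer_memory_client_ex`, reported as an incomplete or undefined type, and `PEER_MEM_INVALIDATE_UNMAPS`, reported as undeclared. That fits the version requirement discussed earlier in the thread. One way to confirm this on the build host before rebuilding is a quick text search of the installed header. The sketch below is a heuristic under assumptions. The header path `include/rdma/peer_mem.h` under `/usr/src/ofa_kernel/default` is assumed, since only that prefix appears in the log. The helper names `missing_symbols` and `check_installed_headers` are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed header location: the log only confirms the /usr/src/ofa_kernel/default prefix.
DEFAULT_HEADER = Path("/usr/src/ofa_kernel/default/include/rdma/peer_mem.h")

# A definition (an opening brace after the name), not just a forward declaration.
_STRUCT_DEF = re.compile(r"\bstruct\s+peer_memory_client_ex\s*\{")
# The identifier gcc reported as undeclared.
_INVALIDATE_FLAG = re.compile(r"\bPEER_MEM_INVALIDATE_UNMAPS\b")


def missing_symbols(header_text: str) -> list[str]:
    """Return the names from the build errors that header_text does not provide."""
    missing = []
    if not _STRUCT_DEF.search(header_text):
        missing.append("struct peer_memory_client_ex")
    if not _INVALIDATE_FLAG.search(header_text):
        missing.append("PEER_MEM_INVALIDATE_UNMAPS")
    return missing


def check_installed_headers(path: Path = DEFAULT_HEADER) -> list[str]:
    """Read the header (raises FileNotFoundError if absent) and report gaps."""
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the check.
    return missing_symbols(path.read_text(errors="replace"))
```

On the build host, a non-empty result from `check_installed_headers()` would match the errors above. Because this is a plain text search, it can be fooled by names that only appear in comments or by definitions that live in a different header.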