NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
MIT License

GDRCopy 2.4 on Centos7 failing build of RPM packages #287

Closed: gmarciani closed 12 months ago

gmarciani commented 12 months ago

Hello team,

The RPM package build for GDRCopy 2.4 fails on CentOS 7 when compiling testsuites/testsuite.cpp. The failure does not occur with GDRCopy 2.3.1. Details about our environment and the full error log are below.

How can we solve this issue? If it is due to a bug in GDRCopy 2.4, what is the estimated time for the next release with the fix? If none is planned yet, is there a workaround that would let us use GDRCopy 2.4 on CentOS 7?

Thanks

Environment

Full Log

[root@ip-3-5-1-183 packages]# CUDA=/usr/local/cuda ./build-rpm-packages.sh
Building rpm package ...
+ cd /root/gdrcopy-2.4/packages
Building gdrcopy rpm packages version 2.4 ...
Working in /tmp/gdr.3rLvJs ...
+ cd /root/gdrcopy-2.4/packages/..
+ mkdir -p /tmp/gdr.3rLvJs/gdrcopy
+ rm -rf /tmp/gdr.3rLvJs/gdrcopy/*
+ cp -r packages/dkms.conf packages/rhel/init.d packages/rhel/gdrcopy.service insmod.sh Makefile README.md include src tests config_arch LICENSE packages/gdrcopy.spec /tmp/gdr.3rLvJs/gdrcopy/
+ rm -f /tmp/gdr.3rLvJs/gdrcopy-2.4.tar.gz
+ cd /tmp/gdr.3rLvJs/gdrcopy
+ find . -type f -exec sed -i s/@FULL_VERSION@/2.4/g {} +
+ find . -type f -exec sed -i s/@VERSION@/2.4/g {} +
+ find . -type f -exec sed -i s/@MODULE_LOCATION@/\/kernel\/drivers\/misc\//g {} +
+ cd /tmp/gdr.3rLvJs
+ mv gdrcopy gdrcopy-2.4
+ tar czvf gdrcopy-2.4.tar.gz gdrcopy-2.4
gdrcopy-2.4/
gdrcopy-2.4/dkms.conf
gdrcopy-2.4/init.d/
gdrcopy-2.4/init.d/gdrcopy
gdrcopy-2.4/gdrcopy.service
gdrcopy-2.4/insmod.sh
gdrcopy-2.4/Makefile
gdrcopy-2.4/README.md
gdrcopy-2.4/include/
gdrcopy-2.4/include/gdrapi.h
gdrcopy-2.4/include/gdrconfig.h
gdrcopy-2.4/src/
gdrcopy-2.4/src/Makefile
gdrcopy-2.4/src/gdrapi.c
gdrcopy-2.4/src/gdrapi_internal.h
gdrcopy-2.4/src/gdrdrv/
gdrcopy-2.4/src/gdrdrv/Makefile
gdrcopy-2.4/src/gdrdrv/gdrdrv.c
gdrcopy-2.4/src/gdrdrv/gdrdrv.h
gdrcopy-2.4/src/gdrdrv/nv-p2p-dummy.c
gdrcopy-2.4/src/memcpy_avx.c
gdrcopy-2.4/src/memcpy_sse.c
gdrcopy-2.4/src/memcpy_sse41.c
gdrcopy-2.4/tests/
gdrcopy-2.4/tests/Makefile
gdrcopy-2.4/tests/apiperf.cpp
gdrcopy-2.4/tests/common.cpp
gdrcopy-2.4/tests/common.hpp
gdrcopy-2.4/tests/copybw.cpp
gdrcopy-2.4/tests/copylat.cpp
gdrcopy-2.4/tests/pplat.cu
gdrcopy-2.4/tests/sanity.cpp
gdrcopy-2.4/tests/testsuites/
gdrcopy-2.4/tests/testsuites/testsuite.cpp
gdrcopy-2.4/tests/testsuites/testsuite.hpp
gdrcopy-2.4/config_arch
gdrcopy-2.4/LICENSE
gdrcopy-2.4/gdrcopy.spec
+ mkdir -p /tmp/gdr.3rLvJs/topdir/SRPMS /tmp/gdr.3rLvJs/topdir/RPMS /tmp/gdr.3rLvJs/topdir/SPECS /tmp/gdr.3rLvJs/topdir/BUILD /tmp/gdr.3rLvJs/topdir/SOURCES
+ cp gdrcopy-2.4/gdrcopy.spec /tmp/gdr.3rLvJs/topdir/SPECS/
+ cp gdrcopy-2.4.tar.gz /tmp/gdr.3rLvJs/topdir/SOURCES/
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.TzFiNc
+ umask 022
+ cd /tmp/gdr.3rLvJs/topdir/BUILD
+ cd /tmp/gdr.3rLvJs/topdir/BUILD
+ rm -rf gdrcopy-2.4
+ /usr/bin/gzip -dc /tmp/gdr.3rLvJs/topdir/SOURCES/gdrcopy-2.4.tar.gz
+ /usr/bin/tar -xvvf -
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/
-rw-r--r-- root/root       140 2023-11-27 11:08 gdrcopy-2.4/dkms.conf
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/init.d/
-rwxr-xr-x root/root      3001 2023-11-27 11:08 gdrcopy-2.4/init.d/gdrcopy
-rw-r--r-- root/root       308 2023-11-27 11:08 gdrcopy-2.4/gdrcopy.service
-rwxr-xr-x root/root      1630 2023-11-27 11:08 gdrcopy-2.4/insmod.sh
-rw-r--r-- root/root      2863 2023-11-27 11:08 gdrcopy-2.4/Makefile
-rw-r--r-- root/root     19541 2023-11-27 11:08 gdrcopy-2.4/README.md
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/include/
-rw-r--r-- root/root      5642 2023-11-27 11:08 gdrcopy-2.4/include/gdrapi.h
-rw-r--r-- root/root       332 2023-11-27 11:08 gdrcopy-2.4/include/gdrconfig.h
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/src/
-rw-r--r-- root/root      2625 2023-11-27 11:08 gdrcopy-2.4/src/Makefile
-rw-r--r-- root/root     29497 2023-11-27 11:08 gdrcopy-2.4/src/gdrapi.c
-rw-r--r-- root/root      2116 2023-11-27 11:08 gdrcopy-2.4/src/gdrapi_internal.h
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/src/gdrdrv/
-rw-r--r-- root/root      2892 2023-11-27 11:08 gdrcopy-2.4/src/gdrdrv/Makefile
-rw-r--r-- root/root     43994 2023-11-27 11:08 gdrcopy-2.4/src/gdrdrv/gdrdrv.c
-rw-r--r-- root/root      3679 2023-11-27 11:08 gdrcopy-2.4/src/gdrdrv/gdrdrv.h
-rw-r--r-- root/root      3540 2023-11-27 11:08 gdrcopy-2.4/src/gdrdrv/nv-p2p-dummy.c
-rw-r--r-- root/root      7619 2023-11-27 11:08 gdrcopy-2.4/src/memcpy_avx.c
-rw-r--r-- root/root      6898 2023-11-27 11:08 gdrcopy-2.4/src/memcpy_sse.c
-rw-r--r-- root/root      5726 2023-11-27 11:08 gdrcopy-2.4/src/memcpy_sse41.c
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/tests/
-rw-r--r-- root/root      2243 2023-11-27 11:08 gdrcopy-2.4/tests/Makefile
-rw-r--r-- root/root     10171 2023-11-27 11:08 gdrcopy-2.4/tests/apiperf.cpp
-rw-r--r-- root/root     12412 2023-11-27 11:08 gdrcopy-2.4/tests/common.cpp
-rw-r--r-- root/root      5801 2023-11-27 11:08 gdrcopy-2.4/tests/common.hpp
-rw-r--r-- root/root      9782 2023-11-27 11:08 gdrcopy-2.4/tests/copybw.cpp
-rw-r--r-- root/root     11230 2023-11-27 11:08 gdrcopy-2.4/tests/copylat.cpp
-rw-r--r-- root/root     10419 2023-11-27 11:08 gdrcopy-2.4/tests/pplat.cu
-rw-r--r-- root/root     60254 2023-11-27 11:08 gdrcopy-2.4/tests/sanity.cpp
drwxr-xr-x root/root         0 2023-11-27 11:08 gdrcopy-2.4/tests/testsuites/
-rw-r--r-- root/root      8652 2023-11-27 11:08 gdrcopy-2.4/tests/testsuites/testsuite.cpp
-rw-r--r-- root/root      3209 2023-11-27 11:08 gdrcopy-2.4/tests/testsuites/testsuite.hpp
-rwxr-xr-x root/root      1613 2023-11-27 11:08 gdrcopy-2.4/config_arch
-rw-r--r-- root/root      1092 2023-11-27 11:08 gdrcopy-2.4/LICENSE
-rw-r--r-- root/root     17024 2023-11-27 11:08 gdrcopy-2.4/gdrcopy.spec
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd gdrcopy-2.4
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.FEpDZd
+ umask 022
+ cd /tmp/gdr.3rLvJs/topdir/BUILD
+ cd gdrcopy-2.4
+ echo building
building
+ make -j8 CUDA=/usr/local/cuda config lib exes
GDRAPI_ARCH=X86
cd src && \
make LIB_MAJOR_VER=2 LIB_MINOR_VER=4
make[1]: Entering directory `/tmp/gdr.3rLvJs/topdir/BUILD/gdrcopy-2.4/src'
cc -O2 -fPIC -I ../include -I gdrdrv/ -D GDRAPI_ARCH=X86  -c -o gdrapi.o gdrapi.c
cc -O2 -fPIC -I ../include -I gdrdrv/ -D GDRAPI_ARCH=X86  -c -mavx -o memcpy_avx.o memcpy_avx.c
cc -O2 -fPIC -I ../include -I gdrdrv/ -D GDRAPI_ARCH=X86  -c -msse -o memcpy_sse.o memcpy_sse.c
cc -O2 -fPIC -I ../include -I gdrdrv/ -D GDRAPI_ARCH=X86  -c -msse4.1 -o memcpy_sse41.o memcpy_sse41.c
GDRAPI_ARCH=X86
cc -shared -Wl,-soname,libgdrapi.so.2 -o libgdrapi.so.2.4 gdrapi.o memcpy_avx.o memcpy_sse.o memcpy_sse41.o
PATH=/sbin:/usr/sbin:$PATH; ldconfig -n /tmp/gdr.3rLvJs/topdir/BUILD/gdrcopy-2.4/src
ln -sf libgdrapi.so.2.4 libgdrapi.so.2
ln -sf libgdrapi.so.2 libgdrapi.so
make[1]: Leaving directory `/tmp/gdr.3rLvJs/topdir/BUILD/gdrcopy-2.4/src'
cd tests && \
make CUDA=/usr/local/cuda
make[1]: Entering directory `/tmp/gdr.3rLvJs/topdir/BUILD/gdrcopy-2.4/tests'
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o copybw.o copybw.cpp
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o common.o common.cpp
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o sanity.o sanity.cpp
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o testsuites/testsuite.o testsuites/testsuite.cpp
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o copylat.o copylat.cpp
g++ -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o apiperf.o apiperf.cpp
/usr/local/cuda/bin/nvcc -o pplat.o -c pplat.cu -lcuda -lpthread -ldl -lgdrapi -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include 
testsuites/testsuite.cpp: In function 'void gdrcopy::testsuite::get_all_test_names(std::vector<std::basic_string<char> >&)':
testsuites/testsuite.cpp:47:23: error: 'class std::vector<std::basic_string<char> >' has no member named 'emplace_back'
                 names.emplace_back(it->first);
                       ^
testsuites/testsuite.cpp: In function 'int gdrcopy::testsuite::run_tests(bool, std::vector<std::basic_string<char> >)':
testsuites/testsuite.cpp:60:89: error: cannot pass objects of non-trivially-copyable type 'class std::basic_string<char>' through '...'
                     gdrcopy::test::print_dbg("Error: Encountered unknown test %s\n", *it);
                                                                                         ^
testsuites/testsuite.cpp:61:28: error: 'EINVAL' was not declared in this scope
                     return EINVAL;
                            ^
testsuites/testsuite.cpp:79:32: error: 'EINVAL' was not declared in this scope
                         return EINVAL;
                                ^
testsuites/testsuite.cpp: In member function 'virtual gdrcopy::testsuite::test_status_t gdrcopy::testsuite::Test::run()':
testsuites/testsuite.cpp:145:42: error: 'EINVAL' was not declared in this scope
                 int child_exit_status = -EINVAL;
                                          ^
make[1]: *** [testsuites/testsuite.o] Error 1
make[1]: *** Waiting for unfinished jobs....
sanity.cpp: In function 'int main(int, char**)':
sanity.cpp:2041:23: error: 'class std::vector<std::basic_string<char> >' has no member named 'emplace_back'
                 tests.emplace_back(optarg);
                       ^
make[1]: *** [sanity.o] Error 1
make[1]: Leaving directory `/tmp/gdr.3rLvJs/topdir/BUILD/gdrcopy-2.4/tests'
make: *** [exes] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.FEpDZd (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.FEpDZd (%build)
ls: cannot access /tmp/gdr.3rLvJs/topdir/RPMS/*/*.rpm: No such file or directory

+ cd /root/gdrcopy-2.4/packages
ls: cannot access /tmp/gdr.3rLvJs/topdir/SRPMS/*.rpm: No such file or directory
ls: cannot access /tmp/gdr.3rLvJs/topdir/RPMS/*/*.rpm: No such file or directory

Cleaning up ...
+ rm -rf /tmp/gdr.3rLvJs
gmarciani commented 12 months ago

According to [this](), errors like these can be solved by using a C++ compiler that supports C++11.

When building with CXXFLAGS="-g -std=c++11 -Wall -pedantic", the previous errors seem to be fixed, but another one still blocks the build:

make[1]: Entering directory `/tmp/gdr.r70JzJ/topdir/BUILD/gdrcopy-2.4/tests'
g++ -g -std=c++11 -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o copybw.o copybw.cpp
g++ -g -std=c++11 -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o common.o common.cpp
g++ -g -std=c++11 -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o sanity.o sanity.cpp
g++ -g -std=c++11 -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o testsuites/testsuite.o testsuites/testsuite.cpp
g++ -g -std=c++11 -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o copylat.o copylat.cpp
g++ -g -std=c++11 -O2 -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -c -o apiperf.o apiperf.cpp
/usr/local/cuda/bin/nvcc -o pplat.o -c pplat.cu -lcuda -lpthread -ldl -lgdrapi -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include 
testsuites/testsuite.cpp: In function 'int gdrcopy::testsuite::run_tests(bool, std::vector<std::basic_string<char> >)':
testsuites/testsuite.cpp:60:89: error: cannot pass objects of non-trivially-copyable type 'class std::basic_string<char>' through '...'
                     gdrcopy::test::print_dbg("Error: Encountered unknown test %s\n", *it);
                                                                                         ^
make[1]: *** [testsuites/testsuite.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/tmp/gdr.r70JzJ/topdir/BUILD/gdrcopy-2.4/tests'
make: *** [exes] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.afHJt3 (%build)
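
The remaining error comes from passing a std::string directly through a printf-style `...` argument list. Below is a minimal sketch of the usual way around that, not the upstream fix: `format_dbg`, `describe_unknown_test` and `unknown_test_status` are hypothetical stand-ins written for illustration, and none of them is GDRCopy code. The idea is to pass `c_str()` so that `%s` receives a `const char *`, and to include `<cerrno>` explicitly so that `EINVAL` is declared regardless of which other headers are pulled in. It assumes compilation with `-std=c++11`.

```cpp
#include <cerrno>   // EINVAL
#include <cstdarg>  // va_list, va_start, va_end
#include <cstdio>   // std::vsnprintf
#include <string>

// Hypothetical stand-in for a printf-style debug helper (not GDRCopy's print_dbg).
// It formats into a std::string so the result can be inspected.
static std::string format_dbg(const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    return std::string(buf);
}

// Hypothetical helper mirroring the shape of the failing call site.
static std::string describe_unknown_test(const std::string &name)
{
    // Writing format_dbg("... %s\n", name) would pass a std::string through '...',
    // which is the construct the compiler in the log rejects.
    // c_str() yields a const char *, which is what %s expects.
    return format_dbg("Error: Encountered unknown test %s\n", name.c_str());
}

// Hypothetical helper: EINVAL is visible because <cerrno> is included above.
static int unknown_test_status()
{
    return EINVAL;
}
```

Applied to the line in the log, the same idea would mean changing `*it` to `it->c_str()` at testsuite.cpp:60, assuming `it` iterates over a vector of std::string as the error message suggests. Whether the rest of v2.4 then builds on CentOS 7 is not something this sketch can confirm.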
pakmarkthub commented 12 months ago

Hi @gmarciani,

Since v2.4, CentOS 7 has been removed from the support matrix, and we no longer QA on that OS. Because the important features and bug fixes we added in v2.4 are not related to CentOS 7, you can continue to use v2.3 without losing out on much.