NVIDIA / AMGX

Distributed multigrid linear solver library on GPU

[Build] v2.4.0 with Cuda 11.0 #276

Closed: pledac closed this issue 2 months ago

pledac commented 11 months ago

Describe the issue

On some builds (I can't say yet why some work and others don't), compilation fails with:

src/tests/dense_lu.cu(114): error: identifier "cudaMallocAsync" is undefined

Environment information:

Configuration information

Provide the cmake command line that was used for configuration and its full output:

-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- The CUDA compiler identification is NVIDIA 11.0.221
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /ccc/products/gcc-8.3.0/system/default/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /ccc/products/gcc-8.3.0/system/default/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/exec/ccache/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found MPI_C: /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found CUDAToolkit: /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/include (found suitable version "11.0.221", minimum required is "10.0")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_CXX_FOUND) (found version "4.5")
This is a MPI build:TRUE
-- Found libcudacxx: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/libcudacxx/lib/cmake/libcudacxx/libcudacxx-config.cmake (found suitable version "1.8.1.0", minimum required is "1.8.0")
-- Found Thrust: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/thrust/cmake/thrust-config.cmake (found version "2.1.0.0")
-- Found CUB: /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/cub/cub/cmake/cub-config.cmake (found suitable version "2.1.0.0", minimum required is "2.1.0.0")
-- Configuring done
-- Generating done

Compilation information Issue information

VERBOSE=1 make
make[2]: Entering directory '/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/build'
[ 0%] Building CUDA object CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o
/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/exec/ccache/nvcc -forward-unknown-to-host-compiler -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/external/rapidjson/include -I/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/include -I/ccc/products/openmpi-4.1.4/gcc--8.3.0/default/include -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/../include -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/thrust/cmake/../.. -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/libcudacxx/include -I/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/thrust/dependencies/cub/cub/cmake/../.. -L/ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/lib64 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] -I/ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/math_libs/include --compiler-options -L/ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/math_libs/lib64 -fPIC -DNDEBUG --extended-lambda --Werror cross-execution-space-call -DNVTX_RANGES -DDISABLE_MIXED_PRECISION -DCUSPARSE_GENERIC_INTERFACES -DCUSPARSE_USE_GENERIC_SPGEMM -Xcompiler "-fno-openmp -Wno-terminate -DRAPIDJSON_DEFINED -DAMGX_WITH_MPI -rdynamic -fPIC -fvisibility=default" -DTHRUST_CUB_WRAPPED_NAMESPACE=amgx -std=c++14 -MD -MT CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o -MF CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o.d -x cu -c /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/tests/dense_lu.cu -o CMakeFiles/amgx_libs.dir/src/tests/dense_lu.cu.o
/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/tests/dense_lu.cu(114): error: identifier "cudaMallocAsync" is undefined

1 error detected in the compilation of "/ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/tests/dense_lu.cu".

Additional context

The v2.3.0 build is OK. Another build of v2.4.0, with GCC 11.0.1 and CUDA 11.8, is also OK.

Replacing cudaMallocAsync with amgx::memory::cudaMallocAsync seems to fix it. Is this the correct fix?

pledac commented 11 months ago

After that, the example builds fail with link errors:

VERBOSE=1 make
cd /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/build/examples && /ccc/products2/cmake-3.22.2/Rhel_8__x86_64/system/default/bin/cmake -E cmake_link_script CMakeFiles/amgx_mpi_capi_agg.dir/link.txt --verbose=1
/ccc/products/gcc-8.3.0/system/default/bin/gcc -DRAPIDJSON_DEFINED -DAMGX_WITH_MPI -O3 -DNDEBUG -L/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/lib -L/ccc/products2/hwloc-2.5.0/Rhel_8__x86_64/system/cuda-11.6/lib -L/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/lib -L/ccc/products2/hwloc-2.5.0/Rhel_8__x86_64/system/cuda-11.6/lib CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o -o amgx_mpi_capi_agg /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so ../libamgxsh.so -lrt -ldl /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/lib64/libcudart_static.a /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcublas.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcusolver.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcublas.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/../../math_libs/11.0/lib64/libcusparse.so /ccc/products/nvhpc-22.7/system/default/Linux_x86_64/22.7/cuda/11.0/lib64/libnvToolsExt.so -lm -lpthread /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi_cxx.so /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so -ldl -lpthread /usr/lib64/librt.so
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaMallocAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'
CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o:amgx_mpi_capi_agg.c:function main: error: undefined reference to 'cudaFreeAsync'

cudaMallocAsync and cudaFreeAsync were introduced in CUDA 11.2, so that is probably the cause here: AmgX v2.4.0 doesn't build with CUDA < 11.2.
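
To make the version reasoning concrete, here is a minimal sketch of the check a source-level guard would rely on. It assumes the usual CUDA version encoding (major * 1000 + minor * 10, which is the value CUDART_VERSION takes). The helper names encode_cuda_version and has_async_alloc are hypothetical, made up for illustration; they are not part of AmgX or the CUDA toolkit. It is plain C++ so it compiles without a toolkit installed.

```cpp
// Sketch only: the helper names below are hypothetical, not AmgX or CUDA API.

// CUDA encodes toolkit versions as major * 1000 + minor * 10,
// so 11.0 -> 11000 and 11.2 -> 11020 (the value of CUDART_VERSION).
constexpr int encode_cuda_version(int major, int minor) {
    return major * 1000 + minor * 10;
}

// cudaMallocAsync / cudaFreeAsync first appear in CUDA 11.2.
constexpr int kFirstAsyncAllocVersion = encode_cuda_version(11, 2);

// True when a toolkit of the given encoded version declares the
// stream-ordered allocator; false means a fallback is needed.
constexpr bool has_async_alloc(int cudart_version) {
    return cudart_version >= kFirstAsyncAllocVersion;
}

// Compile-time checks for the toolkits mentioned in this issue.
static_assert(!has_async_alloc(encode_cuda_version(11, 0)),
              "CUDA 11.0 (the failing build) has no cudaMallocAsync");
static_assert(has_async_alloc(encode_cuda_version(11, 8)),
              "CUDA 11.8 (the working build) has cudaMallocAsync");
```

In real CUDA sources the same test would presumably be written as a preprocessor guard such as `#if CUDART_VERSION >= 11020` around the cudaMallocAsync / cudaFreeAsync calls, with plain cudaMalloc / cudaFree in the `#else` branch. That is one possible workaround, not necessarily how the fix in main is implemented.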

mattmartineau commented 10 months ago

Thanks for reporting this. I changed our internal testing to include CUDA 11.0 (we previously had just 11.2, 11.8). I'll push a fix to main shortly.

mattmartineau commented 9 months ago

Hopefully should be fixed in main.

pledac commented 9 months ago

Thanks Matt.