ValeevGroup / tiledarray

A massively-parallel, block-sparse tensor framework written in C++
GNU General Public License v3.0

`ENABLE_CUDA` fails with Cray Wrappers #413

Closed: wavefunction91 closed this issue 10 months ago

wavefunction91 commented 11 months ago

When Cray compiler wrappers are used, CMake delegates MPI inclusion to the wrapper and does not propagate the explicit MPI include paths and flags to the compilers. This is a problem for nvcc, which does not receive the necessary MPI header includes. The fatal symptom when compiling CUDA objects is:

[ 97%] Building CUDA object src/CMakeFiles/tiledarray.dir/TiledArray/cuda/btas_um_tensor.cpp.o
cd /global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/src && /opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/bin/nvcc -forward-unknown-to-host-compiler -DBTAS_ASSERT_THROWS=1 -DBTAS_DEFAULT_TARGET_MAX_INDEX_RANK=6 -DBTAS_HAS_BLAS_LAPACK=1 -DBTAS_HAS_BOOST_CONTAINER=1 -DBTAS_HAS_BOOST_ITERATOR=1 -DBTAS_HAS_BOOST_SERIALIZATION=1 -DLAPACK_COMPLEX_CPP=1 -DLIBRETT_USES_CUDA=1 -DMADNESS_DISABLE_WORLD_GET_DEFAULT=1 -DMADNESS_MPI_HEADER=\"mpi.h\" -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX -D_MPICC_H -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/tiledarray/src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/src -I/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/targets/x86_64-linux/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/madness-src/src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/madness-build/src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/madness-build/src/madness/world -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/madness-src/src/madness/world -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/eigen-src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/btas-src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/blaspp-build/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/blaspp-src/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/lapackpp-build/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/lapackpp-src/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/umpire-src/src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/umpire-src/src/umpire/tpl/camp/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/umpire-build/include -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/build/_deps/librett-src/src -I/global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/slate/include -isystem=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/include --expt-relaxed-constexpr --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIC -std=c++17 -MD -MT src/CMakeFiles/tiledarray.dir/TiledArray/cuda/btas_um_tensor.cpp.o -MF CMakeFiles/tiledarray.dir/TiledArray/cuda/btas_um_tensor.cpp.o.d -x cu -rdc=true -c /global/cfs/projectdirs/m1027/dbwy/NWChemEx/ta_slate/tiledarray/src/TiledArray/cuda/btas_um_tensor.cpp -o CMakeFiles/tiledarray.dir/TiledArray/cuda/btas_um_tensor.cpp.o
<command-line>: fatal error: mpi.h: No such file or directory
compilation terminated.
make[2]: *** [src/CMakeFiles/tiledarray.dir/build.make:230: src/CMakeFiles/tiledarray.dir/TiledArray/cuda/btas_um_tensor.cpp.o] Error 1

The solution we came up with in GauXC was to insulate CUDA kernels from MPI headers. This is almost always possible, because CUDA API functions and GPU-direct MPI bindings are accessible via the CUDA::cudart target and can therefore be used from pure C++ code that is never compiled by nvcc.
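A minimal sketch of that separation, for illustration only (this is not GauXC's or TiledArray's actual code). The file names `device_ops.hpp`, `device_ops.cu`, `driver.cpp` and the functions `device_ops::scale` and `local_work` are hypothetical. The three files are collapsed into one listing, and the kernel launch is replaced by a host loop so the sketch builds with an ordinary C++17 compiler without a GPU or MPI. The key point is that the header shared by nvcc-compiled and wrapper-compiled code exposes only plain C++ types, so nvcc never needs to see mpi.h.

```cpp
// Header "firewall" between nvcc-compiled code and MPI code (sketch).
// Three hypothetical files are collapsed into one listing so it builds with a
// plain C++17 compiler; no GPU and no MPI installation are required.
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// ---- device_ops.hpp (hypothetical) ----------------------------------------
// Only plain C++ types cross this boundary: no <mpi.h>, no MPI_Comm, and no
// CUDA-only types, so nvcc and the Cray/MPI wrapper can both include it.
namespace device_ops {
void scale(double* data, std::size_t n, double alpha);
}  // namespace device_ops

// ---- device_ops.cu (hypothetical, compiled by nvcc) -----------------------
// A real implementation would launch a __global__ kernel here. This host loop
// is a stand-in so the sketch runs; either way, nothing here includes <mpi.h>.
namespace device_ops {
void scale(double* data, std::size_t n, double alpha) {
  assert(data != nullptr || n == 0);  // empty input may have a null pointer
  for (std::size_t i = 0; i < n; ++i) data[i] *= alpha;
}
}  // namespace device_ops

// ---- driver.cpp (hypothetical, compiled by the wrapper, never by nvcc) ----
// This is where <mpi.h> and calls such as MPI_Allreduce on the returned value
// would live; they are omitted so the sketch builds without MPI installed.
double local_work(std::vector<double>& v, double alpha) {
  device_ops::scale(v.data(), v.size(), alpha);
  return std::accumulate(v.begin(), v.end(), 0.0);
}
```

Calling `local_work` on `{1.0, 2.0, 3.0}` with `alpha = 2.0` scales the vector in place to `{2.0, 4.0, 6.0}` and returns `12.0`. Because only the driver translation unit ever includes MPI, the build no longer depends on CMake forwarding the wrapper's MPI include paths to nvcc.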

Unfortunately, there is no robust way to convince CMake to pass the MPI include paths and flags in this situation (MPI_ASSUME_NO_BUILTIN_MPI and MPI_SKIP_COMPILER_WRAPPER don't always work as expected).