Open VinInn opened 5 years ago
Outside of CMSSW, you can just replace
CUDA_PATH=/usr/local/cuda-9.2
$CUDA_PATH/bin/nvcc -std=c++14 -O2 -g --generate-code arch=compute_70,code=sm_70 --expt-relaxed-constexpr test.cu -o test
with
CUDA_PATH=/usr/local/cuda-9.2
CLANG_CUDA_FLAGS="-x cuda --cuda-path=$CUDA_PATH -I$CUDA_PATH/include -L$CUDA_PATH/lib64/stubs -L$CUDA_PATH/lib64 -l cudart"
clang++-7 $CLANG_CUDA_FLAGS -std=c++14 -O2 -g --cuda-gpu-arch=sm_70 test.cu -o test
From a quick test, clang 7.0.1 doesn't seem to work with CUDA 10, but clang 8 (nightly) does.
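To make the nvcc/clang switch easier to try, the two command lines above can be wrapped in a small helper that only prints the command for the requested compiler. This is a minimal sketch, not something SCRAM provides: the function name `cuda_build_cmd` and its arguments are hypothetical, and the flags are copied from the commands above.

```sh
#!/bin/sh
# Sketch: print the command line that builds a single .cu file with nvcc or clang.
# cuda_build_cmd is a hypothetical helper; the flags come from the commands above.
CUDA_PATH=${CUDA_PATH:-/usr/local/cuda-9.2}

cuda_build_cmd() {
  # $1: compiler kind (nvcc or clang), $2: source file, $3: output binary
  case "$1" in
    nvcc)
      echo "$CUDA_PATH/bin/nvcc -std=c++14 -O2 -g --generate-code arch=compute_70,code=sm_70 --expt-relaxed-constexpr $2 -o $3"
      ;;
    clang)
      echo "clang++-7 -x cuda --cuda-path=$CUDA_PATH -I$CUDA_PATH/include -L$CUDA_PATH/lib64/stubs -L$CUDA_PATH/lib64 -l cudart -std=c++14 -O2 -g --cuda-gpu-arch=sm_70 $2 -o $3"
      ;;
    *)
      # Anything else is rejected rather than guessed at.
      echo "unknown compiler: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `eval "$(cuda_build_cmd clang test.cu test)"` would run the clang build, while calling the function on its own lets you inspect the command first.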
Within CMSSW, I expect this to be a bit more complicated, for a few reasons:
The example above builds the CUDA part of the code as a single translation unit; in CMSSW we build separate .cu files independently, link them together, and then link the result with the host code. This seems to be supported starting from LLVM/clang 7 (and is the reason I pushed to have it in CMSSW), but the commands are a bit more complex.
Currently, the binary and flags used by SCRAM to build .cu files are mostly hard-coded in the SCRAM Makefile rules, which are already rather messy. Adding support for building with clang++ -x cuda vs nvcc will not help clean them up...
I have never tried building the host code with g++, the device code with clang, and linking them, so I have no idea whether it is supported, unsupported but "just works", or doesn't work at all.
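For reference, the separate-compilation flow described above (compile each .cu file independently, device-link the objects, then link with the host code) looks roughly like this with nvcc. This is only a sketch under assumptions: the file names (`a.cu`, `b.cu`, `host.o`) and the helper `rdc_build_steps` are hypothetical, it prints the commands rather than running them, and it does not reproduce the actual SCRAM rules. The clang equivalent is not shown here.

```sh
#!/bin/sh
# Sketch: print the nvcc steps for separate compilation of several .cu files.
# rdc_build_steps, host.o and device_link.o are hypothetical names.
CUDA_PATH=${CUDA_PATH:-/usr/local/cuda-9.2}
NVCC=$CUDA_PATH/bin/nvcc
ARCH="--generate-code arch=compute_70,code=sm_70"

rdc_build_steps() {
  objs=""
  # 1. compile each .cu file on its own into relocatable device code (-dc)
  for src in "$@"; do
    obj="${src%.cu}.o"
    echo "$NVCC -std=c++14 -O2 $ARCH -dc $src -o $obj"
    objs="$objs $obj"
  done
  # 2. device-link the resulting objects together (-dlink)
  echo "$NVCC $ARCH -dlink$objs -o device_link.o"
  # 3. link everything with the host code, using the host compiler
  echo "g++ host.o$objs device_link.o -L$CUDA_PATH/lib64 -lcudart -o test"
}
```

Calling `rdc_build_steps a.cu b.cu` prints two compile commands, one device-link command and one final host link command.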
That said, supporting clang does have a few advantages
Note that, according to the current release schedules:
So we have to wait for 10.6.0-pre3 to propose the switch to LLVM/clang 8. If we want to try it in CMSSW earlier, the two options are
sm_70 also for Turing);
> From a quick test, clang 7.0.1 doesn't seem to work with CUDA 10, but clang 8 (nightly) does.
I forgot that we have already backported preliminary support for CUDA 10 to the llvm/clang build in CMSSW.
There are some more bug fixes upstream that we may consider if we run into issues compiling CUDA code with clang:
It would be useful to document how to switch to clang as the CUDA compiler. I have evidence that clang seems to outperform cicc for some specific, still relevant, code patterns (specifically, the optimization of trivial (0,1) compile-time constant operands: see https://godbolt.org/z/CUjEIJ )