document how to use clang instead of NVCC

cms-patatrack / cmssw

CMSSW fork of the Patatrack project

https://patatrack.web.cern.ch/patatrack/index.html

Apache License 2.0


document how to use clang instead of NVCC #239

Open VinInn opened 5 years ago

VinInn commented 5 years ago

It would be useful to document how to switch to clang as the CUDA compiler. I have evidence that clang seems to outperform cicc for some specific, still relevant, code patterns (specifically, the optimization of trivial (0, 1) compile-time constant operands: see https://godbolt.org/z/CUjEIJ ).

fwyzard commented 5 years ago

Outside of CMSSW, you can just replace

CUDA_PATH=/usr/local/cuda-9.2
$CUDA_PATH/bin/nvcc -std=c++14 -O2 -g --generate-code arch=compute_70,code=sm_70 --expt-relaxed-constexpr test.cu -o test

with

CUDA_PATH=/usr/local/cuda-9.2
CLANG_CUDA_FLAGS="-x cuda --cuda-path=$CUDA_PATH -I$CUDA_PATH/include -L$CUDA_PATH/lib64/stubs -L$CUDA_PATH/lib64 -l cudart"
clang++-7 $CLANG_CUDA_FLAGS -std=c++14 -O2 -g --cuda-gpu-arch=sm_70 test.cu -o test

From a quick test, clang 7.0.1 doesn't seem to work with CUDA 10, but clang 8 (nightly) does.
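
The two invocations above differ only in the compiler binary and a handful of flags, so they can be wrapped in a small helper. The following is a minimal sketch, not an existing tool: the function name `cuda_compile_cmd` and the `CUDA_COMPILER` variable are hypothetical, and the function only prints the command line (using the flags quoted above) instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the command that builds one .cu file.
# CUDA_COMPILER selects the toolchain: "nvcc" (default) or "clang".
CUDA_PATH=${CUDA_PATH:-/usr/local/cuda-9.2}

cuda_compile_cmd() {
    src=$1
    out=$2
    case "${CUDA_COMPILER:-nvcc}" in
        nvcc)
            echo "$CUDA_PATH/bin/nvcc -std=c++14 -O2 -g" \
                 "--generate-code arch=compute_70,code=sm_70" \
                 "--expt-relaxed-constexpr $src -o $out"
            ;;
        clang)
            echo "clang++-7 -x cuda --cuda-path=$CUDA_PATH" \
                 "-I$CUDA_PATH/include -L$CUDA_PATH/lib64/stubs -L$CUDA_PATH/lib64 -l cudart" \
                 "-std=c++14 -O2 -g --cuda-gpu-arch=sm_70 $src -o $out"
            ;;
        *)
            # anything else is rejected rather than guessed at
            echo "unknown CUDA_COMPILER: $CUDA_COMPILER" >&2
            return 1
            ;;
    esac
}
```

To actually build, the printed line can be evaluated, for example with eval "$(CUDA_COMPILER=clang cuda_compile_cmd test.cu test)".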

fwyzard commented 5 years ago

Within CMSSW, I expect this to be a bit more complicated because of:

device code linking
SCRAM rules
clang++ vs g++

device code linking

The example above builds the CUDA part of the code as a single translation unit. In CMSSW we instead build the separate .cu files independently, link their device code together, and only then link the result with the host code. This seems to be supported starting from LLVM/clang 7 (and is the reason I pushed to have it in CMSSW), but the commands are a bit more complex.
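
As a rough illustration of the three steps (compile each .cu file separately, link the device code, link with the host code), here is an untested sketch. It assumes clang's -fcuda-rdc flag to emit relocatable device code and nvcc's -dlink option for the device link step; whether this combination works in the CMSSW setup has not been verified. The function name `rdc_build_cmds` is hypothetical, and the commands are only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the steps of a separate-compilation build.
# Assumptions: clang++ -fcuda-rdc for relocatable device code, nvcc -dlink for
# the device link step. Untested within CMSSW.
CUDA_PATH=${CUDA_PATH:-/usr/local/cuda-9.2}

rdc_build_cmds() {
    out=$1
    shift
    objs=""
    # 1. compile each .cu file on its own, keeping the device code relocatable
    for src in "$@"; do
        obj="${src%.cu}.o"
        echo "clang++-7 -x cuda --cuda-path=$CUDA_PATH -std=c++14 -O2 --cuda-gpu-arch=sm_70 -fcuda-rdc -c $src -o $obj"
        objs="$objs $obj"
    done
    # 2. link the device code of all objects into one additional object
    echo "$CUDA_PATH/bin/nvcc -arch=sm_70 -dlink$objs -o ${out}_dlink.o"
    # 3. link the host objects, the device-link object and the CUDA runtime
    echo "clang++-7$objs ${out}_dlink.o -L$CUDA_PATH/lib64 -lcudart -o $out"
}
```

For example, rdc_build_cmds test a.cu b.cu prints two compile lines, one device link line and one final link line.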

SCRAM rules

Currently, the binary and flags used by SCRAM to build .cu files are mostly hard-coded in the SCRAM Makefile rules, which are already rather messy.
Adding support for choosing between clang++ -x cuda and nvcc will not help clean them up...

clang++ vs g++

I have never tried building the host code with g++, the device code with clang, and linking them, so I have no idea if it is supported, unsupported but "just works", or if it doesn't work at all.

fwyzard commented 5 years ago

That said, supporting clang does have a few advantages:

at least in some cases, clang seems to generate better code: testing the Eigen matrix functions, IIRC clang compiled the same C++ CUDA code into binaries that use a smaller number of registers and/or less memory;
clang has better error reporting, for example in the case of host/device function calls;
clang supports more recent C++ standards and all (most?) of our host ROOT and CMSSW code.
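
The register and memory usage mentioned in the first point can be compared by asking ptxas for its verbose per-kernel report. The sketch below prints the relevant command for each compiler, using nvcc's -Xptxas -v and clang's -Xcuda-ptxas -v to forward the -v option to ptxas. The function name `ptxas_info_cmd` and the `CUDA_COMPILER` variable are hypothetical, and the remaining flags are simply copied from the earlier example.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a compile command that makes ptxas
# report per-kernel register and memory usage on stderr.
CUDA_PATH=${CUDA_PATH:-/usr/local/cuda-9.2}

ptxas_info_cmd() {
    src=$1
    obj="${src%.cu}.o"
    case "${CUDA_COMPILER:-nvcc}" in
        nvcc)
            # -Xptxas forwards the following option (-v) to ptxas
            echo "$CUDA_PATH/bin/nvcc -std=c++14 -O2 --generate-code arch=compute_70,code=sm_70 -Xptxas -v -c $src -o $obj"
            ;;
        clang)
            # -Xcuda-ptxas is clang's way to forward an option to ptxas
            echo "clang++-7 -x cuda --cuda-path=$CUDA_PATH -std=c++14 -O2 --cuda-gpu-arch=sm_70 -Xcuda-ptxas -v -c $src -o $obj"
            ;;
        *)
            echo "unknown CUDA_COMPILER: $CUDA_COMPILER" >&2
            return 1
            ;;
    esac
}
```

Running both printed commands on the same source file and comparing the "Used N registers" lines reported by ptxas gives a quick, if rough, comparison of the two compilers.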

fwyzard commented 5 years ago

Note that, according to the current release schedules:

CMSSW 10.5.0 is expected on the 25th of February;
the CMSSW 10.6.0-pre2 deadline is the 26th of February;
llvm/clang 8.0 will be tagged on the 27th of February.

So we have to wait for 10.6.0-pre3 to propose the switch to llvm/clang 8. If we want to try it in CMSSW earlier, the two options are:

revert to CUDA 9.2 (not a big issue yet, since we have not tried using Graphs and we can stay with sm_70 also for Turing);
backport the support for CUDA 10 in the CMS branch of llvm/clang.

fwyzard commented 5 years ago

From a quick test, clang 7.0.1 doesn't seem to work with CUDA 10, but clang 8 (nightly) does.

I forgot that we have already backported preliminary support for CUDA 10 to the llvm/clang build in CMSSW.

There are some more bug fixes upstream that we may consider if we run into issues compiling CUDA code with clang: