Open schweitzpgi opened 1 year ago
Thanks for all the details, @schweitzpgi :-)
Some comments w.r.t. the first bullet point.
It's fairly easy: we just need to construct a `clang::driver::Command` and ask the `clang::driver::Compilation` instance to execute it.
nvcc
"Fiddling around with the `nvcc` compiler, there is a feasible workflow to compile a single source file containing both `__global__` and `__qpu__` kernels, as follows:
(1) Perform CUDA compilation phase (`--cuda` option)
nvcc -ccbin <host compiler, e.g., clang> -std=c++20 -I/repos/cuda-quantum/runtime -I /repos/cuda-quantum/tpls/fmt/include --cuda <source file>
We add CUDA Quantum compile options (C++20, include dirs, etc., as usual)
The result of this step is a `.cpp.ii` file which has all the CUDA kernels (`__global__`) resolved + fatbin embedded (intended for further processing by a host compiler).
(2) Perform usual `nvq++` compilation of this file (e.g., with the bridge)
(3) Add necessary link options (e.g., `-lcudadevrt`, `-lcudart_static`, etc.) for CUDA along with our CUDA Quantum link flags.
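The three steps above can be sketched as a dry run that only prints the commands. This is a minimal sketch, not the actual driver: the function names, the `mixed.cu` input, the explicit `-o` output names, and the assumption that `nvq++` accepts the `.cpp.ii` file and forwards the `-l` flags to the linker are all illustrative.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for the three steps instead of running them.
# Include paths are taken from the comment above; file naming is an assumption.

CUDAQ_INCLUDES="-I/repos/cuda-quantum/runtime -I/repos/cuda-quantum/tpls/fmt/include"
CUDA_LINK_FLAGS="-lcudadevrt -lcudart_static"

# Step (1): CUDA compilation phase. The output is a preprocessed host file
# with the __global__ kernels resolved and the fatbin embedded.
cuda_phase_cmd() {
  src="$1"
  host_cc="${2:-clang}"   # host compiler handed to nvcc via -ccbin
  echo "nvcc -ccbin $host_cc -std=c++20 $CUDAQ_INCLUDES --cuda $src -o ${src%.*}.cpp.ii"
}

# Steps (2) and (3): compile the .cpp.ii file with nvq++ and append the CUDA
# link options (assumes nvq++ passes -l flags through to the linker).
nvqpp_cmd() {
  src="$1"
  echo "nvq++ ${src%.*}.cpp.ii $CUDA_LINK_FLAGS -o ${src%.*}.x"
}

# Print the whole workflow for one source file.
print_workflow() {
  cuda_phase_cmd "$1" "$2"
  nvqpp_cmd "$1"
}

print_workflow mixed.cu
```

Replacing `echo` with direct execution would turn the dry run into a real build, provided the assumptions about `nvq++` hold.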
Currently the nvq++ driver is a skeletal bash shell script that runs the various components that comprise the logical, piecewise steps of an nvq++ compilation. The bash script is very easy to update and experiment with, since a shell language is convenient for running subprocesses, assembling strings into command line option lists, etc.
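To illustrate why a shell script suits this kind of staged driver, here is a minimal sketch of the pattern. It is not the actual nvq++ script: the tool names come from this discussion, while the `run` and `drive` helpers, the intermediate file extensions, and the use of `-o` on each tool are assumptions, and pass pipelines and conversion options are omitted.

```shell
#!/usr/bin/env bash
# Sketch of a staged driver. Tool names come from the discussion; the
# intermediate file extensions and the -o usage are assumptions.

DRY_RUN="${DRY_RUN:-1}"   # 1: print commands only, 0: actually run them

# Print the command (dry run), or execute it and stop on the first failure.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || exit 1
  fi
}

# Drive one source file through the pipeline, one subprocess per stage.
# Real pass pipelines and conversion options are left out on purpose.
drive() {
  src="$1"
  base="${src%.*}"
  run cudaq-quake "$src" -o "$base.qke"
  run cudaq-opt "$base.qke" -o "$base.quake"
  run cudaq-translate "$base.quake" -o "$base.ll"
  run llc -filetype=obj "$base.ll" -o "$base.o"
}

drive kernel.cpp
```

Each stage is one line, so adding, removing, or reordering a component is a one-line change, which is the convenience a C++ port would trade away.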
There are alternatives:
Port the existing bash script to C++. nvq++ would become a more opaque process that drives the various components. This would have all the pros/cons of writing any sort of process manager in shell vs. C/C++, of course.
`cudaq-quake` tool, but also includes the functionality that is found in both the `cudaq-opt` and `cudaq-translate` tools. This effectively merges the individual tools into a single "whole enchilada" of `cudaq-quake` + everything else.
Port the bash shell script to some other scripting language. This is the pick your favorite scripting language option.
Not really worth pursuing.
Graft all the nvq++ functionality into clang itself, renaming the clang executable, and using clang as a huge, jack-of-all-trades driver+compiler. Consequential drawbacks exist here for recurring labor and resource costs.
`__global__` or `__device__`), and CUDA Quantum kernels (`__qpu__`).
A mitigating factor is that nvq++ is already a C++ compiler, just with CUDA Quantum extensions (only), and dependent on LLVM, MLIR, and clang, though in a shrink wrapped sense, resp., `cudaq-translate` + `llc`, `cudaq-opt`, and `cudaq-quake`.
`clang++` supports plugins out of the box
`clang++` with the extra command line arguments to add the plugin shared libraries, etc.
Need to investigate if multiple plugins would be needed, if plugin dependent command-line options would be sufficient for a full `nvq++` implementation.
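For the plugin route, the wrapper could reduce to assembling a `clang++` command line. The following is a minimal sketch, assuming clang's `-fplugin=` option is used to load each shared library. The plugin path, the `NVQPP_PLUGINS` variable, and the `plugin_cmd` helper are hypothetical, and whether one or several plugins would be needed is the open question noted above.

```shell
#!/usr/bin/env bash
# Hypothetical plugin location; no such library is known to exist.
NVQPP_PLUGINS="${NVQPP_PLUGINS:-/opt/cudaq/lib/libcudaq-plugin.so}"

# Print a clang++ command with one -fplugin= option per plugin library,
# followed by the caller's own arguments (dry run only, nothing is executed).
plugin_cmd() {
  cmd="clang++ -std=c++20"
  for plugin in $NVQPP_PLUGINS; do
    cmd="$cmd -fplugin=$plugin"
  done
  echo "$cmd $*"
}

plugin_cmd kernel.cpp -o kernel.x
```

Listing several space-separated paths in `NVQPP_PLUGINS` produces one `-fplugin=` option per library, which is one way the multiple-plugin case could be expressed.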