[RFC] Reimplement the nvq++ driver

Currently the nvq++ driver is a skeletal bash script that runs the components making up the logical, piecewise steps of an nvq++ compilation. The bash script is very easy to update and experiment with, since a shell language is convenient for running subprocesses, assembling strings into command-line option lists, etc.
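For illustration only, here is a rough sketch of what "running the components piecewise" amounts to. It is not taken from the existing script. The stage order is an assumption inferred from the tool roles described in this RFC, and the argument lists and intermediate file names are placeholders rather than the tools' real options.

```python
import shlex
from typing import Callable, List, Sequence

# Assumed stage order, inferred from the tool roles described in this RFC.
# The argument lists built below are placeholders, NOT the tools' real options.
STAGES = ["cudaq-quake", "cudaq-opt", "cudaq-translate", "llc"]


def plan(source: str, stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Return one placeholder argv per stage; each stage reads the previous output."""
    commands = []
    current = source
    for index, tool in enumerate(stages):
        output = f"stage{index}.out"  # hypothetical intermediate file name
        commands.append([tool, current, output])
        current = output
    return commands


def drive(commands: Sequence[List[str]],
          run: Callable[[List[str]], int],
          verbose: bool = False) -> int:
    """Run each command in order and stop at the first non-zero exit status."""
    for argv in commands:
        if verbose:
            print("+", shlex.join(argv))
        status = run(argv)
        if status != 0:
            return status
    return 0


if __name__ == "__main__":
    # Dry run: echo the planned commands instead of executing anything.
    drive(plan("kernel.cpp"), run=lambda argv: 0, verbose=True)
```

Passing something like `lambda argv: subprocess.run(argv).returncode` as `run` would turn the dry run into a real one. Every alternative below has to reproduce this chaining and early-exit behavior in some form.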

There are alternatives:

Port the existing bash script to C++. nvq++ would become a more opaque process that drives the various components. This would have all the pros/cons of writing any sort of process manager in shell vs. C/C++, of course.
- The option prototyped in #600 and explained in #650 riffs on this general idea. That implementation has a C++ program that wraps the clang compiler much like the cudaq-quake tool, but also includes the functionality that is found in both the cudaq-opt and cudaq-translate tools. This effectively merges the individual tools into a single "whole enchilada" of cudaq-quake + everything else.
- Pros:
  - better end-to-end performance (see #600 for some numbers)
- Cons:
  - prototype needs to splice in some additional functionality
- Open:
  - How easy is it to drive other subprocesses? This question is relevant for linking and perhaps for running nvcc.
Port the bash shell script to some other scripting language. This is the "pick your favorite scripting language" option.
- Pros:
  - can use your favorite scripting language
- Cons:
  - just treading water in terms of the project
Not really worth pursuing.
Graft all the nvq++ functionality into clang itself, renaming the clang executable, and using clang as a huge, jack-of-all-trades driver+compiler. This option carries significant recurring costs in labor and resources.
- Pros:
  - Re-use the clang driver command line parsing and processing
  - Some flexibility to choreograph different executables as subprocesses, which may eventually be very useful when compiling code that mixes C++, CUDA kernels (__global__ or __device__), and CUDA Quantum kernels (__qpu__).
  - Could hook into the clang++ "lowering" directly (like the ClangIR project). This would potentially eliminate extra traversals of the clang AST and make the compilation faster. (See also cons.)
- Cons:
  - The high buy-in cost of using clang as a driver. With clang, the driver and the C, C++, Objective-C, ... compilers are hopelessly(?) intertwined. A new compiler can only be added on top of the whole; it cannot pull in just the parts it needs.
  - nvq++ builds would become dependent on modified clang source code, requiring building this "clang+1" as well as the nvq++ project itself. (It eliminates grabbing a development build via apt-get, for example.)
  - Any and all patches on top of clang would have to be maintained downstream and could be broken upstream at any time.
  - The clang driver+compiler doesn't do MLIR, though other projects may add hooks in the same places for other reasons at some point. (See also pros.)
A mitigating factor is that nvq++ is already a C++ compiler, just with CUDA Quantum extensions (only), and it already depends on LLVM, MLIR, and clang, albeit in shrink-wrapped form: cudaq-translate + llc, cudaq-opt, and cudaq-quake, respectively.
Convert the nvq++ functionality (cudaq-quake bridge, cudaq-opt passes, cudaq-translate code generators) into a plugin or set of plugins. This would avoid the tight-integration issues. It would likely still require some sort of "wrapper script" to relieve the end user of having to type plugin .so files and plugin options.
- Pros:
  - clang++ supports plugins out of the box
  - a plugin can be placed as an inline clang AST traversal, which may be good in terms of performance (to be determined)
- Cons:
  - Still need a "thin veneer" sort of wrapper that would launch off-the-shelf clang++ with the extra command line arguments to add the plugin shared libraries, etc.
Need to investigate whether multiple plugins would be needed, and whether plugin-specific command-line options would be sufficient for a full nvq++ implementation.
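To give a feel for how thin the "thin veneer" could be, here is a minimal sketch of a wrapper that only assembles the clang++ command line. It relies on clang's documented plugin flags (`-fplugin=`, `-Xclang -add-plugin`, `-Xclang -plugin-arg-<name>`). The plugin library path, the plugin name, and the plugin argument are all hypothetical, since no such plugin exists yet.

```python
import shlex
from typing import List, Sequence


def build_clang_command(
    sources: Sequence[str],
    plugin_lib: str,
    plugin_name: str,
    plugin_args: Sequence[str] = (),
    clang: str = "clang++",
) -> List[str]:
    """Assemble an argv that loads one plugin into off-the-shelf clang++."""
    # Load the shared library that contains the plugin.
    argv = [clang, f"-fplugin={plugin_lib}"]
    # Ask clang to run the plugin in addition to its normal compile action.
    argv += ["-Xclang", "-add-plugin", "-Xclang", plugin_name]
    # Forward each plugin-specific option through the cc1 command line.
    for arg in plugin_args:
        argv += ["-Xclang", f"-plugin-arg-{plugin_name}", "-Xclang", arg]
    argv += list(sources)
    return argv


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    command = build_clang_command(
        ["kernel.cpp"],
        plugin_lib="/path/to/libhypothetical-nvqpp-plugin.so",
        plugin_name="hypothetical-nvqpp",
        plugin_args=["some-option"],
    )
    print(shlex.join(command))
```

If several plugins turn out to be necessary, the same function could be applied once per plugin and the flag groups concatenated. Whether plugin arguments alone are expressive enough remains the open question above.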
