Open bettinaheim opened 4 months ago
Of course, the moment I actually write down the exact repro, it occurs to me that what is ultimately causing the issue is the missing MPI plugin:
Proceed as above, but then run:
export MPI_PATH=/usr/lib/x86_64-linux-gnu/openmpi
bash $CUDA_QUANTUM_PATH/distributed_interfaces/activate_custom_mpi.sh
nvq++ --target remote-mqpu /tmp/amplitude_estimation.cpp && ./a.out # now works (though why is it printing the llvm::dbgs() messages? That's not nice.)
Edit no. 2: I quickly checked whether I at least get a decent, comprehensible error when MPI is not installed at all. Unfortunately, compilation succeeds and I get essentially the same error as above, which is not very comprehensible.
Options for resolution:
1) Require MPI to be installed to use the remote-mqpu backend. In that case, we need to document this requirement and add a compile-time check that gives a clear, comprehensible error along the lines of "This target requires MPI. Please install MPI and try again." when MPI is missing.
2) Do not require MPI, and handle it the same way we do for the nvidia-mqpu target. I believe this is in principle what we already intend to do, and I think it is the better option.
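If option 1 were chosen, the check could run in the compiler driver before anything is compiled. The bash function below is a minimal sketch under stated assumptions: the name `require_mpi_for_target` is hypothetical, and treating either an existing `MPI_PATH` directory or an `mpiexec`/`mpirun` launcher on `PATH` as evidence of an MPI installation is an assumed heuristic, not how nvq++ detects MPI. It only covers the "MPI not installed at all" case from Edit no. 2; an installed MPI whose plugin was never activated would need an additional check.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for option 1: fail at compile time with a
# clear message if a target that needs MPI is requested but MPI is missing.
# The detection heuristic below is an assumption for illustration only.
require_mpi_for_target() {
  local target="$1"

  # Assumed list of targets that need MPI; everything else passes through.
  case "$target" in
    remote-mqpu) ;;
    *) return 0 ;;
  esac

  # Accept an explicitly configured MPI installation directory...
  if [ -n "${MPI_PATH:-}" ] && [ -d "$MPI_PATH" ]; then
    return 0
  fi

  # ...or an MPI launcher found on PATH.
  if command -v mpiexec >/dev/null 2>&1 || command -v mpirun >/dev/null 2>&1; then
    return 0
  fi

  echo "error: This target requires MPI. Please install MPI and try again." >&2
  return 1
}
```

A driver would then call something like `require_mpi_for_target remote-mqpu || exit 1` before invoking the compiler, so the user sees the message instead of a JIT symbol error at run time.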
Required prerequisites
Describe the bug
In some cases, the execution on the remote-mqpu backend fails with a JIT error along the lines of
JIT session error: Symbols not found: [ _Unwind_Resume, _ZNSaIcED2Ev, ...]
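One way to narrow down an error like this is to check which shared library actually defines each missing symbol, since a JIT typically resolves such external symbols by searching the libraries loaded into the host process (if it is configured to do so). Below is a minimal diagnostic sketch, assuming a GNU/Linux system with `ldd` and binutils' `nm`; `find_symbol_provider` is a hypothetical helper name. Note that plain `nm ./a.out` also lists symbols that the executable merely references and that are resolved from a shared library at load time (marked `U`), so seeing a symbol name in its output does not by itself mean the executable defines it.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: list the shared libraries that a dynamically linked
# binary loads and that actually *define* a given dynamic symbol.
# Requires ldd and GNU binutils (nm); the function name is hypothetical.
find_symbol_provider() {
  local binary="$1" sym="$2" lib

  # ldd prints "name => /path/to/lib (0x...)" for each resolved dependency;
  # keep only the absolute paths (this skips "=> not found" entries).
  for lib in $(ldd "$binary" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'); do
    # --defined-only drops undefined ("U") references; newer binutils append
    # a version suffix such as "@@GCC_3.0", so allow an optional "@...".
    if nm -D --defined-only "$lib" 2>/dev/null | grep -E " ${sym}(@|\$)" >/dev/null; then
      echo "$lib"
    fi
  done
}
```

For example, `find_symbol_provider ./a.out _Unwind_Resume` lists the dependencies of a.out that export that symbol, which can then be compared with what the JIT session is able to see.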
The error is caused by the invokeWrappedKernel logic in /runtime/common/JIT.cpp. Specifically, I think we are running into something like the problem described here: https://stackoverflow.com/questions/57612173/llvm-jit-symbols-not-found
The _Unwind_Resume symbol is part of the GCC unwinding runtime (libgcc_s, or libgcc_eh.a when linking statically), which the GNU C++ standard library relies on for exception handling. I double-checked that the produced executable itself (a.out) contains that symbol, so I suspect it is indeed something about these lines that is not working as expected:
Steps to reproduce the bug
Minimal repro: Download the latest version of the CUDA Quantum installer for C++, or build it from source. Then run
The installer can be built from source by first building the cuda-quantum-assets image:
docker build -t cuda-quantum-assets:latest -f docker/build/assets.Dockerfile .
and then building the installer:
DOCKER_BUILDKIT=1 docker build -f docker/release/installer.Dockerfile --build-arg base_image=cuda-quantum-assets:latest . --output out
Expected behavior
The example should compile and run without error.
Is this a regression? If it is, put the last known working version (or commit) here.
Not a regression
Environment
Suggestions
No response