Open leofang opened 1 year ago
cc: @tlubowe @yangcal for vis
(edited to add a CI/CD concern)
Qiskit Aer uses all the GPUs specified in the CUDA_VISIBLE_DEVICES
environment variable. Is it not enough to limit Qiskit Aer to use some of the available GPUs?
Hi @doichanj, unfortunately it is not enough, nor is it a preferred solution. CUDA_VISIBLE_DEVICES
is a brute-force solution that should only be used when users know exactly what they're doing (usually HPC users; Dask also uses it internally for GPU-process binding), but it is not meant for general users, and it is certainly not how typical Python users launch a process.
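For contrast, here is a minimal sketch of what that brute-force, per-process pinning looks like (assuming one wants to restrict the whole process to GPU 0; this is roughly what HPC launchers and Dask do per worker, not what typical users write):

```python
import os

# Must be set before any CUDA-using library initializes in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import qiskit_aer  # imported only after the restriction is in place
```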
Typically, Python GPU users expect to choose the GPU at runtime. There are a number of framework-specific options:
* CuPy: `cupy.cuda.Device(0)`
* PyTorch: `torch.device('cuda:0')`
* TensorFlow: `tensorflow.device('/device:GPU:0')`
* CUDA Python (CUDA's official Python binding): `cuda.cudart.cudaSetDevice(0)`
and these should be honored based on the CUDA Programming Model (the CUDA Runtime APIs would honor the current/active CUDA context).
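As a concrete illustration of the runtime-selection pattern (a sketch using CuPy; GPU 1 is an arbitrary choice here):

```python
import cupy as cp

# Select the device at run time; only the chosen GPU should ever be touched.
with cp.cuda.Device(1):
    x = cp.arange(10)  # allocated on GPU 1, no other GPU is initialized
```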
Moreover, as described in my report, this impacts even single-GPU users, who might only want to run the CPU backend via AerSimulator(..., device='CPU', ...) for whatever reason. At least this is how we discovered this bug 🙂 There are many users who just want to install the batteries-included GPU build and pick among all available backends to tailor to their needs, and only then is AerSimulator(..., device='GPU', ...) called.

Finally, my above report also listed a number of other impacts, one being that import qiskit_aer, or even just print(qiskit.__qiskit_version__), would prematurely initialize the GPUs. This impacts, for example, @wshanks, who I just noticed is packaging Qiskit Aer on conda-forge (see https://github.com/conda-forge/staged-recipes/pull/21404#issuecomment-1361822587) 😅
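For instance, this is the kind of CPU-only usage that is still affected when the GPU build is installed (a sketch; the circuit and option values are illustrative):

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator  # importing the GPU build already initializes every visible GPU

# The user never asks for a GPU, yet the GPUs were touched at import time above.
sim = AerSimulator(method="statevector", device="CPU")

qc = QuantumCircuit(1)
qc.h(0)
qc.measure_all()
result = sim.run(transpile(qc, sim)).result()
```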
@doichanj, is fixing this a priority at all? Who would we have to convince to make this a priority?
I would really like to get CUDA support in the conda packages 😄
I did not understand the point of this issue, but I have implemented the target_gpus
option to select which GPUs are used for simulation. However, I think this is not the solution for this issue, right?
I think we have to change the way we query the available devices and methods, to avoid initializing GPUs when using the CPU simulator; is that what you want?
I had a high-priority task to release Aer 0.13.1, but I have time to work on this issue now.
I think we have to change the way we query the available devices and methods, to avoid initializing GPUs when using the CPU simulator; is that what you want?
Thanks, @doichanj. That is correct. Since we have all the knowledge at compile time (we know which compiler flags are set to build which backends, etc.), we can simply store it in static, read-only arrays, and at run time query them to see whether a backend was built, without ever needing to initialize a CUDA context or call any CUDA API 🙂 I'd love to see this fixed asap, as working around this issue on the packaging side would take a lot of unnecessary effort. Also, this has generated multiple bug reports on our side (as we have internal/external multi-GPU users).
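A minimal sketch of the idea (the names below are hypothetical, not Aer's actual internals): the build bakes in what was compiled, and availability queries become pure metadata lookups:

```python
# Hypothetical build-time constants; illustrative only, not Aer's actual API.
_BUILT_DEVICES = ("CPU", "GPU")  # written by the build according to compiler flags
_BUILT_METHODS = ("statevector", "density_matrix", "tensor_network")

def available_devices():
    # Pure metadata lookup: no CUDA runtime call, no context creation,
    # so listing what the wheel supports never touches the GPUs.
    return _BUILT_DEVICES

def available_methods():
    return _BUILT_METHODS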
Hello everyone, I was wondering whether the functionality of setting the visible devices at runtime has actually been implemented. I am trying to instantiate multiple Python subprocesses through an orchestrator process and assign a GPU to each subprocess. My goal is to leverage multiple GPUs to independently run different quantum circuits with different NoiseModels in parallel.
However, I have had no luck so far: I tried both setting the visible device through cupy.cuda.Device(rank) and the CUDA_VISIBLE_DEVICES environment variable, without any success. At the moment, I have only seen some marginal performance improvement when using batched_shots_gpu=True and batched_shots_gpu_max_qubits=30 on NVIDIA A100 GPUs. Do you have any advice or suggestions for achieving this per-subprocess independent GPU visibility?
Please use the target_gpus
option to specify which GPUs to use for simulation:
https://github.com/Qiskit/qiskit-aer/blob/61a557f2bbc62a7942e7eda8da0bff8bcbaa209e/qiskit_aer/backends/aer_simulator.py#L175-L178
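For example, something along these lines in each subprocess (a sketch; `rank` is assumed to be the GPU index your orchestrator assigns to that subprocess):

```python
from qiskit_aer import AerSimulator

rank = 0  # e.g. passed in by the orchestrator process
sim = AerSimulator(method="statevector", device="GPU", target_gpus=[rank])
```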
Information
What is the current behavior?
Importing Qiskit Aer either implicitly or explicitly, as shown below, would get all GPUs on the system initialized, as evidenced by monitoring nvidia-smi (there are other tools to check this, but nvidia-smi is the simplest).

Steps to reproduce the problem

1. Install qiskit-aer-gpu from PyPI (or build from source; how it's installed is irrelevant as long as the CUDA support is built)
2. Run any of the following:
   * python -i -c "import qiskit_aer"
   * python -i -c "import qiskit.providers.aer"
   * python -i -c "from qiskit.providers.aer import AerSimulator"
   * python -i -c "import qiskit; print(qiskit.__qiskit_version__)"
3. While the interpreter is held open (due to -i), check nvidia-smi. On a multi-GPU system, it is clear that the CUDA context is initialized on all GPUs:

```
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    316201      C   python                           264MiB  |
|    1   N/A  N/A    316201      C   python                           428MiB  |
|    2   N/A  N/A    316201      C   python                           428MiB  |
|    3   N/A  N/A    316201      C   python                           264MiB  |
+-----------------------------------------------------------------------------+
```