Closed dotslaser closed 1 year ago
I am interested in knowing if the problem has been solved. @doichanj
I have not been able to reproduce this issue in my environment (Power9 + IBM Spectrum MPI). I think this issue depends on the MPI build and may be related to GPUDirect RDMA. Please check whether your MPI implementation supports RDMA, and try using the MPI options that enable it.
I am closing this issue because there has been no response for more than two weeks. Please open a new issue if this still needs to be fixed in your environment.
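As a rough way to run that check, here is a minimal sketch that asks Open MPI whether it was built with CUDA support. It assumes the MPI in use is Open MPI with `ompi_info` on the PATH, which the issue does not state. The helper names `parse_cuda_support` and `check_cuda_aware_mpi` are made up for illustration. A CUDA-aware build is only a prerequisite: this reads the build flag and does not prove that GPUDirect RDMA is actually used at run time.

```python
import shutil
import subprocess
from typing import Optional

# Key printed by `ompi_info --parsable --all` for the CUDA build flag
# (Open MPI only; other MPI implementations expose this differently).
CUDA_KEY = "mpi_built_with_cuda_support:value:"


def parse_cuda_support(ompi_info_output: str) -> Optional[bool]:
    """Return True/False if the CUDA-support line is present, else None."""
    for line in ompi_info_output.splitlines():
        if CUDA_KEY in line:
            return line.strip().endswith(":true")
    return None


def check_cuda_aware_mpi() -> Optional[bool]:
    """Run ompi_info and report whether Open MPI was built with CUDA support."""
    if shutil.which("ompi_info") is None:
        return None  # Not Open MPI, or ompi_info is not on the PATH.
    completed = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cuda_support(completed.stdout)


if __name__ == "__main__":
    print("CUDA-aware Open MPI:", check_cuda_aware_mpi())
```

A result of `None` means the flag could not be determined, for example because a different MPI implementation is installed.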
Information
What is the current behavior?
Segmentation fault when running a quantum circuit on GPU with multiple processes (using MPI). I found a partial (but annoying) workaround, described in the tests below.
Steps to reproduce the problem
I'm using NVIDIA GPUs on AWS (Amazon Web Services) instances. These are the system specs:
AWS specs: g5.xlarge instances
4 vCPUs (AMD EPYC 7R32)
NVIDIA A10G Tensor Core (24 GB)
16 GB RAM
NVIDIA CUDA specifications:
NVIDIA CUDA installation steps:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Qiskit Aer Compilation:
python ./setup.py bdist_wheel -- -DAER_MPI=True -DAER_THRUST_BACKEND=CUDA
Simple Quantum Circuit for tests:
This circuit has only Hadamard gates.
Test 1 (2 processes):
Code:
Execution:
Result:
If I execute the exact same code on 2 machines (same AWS instance type, one process on each instead of 2 on a single machine):
Execution:
Result:
Test 2 (same number of qubits and same circuit as the previous case, except now the last qubit does not have a Hadamard gate):
Code:
Execution (same execution):
Result:
Now it gives the expected result (output of 2 processes). It also works if the 2 processes run on one instance instead of 2.
Test 3 (now 4 processes, all qubits with Hadamard gate):
Code:
Execution:
Result:
Test 4 (4 processes, all qubits except the last with Hadamard gate):
Code:
Execution:
Result: Segmentation fault (similar error to previous cases, like in test 1)
Test 5 (4 processes, all qubits except the last 2 with Hadamard gate):
Code:
Execution:
Result:
Something similar happens with 8, 16, etc. processes.
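The pattern in the tests above fits a crash that only happens when a gate touches a qubit whose amplitudes are spread across processes. The sketch below shows that reasoning in plain Python. It assumes the state vector is split into equal contiguous slices, one per process, so the highest log2(P) qubits pick the slice, as is common in distributed state-vector simulators. Whether Qiskit Aer lays out its chunks exactly this way is an assumption here, and so are the function names and the 10-qubit size (the issue does not give the qubit count).

```python
from typing import Iterable, List


def global_qubits(num_qubits: int, num_processes: int) -> List[int]:
    """Qubits whose gates would need data from another process.

    Assumption: equal contiguous slices per process, so the top
    log2(num_processes) qubit indices select which process owns an amplitude.
    """
    if num_processes < 1 or num_processes & (num_processes - 1):
        raise ValueError("num_processes must be a power of two")
    k = num_processes.bit_length() - 1  # exact log2 for powers of two
    if k > num_qubits:
        raise ValueError("more processes than state-vector slices")
    return list(range(num_qubits - k, num_qubits))


def needs_exchange(gate_qubits: Iterable[int], num_qubits: int,
                   num_processes: int) -> bool:
    """True if any gate acts on a qubit that is split across processes."""
    shared = set(global_qubits(num_qubits, num_processes))
    return any(q in shared for q in gate_qubits)


if __name__ == "__main__":
    n = 10  # hypothetical qubit count, not taken from the issue
    cases = {
        "Test 1 (2 procs, H on all)": (range(n), 2),
        "Test 2 (2 procs, last qubit blank)": (range(n - 1), 2),
        "Test 4 (4 procs, last qubit blank)": (range(n - 1), 4),
    }
    for name, (qubits, procs) in cases.items():
        print(name, "-> needs exchange:", needs_exchange(qubits, n, procs))
```

Under these assumptions, the cases that need inter-process exchange are the ones reported as segmentation faults (Tests 1 and 4), and the case that needs none is the one reported as working (Test 2). That would point at the MPI data-exchange path rather than at the gates themselves.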
What is the expected behavior?
It should work with all qubits performing operations, without leaving "blank" qubits.
Suggested solutions
I think Qiskit Aer is not managing memory correctly, but I don't know the exact cause of the error.
If you can't replicate this error, please share your hardware setup and installation process. I would really appreciate it!
Thanks a lot for your help!! :)