Open doichanj opened 6 months ago
anyone trying this one? :)
So if I understand correctly, what is required is to implement an additional set of MPS_Tensor
methods
https://github.com/Qiskit/qiskit-aer/blob/9bfd1105daa30debc1ce1773bf1d3c0457f2195d/src/simulators/matrix_product_state/matrix_product_state_tensor.hpp#L50
that use cuQuantum, similar to, for example, the tensor network contractor? I think it would be good to discuss the architecture and design details before starting to tackle this issue.
On the cuQuantum side, this example, which uses cutensornetStateApplyTensorOperator
and cutensornetStateFinalizeMPS,
is probably a good place to start building the relevant routines.
I think Decompose
is one of the most time-consuming kernels that could be accelerated by cuTensorNet.
https://github.com/Qiskit/qiskit-aer/blob/9bfd1105daa30debc1ce1773bf1d3c0457f2195d/src/simulators/matrix_product_state/matrix_product_state_tensor.hpp#L593-L614
There is a wrapper implementation of SVD using LAPACK in svd.cpp.
I think we can add a wrapper for the GPU.
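To make the "add a wrapper for the GPU" idea concrete, here is a minimal sketch of how the choice between the two wrappers might look. Everything in it is hypothetical: the Device enum, SvdResult, the two stub functions and svd_dispatch are illustration names, not Aer's actual API, and the stubs make no LAPACK or cuQuantum calls. It only shows the device-based dispatch, so the rest of Decompose would not need to know which backend ran.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical device selector; Aer's real configuration type may differ.
enum class Device { CPU, GPU };

// Hypothetical result record. A real wrapper would return U, S and V;
// this sketch only records which backend was chosen.
struct SvdResult {
  std::string backend;
};

// Stand-in for the existing LAPACK-based wrapper in svd.cpp (body omitted).
SvdResult svd_lapack_stub() { return {"lapack"}; }

// Stand-in for a possible GPU wrapper, for example one built on cuTensorNet.
// Body omitted; this is an assumption about where such a wrapper would plug in.
SvdResult svd_gpu_stub() { return {"gpu"}; }

// Pick the SVD wrapper from the requested device.
// Throws if the GPU path is requested but was not built or is not available.
SvdResult svd_dispatch(Device device, bool gpu_available) {
  if (device == Device::GPU) {
    if (!gpu_available) {
      throw std::runtime_error("GPU SVD requested but not available");
    }
    return svd_gpu_stub();
  }
  return svd_lapack_stub();
}
```

With something like this, the CPU path stays the default and the GPU wrapper can be developed and tested separately behind the same call site.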
@Randl, I was working on this issue, and SVD has been my target since last night. But it's true that I still have to learn cuQuantum, and your previous comment shows you have a better grasp of it. So, who's going to do the PR? Waiting for your comment :) I will do as you say :)
Go on, feel free to work on it if you're already in progress.
What is the expected behavior?
The MPS simulation method (https://github.com/Qiskit/qiskit-aer/tree/main/src/simulators/matrix_product_state) is becoming important for simulating circuits with a large number of qubits. Currently, the MPS simulation method only supports
device=CPU
and it takes a long time to simulate circuits with a large number of qubits.
cuQuantum (https://developer.nvidia.com/cuquantum-sdk) is an SDK for quantum computing that accelerates simulation on NVIDIA GPUs. cuQuantum has APIs for statevector simulation (cuStateVec) and tensor network simulation (cuTensorNet) on GPUs, and Aer currently supports cuQuantum in
method=statevector
and method=tensor_network
but not in the MPS method.
The MPS method can be accelerated by using cuTensorNet, and there are some code examples here: https://docs.nvidia.com/cuda/cuquantum/latest/cutensornet/examples.html