Qiskit / qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
https://www.ibm.com/quantum/qiskit
Apache License 2.0
5.24k stars 2.36k forks source link

How to parrallelize computation of expectation values in Qiskit #12956

Open jwtkeeble opened 2 months ago

jwtkeeble commented 2 months ago

What should we add?

I'm trying to compute the expectation values of a large set of Pauli strings, and it's taking way too long on the CPU. I was wondering there's an in-built way to parrallelize taking the expectation values over a large list?

My current solution is to use the joblib library and split by array into nthreads subsets and run each subset on a single thread on my CPU (out of a total of nthreads). However, I only see a benefit at around nqubits = 10 and higher, which I assume this is due to some form of overhead when assigning the subsets to each CPU thread (and a trade-off between a nthreads CPU working independently and all nthreads CPUs working on the same expectation value).

Is there an in-built to parrallelize this operation? Or perhaps is there a way to parrallelize this for-loop over the GPU without using an external library? I understand in other libraries (e.g. JAX or PyTorch), there exist in-built vectorization functions, e.g. jax.vmap or torch.vmap that allow for a vectorized approach to iterables. Is there something equivalent in Qiskit?

Any help would be greatly appreciated!

I've attached an example script below to demonstrate this behaviour, and I've attached an example output at the bottom of this issue.

from qiskit.circuit import QuantumCircuit
from qiskit.quantum_info import random_statevector, Pauli

from itertools import product
from joblib import Parallel, delayed
import numpy as np
from time import perf_counter
from concurrent.futures import ThreadPoolExecutor

#single for-loop
def probs_IZs_single_loop(outputstate, strings):
    return np.array([abs(outputstate.expectation_value(Pauli(string)))**2 for string in strings])

#run 'single for-loop' method over subsets of all strings assigned to each thread
def probs_IZs_multi(outputstate, strings):
    results = Parallel(n_jobs=-1, verbose=0)(delayed(probs_IZs_single_loop)(outputstate, subset_strings) for subset_strings in np.array_split(strings, nthreads)) 
    return np.concatenate(results, axis=0)

nthreads = 16
for nqubits in np.arange(1,17,1):
    qc = QuantumCircuit(nqubits)        #Init. random QC and statevector 
    wf = random_statevector(2**nqubits)
    qc.initialize(wf, qc.qubits)

    #NOTE: Example of summing of a set of Pauli strings
    pauli_group = ['I','Z']
    strings = np.array([''.join(i) for i in product(pauli_group, repeat=wf.num_qubits)])

    #List-comp example
    t0=perf_counter()
    all_probs_single = probs_IZs_single_loop(outputstate=wf, strings=strings)
    t1=perf_counter()
    time_single = t1-t0
    all_probs_single = np.array(all_probs_single)

    #parrallelized List-comp (via Joblib)
    t0=perf_counter()
    all_probs_multi = probs_IZs_multi(outputstate=wf, strings=strings)
    t1=perf_counter()
    time_multi = t1-t0
    all_probs_multi = np.array(all_probs_multi)

    print(f'Qubits: {nqubits:>3d} | Single: {time_single:>4.4f}s | Multi: {time_multi:>4.4f}s | Match?: {np.allclose(all_probs_single, all_probs_multi)}')

The output of this script is here:

 Qubits:   1 | Single: 0.0004s | Multi: 0.8664s | Match?: True
Qubits:   2 | Single: 0.0009s | Multi: 0.2126s | Match?: True
Qubits:   3 | Single: 0.0006s | Multi: 0.0120s | Match?: True
Qubits:   4 | Single: 0.0007s | Multi: 0.0118s | Match?: True
Qubits:   5 | Single: 0.0015s | Multi: 0.0115s | Match?: True
Qubits:   6 | Single: 0.0011s | Multi: 0.0108s | Match?: True
Qubits:   7 | Single: 0.0043s | Multi: 0.0108s | Match?: True
Qubits:   8 | Single: 0.0056s | Multi: 0.0108s | Match?: True
Qubits:   9 | Single: 0.0096s | Multi: 0.0108s | Match?: True
Qubits:  10 | Single: 0.0441s | Multi: 0.0111s | Match?: True
Qubits:  11 | Single: 0.0407s | Multi: 0.0210s | Match?: True
Qubits:  12 | Single: 0.0849s | Multi: 0.0309s | Match?: True
Qubits:  13 | Single: 0.2010s | Multi: 0.0617s | Match?: True
Qubits:  14 | Single: 0.5576s | Multi: 0.1121s | Match?: True
Qubits:  15 | Single: 1.8150s | Multi: 0.3363s | Match?: True
Qubits:  16 | Single: 6.2523s | Multi: 1.3145s | Match?: True
jsaroni commented 2 months ago

Have you tried reconstructing the expectation values using parallelization from https://qiskit.github.io/qiskit-aer/stubs/qiskit_aer.AerSimulator.html

Also I'm curious, have you checked this with the EstimatorV2 Primitive?

jwtkeeble commented 2 months ago

Hi @jsaroni, thanks for the response! I'm relatively new to Qiskit, so I'm not that well versed in the Qiskit framework. Do I just have to change the backend of qiskit_aer to parrallelize expectation values?

I did see that you can represent the wavefunction as a vector (and use something like np.einsum) to compute multiple expectations, but that's intractable for any reasonable number of qubits.

If you're aware of any minimal reproducible example that can compute expectation values, please do let me know!