The new StatevectorSampler is slower than the V1 implementation on a small number of qubits

What should we add?

The new V2 implementation of the Sampler primitive, StatevectorSampler, is significantly slower than the V1 reference Sampler on circuits with a small number of qubits.

Here is a script that reproduces the comparison:

import time

import matplotlib.pyplot as plt
import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit.library import EfficientSU2
from qiskit.primitives import StatevectorSampler, Sampler as RefSampler

def time_ref_v1_sampler(qc: QuantumCircuit, data: np.ndarray, retries=100):
    # V1 API: circuits and parameter sets are passed as two parallel lists.
    ref_sampler = RefSampler()
    # perf_counter is a monotonic, high-resolution clock suited to benchmarking.
    start = time.perf_counter()
    for _ in range(retries):
        ref_sampler.run([qc] * len(data), data).result()
    elapsed = time.perf_counter() - start
    print(f"Reference V1 Sampler: {elapsed:0.2f} sec")
    return elapsed

def time_ref_v2_sampler(qc: QuantumCircuit, data: np.ndarray, retries=100):
    # V2 API: one pub (circuit, parameter values) per row of data.
    sampler_v2 = StatevectorSampler()
    pubs = [(qc, row) for row in data]

    start = time.perf_counter()
    for _ in range(retries):
        sampler_v2.run(pubs).result()
    elapsed = time.perf_counter() - start
    print(f"Reference V2 Sampler: {elapsed:0.2f} sec")
    return elapsed

def run_comparison():
    v1_elapsed = []
    v2_elapsed = []
    qubits = [5, 6, 7, 8, 9, 10]
    for num_q in qubits:
        qc = EfficientSU2(num_q)
        qc.measure_all()
        data = np.random.random((10, qc.num_parameters))

        v1 = time_ref_v1_sampler(qc, data)
        v1_elapsed.append(v1)
        v2 = time_ref_v2_sampler(qc, data)
        v2_elapsed.append(v2)

    plt.plot(qubits, v1_elapsed, label="v1")
    plt.plot(qubits, v2_elapsed, label="v2")
    plt.xlabel("Number of qubits")
    plt.ylabel("Total time (sec)")
    plt.legend()
    plt.show()

if __name__ == '__main__':
    run_comparison()

Here is the result on a Mac: [plot of elapsed time against number of qubits for v1 and v2]

For larger circuits, e.g. 15 qubits, the two implementations take comparable time.
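To put numbers on "slower" and "comparable", the timings can be reduced to a per-call time and a V2/V1 ratio for each qubit count. The sketch below is only an illustration: `time_per_call` and `slowdown_by_qubits` are hypothetical helper names that are not part of Qiskit or of the script above, and the choice of taking the best of a few repeats is an assumption about how to reduce timing noise.

```python
import time
from typing import Callable, Dict, Sequence


def time_per_call(fn: Callable[[], object], retries: int = 100, repeats: int = 3) -> float:
    """Return the best observed mean wall-clock time per call of fn, in seconds.

    Hypothetical helper: runs fn `retries` times per repeat and keeps the
    fastest repeat, which damps one-off noise such as a GC pause.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(retries):
            fn()
        best = min(best, (time.perf_counter() - start) / retries)
    return best


def slowdown_by_qubits(
    qubits: Sequence[int],
    v1_elapsed: Sequence[float],
    v2_elapsed: Sequence[float],
) -> Dict[int, float]:
    """Map each qubit count to the ratio v2 / v1 (values above 1 mean V2 is slower)."""
    if not (len(qubits) == len(v1_elapsed) == len(v2_elapsed)):
        raise ValueError("qubits, v1_elapsed and v2_elapsed must have equal length")
    return {n: v2 / v1 for n, v1, v2 in zip(qubits, v1_elapsed, v2_elapsed)}
```

With the script above, one possible use is `time_per_call(lambda: sampler_v2.run(pubs).result())` for each sampler, followed by `slowdown_by_qubits(qubits, v1_elapsed, v2_elapsed)` on the collected lists. A ratio that approaches 1 as the qubit count grows would match the observation that the results are comparable around 15 qubits.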

Qiskit / qiskit, issue #12517