PennyLaneAI / qml

Introductions to key concepts in quantum programming, as well as tutorials and implementations from cutting-edge quantum computing research.
https://pennylane.ai/qml

Functionality for parallel `qml.expval(H)` execution needed #508

Closed Qottmann closed 1 year ago

Qottmann commented 2 years ago

The VQE with parallel QPUs on Rigetti Forest demo uses the deprecated qml.ExpvalCost functionality, which internally parallelizes the executions with Dask. Currently, nothing comparable is available for qml.expval(H) and, more importantly, it cannot be reproduced manually with user-facing functions and Dask.

@antalszava and I concluded that there is no point in re-writing the ExpvalCost logic with non-user-facing functions now; instead, we should look for ways to provide parallelization for qml.expval(H) in the near future.

For now, we leave the demo as is, with a warning about the deprecation; see https://github.com/PennyLaneAI/qml/pull/506
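For reference, a minimal sketch of the plain, serial qml.expval(H) pattern under discussion (the device, Hamiltonian, and circuit here are placeholders chosen for illustration, not taken from the demo):

import pennylane as qml

dev = qml.device("default.qubit", wires=2)
H = qml.Hamiltonian([1.0, 1.0], [qml.PauliZ(0), qml.PauliX(1)])

@qml.qnode(dev)
def cost(x):
    qml.RX(x, wires=0)
    qml.RY(x * 2, wires=0)
    # all terms of H are evaluated within a single, serial QNode call
    return qml.expval(H)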

josh146 commented 2 years ago

@Qottmann I imagine the closest approach would be to do something like:

import dask
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(x, h):
    qml.RX(x, wires=0)
    qml.RY(x * 2, wires=0)
    return qml.expval(h)

H = qml.PauliZ(0) + qml.PauliX(1)

# one delayed execution per Hamiltonian term, recombined with the coefficients
results = [dask.delayed(circuit)(0.2, h) for h in H.ops]
results = H.coeffs @ dask.compute(*results, scheduler="threads")

Would this be sufficient in the tutorial? It avoids ExpvalCost, while making the Dask usage explicit.

Qottmann commented 2 years ago

In principle this would work, but do you see a way to incorporate the measurement optimization as well? Even in this example it executes 2 expvals, whereas only 1 is necessary because PauliZ(0) and PauliX(1) act on different wires and can be measured together. Many Hamiltonians contain large commuting Pauli groups, so executing all the individual expvals in parallel is most likely slower (and wastes hardware resources unnecessarily) than just the default execution:

import numpy as np

# note: this Hamiltonian acts on 10 wires, so the device needs at least 10 wires
Zs = [qml.PauliZ(i) for i in range(10)]
Xs = [qml.PauliX(i) for i in range(10)]
H = qml.Hamiltonian(coeffs=np.arange(20), observables=Zs + Xs, grouping_type="qwc")

results = [dask.delayed(circuit)(0.2, h) for h in H.ops]

>>> %timeit result = H.coeffs @ dask.compute(*results, scheduler="threads")
73.9 ms ± 5.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit circuit(0.2, H)
11.2 ms ± 289 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As far as I understand it, this tutorial was intended for parallel hardware execution, so I would make sure to incorporate measurement optimization.
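A sketch of one way this could look (not code from the thread; it assumes the Hamiltonian's grouping_indices attribute is populated when grouping_type="qwc" is passed to qml.Hamiltonian): build one sub-Hamiltonian per qubit-wise-commuting group and submit one delayed execution per group instead of one per term.

import dask
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=10)

@qml.qnode(dev)
def circuit(x, obs):
    qml.RX(x, wires=0)
    qml.RY(x * 2, wires=0)
    return qml.expval(obs)

Zs = [qml.PauliZ(i) for i in range(10)]
Xs = [qml.PauliX(i) for i in range(10)]
H = qml.Hamiltonian(np.arange(20), Zs + Xs, grouping_type="qwc")

# one sub-Hamiltonian per qubit-wise-commuting group (two groups here: all Zs, all Xs)
groups = [
    qml.Hamiltonian([H.coeffs[i] for i in idx], [H.ops[i] for i in idx])
    for idx in H.grouping_indices
]

# one delayed execution per group; on hardware each group needs only one circuit,
# and summing the group expectation values reproduces the expectation value of H
delayed = [dask.delayed(circuit)(0.2, g) for g in groups]
result = sum(dask.compute(*delayed, scheduler="threads"))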

Qottmann commented 2 years ago

Small note: the above example also does not distribute the computation across different devices, but I think this can be done with the following modification:

devs = [qml.device("default.qubit", wires=3) for _ in range(2)]

# same circuit as before, but no longer bound to a single device
def circuit(x, h):
    qml.RX(x, wires=0)
    qml.RY(x * 2, wires=0)
    return qml.expval(h)

H = qml.PauliZ(0) + qml.PauliX(1)

# bind each term to its own QNode/device and execute them in parallel with Dask
results = [dask.delayed(qml.QNode(circuit, dev))(0.2, h) for h, dev in zip(H.ops, devs)]
results = H.coeffs @ dask.compute(*results, scheduler="threads")
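One caveat with the zip above: it pairs terms and devices one-to-one, so with more Hamiltonian terms than devices the remaining terms would be silently dropped. A possible variant (a sketch, not from the thread, reusing circuit, H, and devs as defined above) is to cycle through the device list:

from itertools import cycle

# reuse the available devices round-robin when there are more terms than devices
results = [
    dask.delayed(qml.QNode(circuit, dev))(0.2, h)
    for h, dev in zip(H.ops, cycle(devs))
]
results = H.coeffs @ dask.compute(*results, scheduler="threads")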

josh146 commented 2 years ago

Oops! Yes, perfect @Qottmann :)

Qottmann commented 2 years ago

I created a PR for this here: https://github.com/PennyLaneAI/qml/pull/510