PennyLaneAI / pennylane-lightning

The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
https://docs.pennylane.ai/projects/lightning
Apache License 2.0
86 stars 35 forks

Add native PauliRot implementation in LightningKokkos [sc-71642] #855

Closed vincentmr closed 3 weeks ago

vincentmr commented 1 month ago


Context: Pauli rotations appear in many places, most notably in the time evolution of qchem Hamiltonians. It is therefore worth considering ways to accelerate their execution.

Description of the Change: Implement applyPauliRot. Invoke applyPauliRot directly from the SV class and add bindings to the Python layer.
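For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what applying a Pauli rotation directly to the amplitudes can look like. It is not the Kokkos kernel from this PR. The function name apply_pauli_rot, its argument layout, and the bit ordering (wire 0 as the most significant bit) are assumptions made for illustration only. It relies on the identity exp(-i*theta/2*P) = cos(theta/2)*I - i*sin(theta/2)*P, and on the fact that a Pauli word maps each basis state to exactly one other basis state times a phase.

```python
import math


def apply_pauli_rot(state, theta, word, wires, n_qubits):
    """Return exp(-i*theta/2 * P)|state> for the Pauli word P, without a dense matrix.

    Hypothetical helper: `state` is a list of 2**n_qubits complex amplitudes,
    `word` is a string over "IXYZ", and `wires` gives the target wire of each letter.
    """
    flip_mask = 0  # bits toggled by X or Y
    sign_mask = 0  # bits contributing a factor (-1)**bit, from Y or Z
    n_y = 0        # each Y also contributes a factor of i
    for pauli, wire in zip(word, wires):
        # Assumption: wire 0 is the most significant bit of the basis-state index.
        bit = 1 << (n_qubits - 1 - wire)
        if pauli in ("X", "Y"):
            flip_mask |= bit
        if pauli in ("Y", "Z"):
            sign_mask |= bit
        if pauli == "Y":
            n_y += 1

    y_phase = 1j ** n_y
    c = math.cos(theta / 2)
    s = math.sin(theta / 2)

    out = [0j] * len(state)
    for j in range(len(state)):
        k = j ^ flip_mask  # the only basis state that P maps onto |j>
        sign = -1 if bin(k & sign_mask).count("1") % 2 else 1
        out[j] = c * state[j] - 1j * s * y_phase * sign * state[k]
    return out
```

Each output amplitude depends on only two input amplitudes, state[j] and state[j ^ flip_mask], so the memory access pattern is set by which letters of the word are X or Y.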

Benefits: Faster Pauli rotations. I benchmarked random PauliRot operations through the Python layer (total runtime > 1.0 sec and at least 5 samples). The data remain noisy with 5 samples because the performance varies with the specific "XYZ" sequence (which translates into more or less predictable memory access patterns). Overall, we see an advantage for 3 or more qubits.

[Figure: speedup_vs_ntargets_lk_omp16]
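The sampling rule mentioned above can be sketched as a small timing loop. This is one possible reading of that rule (keep drawing random Pauli words until the accumulated runtime exceeds a threshold and a minimum number of samples has been taken), not the script used for this PR. The names random_pauli_word, benchmark, and run_once are hypothetical; run_once stands in for whatever executes a single PauliRot for a given word.

```python
import random
import time


def random_pauli_word(n_targets, rng):
    """Draw a random Pauli word of length n_targets over the letters X, Y, Z."""
    return "".join(rng.choice("XYZ") for _ in range(n_targets))


def benchmark(run_once, n_targets, min_runtime=1.0, min_samples=5, seed=0):
    """Time run_once(word) on random words until both stopping criteria are met.

    Returns the list of per-sample wall-clock timings in seconds.
    """
    rng = random.Random(seed)
    timings = []
    total = 0.0
    while total <= min_runtime or len(timings) < min_samples:
        word = random_pauli_word(n_targets, rng)
        start = time.perf_counter()
        run_once(word)  # hypothetical callable executing one PauliRot
        elapsed = time.perf_counter() - start
        timings.append(elapsed)
        total += elapsed
    return timings
```

Because each sample uses a different random word, the spread of the returned timings reflects the dependence on the "XYZ" sequence as well as ordinary measurement noise.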

I performed the same benchmark on an A100 card with the Kokkos-CUDA backend, but using at least 500 samples since the absolute timings are quite small, and obtained the following speed-ups.

[Figure: speedup_vs_ntargets_lk_cuda]

Using a full workflow such as

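    import pennylane as qml

    # dev (the device under test) and ham (the molecular Hamiltonian) are assumed to be defined beforehand
    @qml.qnode(dev, diff_method=None)
    def circuit():
        qml.TrotterProduct(ham, time=1.0, n=1, order=2)
        return qml.state()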

to benchmark, we obtain timings as follows

[Figure: time_vs_mol]

For large enough molecules (>= 20 qubits, >= 1000 terms), the new PauliRot kernels have a clear advantage, which only grows with molecular size. It is worth noting that with L-Kokkos-CUDA, even at the (24/10k) scale, evaluating the circuit is not the main bottleneck, which is why simulating HCN (2.64 sec. apply_lightning vs 32.5 sec. QNode) and N2N2 (7.51 sec. apply_lightning vs 36.4 sec. QNode) takes about the same time.
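To see why the number of Hamiltonian terms drives the PauliRot count in such a workflow, here is a generic sketch of a second-order Trotter step for H = sum_j c_j P_j written as a list of Pauli rotations. It assumes the convention PauliRot(theta, P) = exp(-i*theta/2*P) and targets exp(-i*H*t). PennyLane's own decomposition and sign convention may differ, and the function name and the (coeff, word, wires) tuple layout are hypothetical.

```python
def second_order_trotter_rotations(terms, time, n_steps=1):
    """List the Pauli rotations of a second-order Trotter product.

    terms: list of (coeff, pauli_word, wires) tuples describing H = sum_j c_j P_j.
    Returns a list of (angle, pauli_word, wires), one entry per rotation.
    """
    dt = time / n_steps
    # Half step exp(-i * c_j * dt / 2 * P_j) corresponds to a rotation angle c_j * dt.
    forward = [(coeff * dt, word, wires) for coeff, word, wires in terms]
    # Second order: sweep forward through the terms, then backward.
    one_step = forward + forward[::-1]
    return one_step * n_steps


# Toy two-term Hamiltonian (made up for illustration, not a molecule from the benchmark).
toy_terms = [(0.5, "ZZ", (0, 1)), (0.25, "X", (0,))]
rotations = second_order_trotter_rotations(toy_terms, time=1.0, n_steps=1)
```

In this sketch a Hamiltonian with N terms yields 2 * N * n_steps rotations, so a Hamiltonian with thousands of terms produces thousands of Pauli rotations per evaluation.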

Possible Drawbacks:

Related GitHub Issues: [sc-69801]

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 88.63636% with 10 lines in your changes missing coverage. Please review.

Project coverage is 97.29%. Comparing base (d5ffb0c) to head (391d0da). Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...nylane_lightning/lightning_kokkos/_state_vector.py | 0.00% | 6 Missing :warning: |
| ...ane_lightning/lightning_kokkos/lightning_kokkos.py | 20.00% | 4 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #855      +/-   ##
==========================================
+ Coverage   96.24%   97.29%   +1.04%
==========================================
  Files         212      168      -44
  Lines       28109    21118    -6991
==========================================
- Hits        27054    20547    -6507
+ Misses       1055      571     -484
```
