Closed vincentmr closed 2 months ago
Attention: Patch coverage is 88.63636% with 10 lines in your changes missing coverage. Please review.

Project coverage is 97.29%. Comparing base (d5ffb0c) to head (391d0da). Report is 1 commits behind head on master.

Files with missing lines | Patch % | Lines |
---|---|---|
...nylane_lightning/lightning_kokkos/_state_vector.py | 0.00% | 6 Missing :warning: |
...ane_lightning/lightning_kokkos/lightning_kokkos.py | 20.00% | 4 Missing :warning: |
Before submitting

Please complete the following checklist when submitting a PR:

- [x] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the `tests` directory!
- [x] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`.
- [x] Ensure that the test suite passes, by running `make test`.
- [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing the change, and including a link back to the PR.
- [x] Ensure that code is properly formatted by running `make format`.

When all the above are checked, delete everything above the dashed line and fill in the pull request template.

Context: Pauli rotations come up in many places, and importantly in the time evolution of qchem Hamiltonians. It is therefore worth considering ways to accelerate their execution.

Description of the Change: Implement `applyPauliRot`. Invoke `applyPauliRot` directly from the SV class and add bindings to the Python layer.

Benefits: Faster Pauli rotations. I performed a benchmark on random `PauliRotation`s (runtime > 1.0 sec and at least 5 of them) through the Python layer. The data remains noisy with 5 samples because the performance varies depending on the specific "XYZ" sequence (which translates into more or less predictable memory access patterns). Overall, we see an advantage for 3 or more qubits.

I performed the same benchmark on an A100 card with the Kokkos-CUDA backend, but using at least 500 samples since the absolute timings are quite small, and obtained the following speed-ups.

Using a full workflow such as
to benchmark, we obtain timings as follows

For large enough molecules (>= 20 qubits, >= 1000 terms), the new PauliRot kernels have a clear advantage, which only grows with molecular size. It is worth noting that with L-Kokkos-CUDA, even at the (24/10k) scale, evaluating the circuit is not the main bottleneck, which is why simulating HCN (2.64 sec. `apply_lightning` vs 32.5 sec. `QNode`) and N2N2 (7.51 sec. `apply_lightning` vs 36.4 sec. `QNode`) takes about the same time.

Possible Drawbacks:

Related GitHub Issues: [sc-69801]
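
For readers unfamiliar with the operation, the following is a minimal NumPy sketch of what a Pauli rotation computes, namely exp(-i θ/2 P)|ψ> = cos(θ/2)|ψ> - i sin(θ/2) P|ψ> for a Pauli word P. It is not the Kokkos kernel added in this PR; the names `apply_pauli_rot` and `dense_pauli_word` are hypothetical, and the sketch assumes that wire 0 maps to the most significant bit of the state index.

```python
import numpy as np


def apply_pauli_rot(state, theta, word):
    """Return exp(-i*theta/2 * P) applied to `state`, where P is the Pauli
    word given as a string such as "XZY" (one letter per wire).

    Illustrative sketch only: assumes wire 0 is the most significant bit.
    """
    n = len(word)
    if state.shape != (2**n,):
        raise ValueError("state must have length 2**len(word)")

    mask_flip = 0  # bits flipped by P (X or Y factors)
    mask_sign = 0  # bits contributing a (-1)^bit sign (Y or Z factors)
    n_y = 0        # each Y factor also contributes a global factor of i
    for wire, p in enumerate(word):
        if p not in {"I", "X", "Y", "Z"}:
            raise ValueError(f"invalid Pauli letter: {p!r}")
        bit = 1 << (n - 1 - wire)
        if p in {"X", "Y"}:
            mask_flip |= bit
        if p in {"Y", "Z"}:
            mask_sign |= bit
        if p == "Y":
            n_y += 1

    idx = np.arange(2**n)
    src = idx ^ mask_flip  # P maps basis state |src> onto |idx>

    # Parity of popcount(src & mask_sign), accumulated one bit at a time.
    masked = src & mask_sign
    parity = np.zeros_like(idx)
    for b in range(n):
        parity ^= (masked >> b) & 1

    p_state = (1j**n_y) * (1 - 2 * parity) * state[src]
    return np.cos(theta / 2) * state - 1j * np.sin(theta / 2) * p_state


def dense_pauli_word(word):
    """Build the dense 2^n x 2^n matrix of a Pauli word (for checking only)."""
    paulis = {
        "I": np.eye(2, dtype=complex),
        "X": np.array([[0, 1], [1, 0]], dtype=complex),
        "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
        "Z": np.array([[1, 0], [0, -1]], dtype=complex),
    }
    mat = np.eye(1, dtype=complex)
    for p in word:
        mat = np.kron(mat, paulis[p])
    return mat
```

In this sketch every output amplitude reads exactly one other amplitude, at index `idx ^ mask_flip`. The positions of the X and Y letters therefore set the access stride, which may be one reason timings vary with the specific "XYZ" sequence.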