Parallelize `collapse` method for `lightning.qubit` with `OpenMP`

tomlqc commented 1 month ago

Important Note

⚠️ This issue is part of an internal assignment and not meant for external contributors

Context

Recently, Mid-Circuit Measurement (MCM) support was added to the lightning.qubit backend. The performance of the collapse method is very important for MCM support. Parallelizing the collapse method with OpenMP would boost the performance of MCM simulations using lightning.qubit.

Requirements

The collapse method with OpenMP support has to be implemented in the lightning.qubit C++ backend. For more information about the collapse method, please refer to the PennyLane-Lightning source code.
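As a reference point for a parallel version, here is a minimal serial sketch of what a state-vector collapse does: after measuring qubit `wire` with outcome `branch`, every amplitude whose bit for that wire disagrees with the outcome is set to zero, and the remaining amplitudes are renormalized. The free function `collapse_serial` and its signature are hypothetical (this is not the `StateVectorLQubit` API), and the sketch assumes wire 0 is the most significant bit of the basis-state index.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical serial collapse (assumption: wire 0 = most significant bit).
void collapse_serial(std::vector<std::complex<double>> &state,
                     std::size_t num_qubits, std::size_t wire, bool branch) {
    assert(state.size() == (std::size_t{1} << num_qubits));
    assert(wire < num_qubits);

    // Amplitudes come in contiguous blocks of `stride` entries that share
    // the same value of the measured qubit, alternating 0-block, 1-block.
    const std::size_t stride = std::size_t{1} << (num_qubits - 1 - wire);
    const std::size_t num_pairs = state.size() / (2 * stride);
    const std::size_t rejected = branch ? 0 : 1; // which block of each pair to drop

    // 1) Zero out every amplitude inconsistent with the measured outcome.
    for (std::size_t k = 0; k < num_pairs; ++k) {
        const std::size_t offset = stride * (rejected + 2 * k);
        for (std::size_t j = 0; j < stride; ++j) {
            state[offset + j] = {0.0, 0.0};
        }
    }

    // 2) Renormalize the surviving amplitudes.
    double norm2 = 0.0;
    for (const auto &amp : state) {
        norm2 += std::norm(amp); // |amp|^2
    }
    assert(norm2 > 0.0); // the outcome must have nonzero probability
    const double scale = 1.0 / std::sqrt(norm2);
    for (auto &amp : state) {
        amp *= scale;
    }
}
```

Both loops are independent across amplitudes, which is what makes the method a natural candidate for OpenMP.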

Parallelize the collapse method with OpenMP in pennylane_lightning/core/src/simulators/lightning_qubit/StateVectorLQubit.hpp
The new implementation should allow users to choose whether the collapse method is built with OpenMP or not.
Create a pull request in the PennyLane Lightning repository and make sure to complete all the steps outlined in the PR template.
Benchmark the new C++ implementation (as a function of Num_Qubits and the number of OpenMP threads) and upload the results to the pull request for further discussion.
Mark the PR ready for review.
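One way to meet the first two requirements is sketched below: the loops carry OpenMP pragmas that are compiled only when the `_OPENMP` macro is defined (as compilers do under flags such as `-fopenmp`), so the same source also builds serially without OpenMP. The function name `collapse_omp` and the bit ordering (wire 0 = most significant bit) are assumptions for illustration; how the real backend exposes the OpenMP switch (for example through a CMake option) is not specified here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical OpenMP-parallel collapse (assumption: wire 0 = most significant bit).
// Without OpenMP the #if blocks drop the pragmas and the loops run serially.
void collapse_omp(std::vector<std::complex<double>> &state,
                  std::size_t num_qubits, std::size_t wire, bool branch) {
    assert(state.size() == (std::size_t{1} << num_qubits));
    assert(wire < num_qubits);

    const std::size_t shift = num_qubits - 1 - wire; // bit position of `wire`
    const std::size_t keep_bit = branch ? 1 : 0;
    // Signed loop index for compatibility with older OpenMP versions
    const auto dim = static_cast<std::int64_t>(state.size());

    // Pass 1: zero rejected amplitudes and sum |amp|^2 of the kept ones.
    // `reduction` gives each thread a private partial sum that is combined
    // at the end, so threads do not contend on a shared accumulator.
    double norm2 = 0.0;
#if defined(_OPENMP)
#pragma omp parallel for schedule(static) reduction(+ : norm2)
#endif
    for (std::int64_t i = 0; i < dim; ++i) {
        const auto idx = static_cast<std::size_t>(i);
        if (((idx >> shift) & 1U) != keep_bit) {
            state[idx] = {0.0, 0.0};
        } else {
            norm2 += std::norm(state[idx]);
        }
    }
    assert(norm2 > 0.0); // the outcome must have nonzero probability

    // Pass 2: renormalize; each iteration writes a distinct element.
    const double scale = 1.0 / std::sqrt(norm2);
#if defined(_OPENMP)
#pragma omp parallel for schedule(static)
#endif
    for (std::int64_t i = 0; i < dim; ++i) {
        state[static_cast<std::size_t>(i)] *= scale;
    }
}
```

Iterating over a flat index, rather than over nested block/offset loops, keeps the work evenly split across threads even when `wire` is the last qubit and each block is a single amplitude; the nested form could instead use OpenMP's `collapse(2)` clause. Parallel reductions can round slightly differently from a serial sum, so comparisons against a serial result should use a tolerance.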

Don't hesitate to ask for clarification or raise any concerns regarding the issue. We'll be happy to discuss with you!

xiaohanzai commented 3 weeks ago

Hi Thomas, for the benchmarking, should I just take the collapse function out and run it on my own laptop? I guess I don't have to install the whole package, right? What's the possible range of values for Num_Qubits?

tomlqc commented 3 weeks ago

Hi Xiaohan, yes, you may just benchmark collapse() separately, and please share with us how you did it. The upper limit for the number of qubits will be your laptop's memory; you can start with 10. And don't forget to compare different numbers of threads :slightly_smiling_face:
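A standalone timing harness along these lines is one way to benchmark collapse() separately: it reports the mean time of one call over several repetitions for a given number of qubits and OpenMP threads. The names `collapse_kernel` and `benchmark_collapse`, the choice of the middle wire, and the uniform initial state are assumptions; the kernel is a compact stand-in for whichever implementation is being measured.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(_OPENMP)
#include <omp.h>
#endif

// Hypothetical stand-in for the collapse under test (wire 0 = most significant bit).
static void collapse_kernel(std::vector<std::complex<double>> &state,
                            std::size_t num_qubits, std::size_t wire, bool branch) {
    const std::size_t shift = num_qubits - 1 - wire;
    const std::size_t keep_bit = branch ? 1 : 0;
    const auto dim = static_cast<std::int64_t>(state.size());
    double norm2 = 0.0;
#if defined(_OPENMP)
#pragma omp parallel for reduction(+ : norm2)
#endif
    for (std::int64_t i = 0; i < dim; ++i) {
        const auto idx = static_cast<std::size_t>(i);
        if (((idx >> shift) & 1U) != keep_bit) {
            state[idx] = {0.0, 0.0};
        } else {
            norm2 += std::norm(state[idx]);
        }
    }
    const double scale = 1.0 / std::sqrt(norm2);
#if defined(_OPENMP)
#pragma omp parallel for
#endif
    for (std::int64_t i = 0; i < dim; ++i) {
        state[static_cast<std::size_t>(i)] *= scale;
    }
}

// Mean wall-clock seconds per collapse call for `num_qubits` qubits on
// `num_threads` OpenMP threads. The state is reset outside the timed region.
double benchmark_collapse(std::size_t num_qubits, int num_threads, int reps) {
    assert(num_qubits >= 1 && reps > 0);
#if defined(_OPENMP)
    omp_set_num_threads(num_threads);
#else
    (void)num_threads; // serial build: the thread count has no effect
#endif
    const std::size_t dim = std::size_t{1} << num_qubits;
    // Uniform superposition: both outcomes have nonzero probability
    const std::complex<double> amp{1.0 / std::sqrt(static_cast<double>(dim)), 0.0};
    std::vector<std::complex<double>> state(dim);

    double total = 0.0;
    for (int r = 0; r < reps; ++r) {
        std::fill(state.begin(), state.end(), amp);
        const auto t0 = std::chrono::steady_clock::now();
        collapse_kernel(state, num_qubits, num_qubits / 2, r % 2 == 0);
        const auto t1 = std::chrono::steady_clock::now();
        total += std::chrono::duration<double>(t1 - t0).count();
    }
    return total / reps;
}
```

A driver would then loop over `num_qubits` from 10 up to the memory limit and over thread counts such as 1, 2, 4 and 8, printing one row per combination. Keep in mind that 2^30 amplitudes of `std::complex<double>` already occupy 16 GiB.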

xiaohanzai commented 3 weeks ago

Thanks! And I guess wire can be anything that satisfies (1+wire) < getNumQubits()?
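Whether a given `wire` is valid follows from how the amplitudes are laid out. Assuming wire 0 is the most significant bit of the basis-state index (an assumption of this sketch), amplitudes sharing a value of `wire` come in contiguous blocks of length 2^(num_qubits - (1 + wire)), which is well defined whenever (1 + wire) <= num_qubits. The helper name `collapse_stride` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length of each contiguous block of amplitudes that
// share the same value of `wire` (assumption: wire 0 = most significant bit).
std::size_t collapse_stride(std::size_t num_qubits, std::size_t wire) {
    // Valid wires satisfy (1 + wire) <= num_qubits, i.e. wire in [0, num_qubits)
    assert(wire + 1 <= num_qubits);
    return std::size_t{1} << (num_qubits - (1 + wire));
}
```

For three qubits, wire 0 gives blocks of four amplitudes and the last wire (2) gives blocks of one, so under this layout the last wire is a valid input as well.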

xiaohanzai commented 3 weeks ago

Hi Thomas, looks like I can't push my changes. Should I be added as a collaborator first?

tomlqc commented 3 weeks ago

Hi @xiaohanzai, you will have to create a fork of the repository, which you can push to, and then you can create a PR to merge your branch into PennyLaneAI:master.

xiaohanzai commented 3 weeks ago

Hi Thomas, a few more questions before I submit for pull request...

How many threads do you usually use, and do you expect to see good performance with a lot of threads? I tested with 20 to 30 qubits on a cluster at UofT, and there's not much improvement in performance beyond 8 threads. I'm considering cache misses, false sharing, etc., but I'm not sure if I'm expected to make things work perfectly with large numbers of threads.
Do you expect to see good performance on a laptop? On my Mac the scaling seems pretty bad, but on the UofT cluster it is a lot better.
For the testing mentioned in the PR template, do I need to modify any of the Python files in tests/lightning_qubits to test the implementation?
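One plausible reason for a plateau around 8 threads, not established in this thread, is that collapse does very little arithmetic per amplitude and is therefore limited by memory bandwidth rather than by compute. A quick check is to convert a measured time into an effective bandwidth and compare it with the machine's peak. The helper below is a hypothetical sketch; `passes` is the number of full sweeps over the state vector the implementation makes, which has to be estimated from the code being benchmarked.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Effective memory bandwidth in GB/s (1 GB = 1e9 bytes) for a kernel that
// sweeps `passes` times over 2^num_qubits std::complex<double> amplitudes.
double effective_bandwidth_gbs(std::size_t num_qubits, double passes,
                               double seconds) {
    assert(seconds > 0.0);
    const double amplitudes = static_cast<double>(std::size_t{1} << num_qubits);
    const double bytes = passes * amplitudes * sizeof(std::complex<double>);
    return bytes / seconds / 1e9;
}
```

For example, one sweep over a 30-qubit state in one second corresponds to about 17.2 GB/s. If the value measured at 8 threads is already close to the node's memory bandwidth, additional threads are not expected to help much.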

maliasadi commented 2 weeks ago

Hi @xiaohanzai, Thank you for your clear and detailed communication!

There's no need for additional effort in optimizing this specifically for HPC machines. The number of threads required to achieve optimal performance generally depends on several factors, particularly the complexity of your example and the number of physical threads available on your machine. We would be happy to review your pull request.
There are differences in the gate kernels between macOS and Linux, so we expect some performance differences between the two platforms. For this project, sharing the results from your Mac laptop will be perfectly fine.
Yes, please add unit tests to check the correctness of your changes in ./tests/test_measurements.py.

xiaohanzai commented 2 weeks ago

Hi @maliasadi, thank you so much for your reply! Sorry, I'm still quite confused about adding the tests. Should I add a test in test_measurements.py, or maybe a cpp file under pennylane_lightning/core/src/simulators/lightning_qubit/tests/? I actually took the collapse function out and ran a scaling test on it individually without compiling the whole pennylane package, so I'm not sure if I should put that file in the repo. Or should I just submit my scaling test code with the PR, rather than as a test module?

I think because I took the collapse function out for the scaling test instead of compiling the whole pennylane package with a test, I'm quite confused about what I should do now...
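Independently of where the test ends up (Python or C++), a correctness check for collapse can be written against the definition rather than against a second implementation: after collapsing, every amplitude whose `wire` bit disagrees with `branch` must be zero, each surviving amplitude must equal the original divided by the square root of the outcome probability, and the state must have unit norm. The function `check_collapse` below is a hypothetical sketch of those checks (wire 0 is again assumed to be the most significant bit); it is not part of the Lightning test suite.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Returns true if `after` is a valid collapse of `before` on `wire` with
// outcome `branch`, up to tolerance `tol` (assumption: wire 0 = most significant bit).
bool check_collapse(const std::vector<std::complex<double>> &before,
                    const std::vector<std::complex<double>> &after,
                    std::size_t num_qubits, std::size_t wire, bool branch,
                    double tol = 1e-10) {
    assert(before.size() == (std::size_t{1} << num_qubits));
    assert(after.size() == before.size() && wire < num_qubits);

    const std::size_t shift = num_qubits - 1 - wire;
    const std::size_t keep_bit = branch ? 1 : 0;

    // Probability of the measured outcome in the original state
    double prob = 0.0;
    for (std::size_t i = 0; i < before.size(); ++i) {
        if (((i >> shift) & 1U) == keep_bit) {
            prob += std::norm(before[i]);
        }
    }
    if (prob <= tol) {
        return false; // outcome impossible: collapse is undefined
    }
    const double scale = 1.0 / std::sqrt(prob);

    double norm2 = 0.0;
    for (std::size_t i = 0; i < after.size(); ++i) {
        const bool kept = ((i >> shift) & 1U) == keep_bit;
        // Rejected amplitudes must vanish; kept ones must be rescaled originals
        const std::complex<double> expected =
            kept ? before[i] * scale : std::complex<double>{0.0, 0.0};
        if (std::abs(after[i] - expected) > tol) {
            return false;
        }
        norm2 += std::norm(after[i]);
    }
    return std::abs(norm2 - 1.0) <= tol;
}
```

In practice, one would collapse a random normalized state with the implementation under test and pass both vectors to such a check, covering several wires, both branches and a few thread counts.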

maliasadi commented 2 weeks ago

@xiaohanzai No need to update the Python tests for now! Please go ahead and create the pull request with your changes. Feel free to include any additional benchmark scripts and files in the PR. We'd be happy to review your code and continue the discussion there!

xiaohanzai commented 2 weeks ago

Thanks! I just created a pull request. There doesn't seem to be an option to mark it ready for review, though...
