PabloAndresCQ commented 6 months ago

This PR adds support for MPS simulation on PECOS via pytket-cutensornet>=0.7.0. Support is added for:

- Simulation via QuantumCircuit.
- Simulation via HybridEngine, so it accepts PHIR input. It works with error models.

This will remain a draft PR for a while until I polish up the following:

[x] Provide support for two-qubit gates acting on non-adjacent qubits.
[x] Add tests on HybridEngine using MPS.
[x] Explore options to reduce dependency constraints with pytket.
- This can be done, all dependencies to pytket in the code used by this PR could be removed.
- We are likely to create a C version of the pytket-cutensornet primitives used in this PR, and when doing so, it'll be natural not to make it dependent on TKET. I believe it is reasonable to wait until then to lift the tket dependencies, to avoid duplication of work.
- For now, pytket dependencies are managed by pip when installing pytket-cutensornet. When we eventually remove the dependencies in pytket-cutensornet, no additional change will be required on the side of PECOS.
[x] Investigate bottlenecks/scaling. In particular, do some tests with error models.
- The code in this PR does not seem to add additional overheads.
- For small circuits, the bottlenecks on MPS come from pytket-cutensornet and, in particular, from the Python implementation of canonicalise and sample. Jake is considering the possibility of lowering these primitives to C to reduce the bottleneck.
- For larger circuits, we seem to get the same behaviour as pytket-cutensornet on its own, that is: among the best when compared to publicly available libraries (see Confluence pages: vs CPU libraries, vs GPU libraries). Benchmarking against proprietary software (e.g. FermioniQ) has not been done yet.
- I have not yet done benchmarking using the accurate H-series error model. This is an interesting research question, to see how much the presence of noise facilitates simulability. Improvements in the algorithm informed by this benchmarking will be done in pytket-cutensornet and won't affect the code in this PR, so the PR can be merged now.

What to expect in terms of runtime

See dropdown details below:

MPS is slower than statevector methods for small circuits (and it will always be)

> A circuit such as the one in `tests/integration/phir/example1_no_wasm.json`, which has 4 qubits, 21 gates (2 of them two-qubit gates) and 16 measurements, takes approximately 85 ms per shot. So, for 1000 shots it takes about 90 seconds, which is considerably more than `cuStateVec` (around 5 seconds) or `ProjectQ` (around 3 seconds). I believe the main factors are the following:
>
> - GPUs are slower than CPUs for small circuits. This is a trend that can be consistently seen across statevector and TN simulators: you only get an advantage from GPUs once you have amortized their overheads, which can't happen if the circuits are too easy.
> - Similarly, the MPS algorithm is considerably more complex than statevector and, hence, it has its own overheads that are only amortized if the simulation is "challenging". Moreover, in these small circuits the overheads of the Python implementation of pytket-cutensornet clearly show, since most of the time is spent in `canonicalise`, which is the most Python-heavy part of the algorithm.

MPS can run circuits with high qubit number

> I have been able to run examples from a suite of 56-qubit circuits for Hamiltonian simulation on square lattices. Runtime matches the results reported in these Confluence pages: [vs CPU libraries](https://cqc.atlassian.net/wiki/spaces/TKET/pages/2791374886/Benchmarking+MPS+against+ITensors+and+Quimb), [vs GPU libraries](https://cqc.atlassian.net/wiki/spaces/TKET/pages/2973958195/Benchmarking+MPS+against+ITensorGPU+and+cuTensorNet).
>
> **Note**: Runtime strongly depends on the chosen truncation parameters for MPS and varies across circuits. In the circuit suite discussed above there are some circuits that could be simulated with acceptable fidelity (~1e-3 error per gate) in 300 seconds (single shot), and others that took two hours and returned a completely noisy state.
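
To get a feel for what an error of ~1e-3 per gate means over a whole circuit, here is a rough sketch. It assumes that per-gate errors compound multiplicatively, which is a common back-of-the-envelope estimate and not necessarily how pytket-cutensornet reports fidelity. The function name and the gate counts are hypothetical and not taken from the benchmark suite.

```python
def estimated_fidelity(error_per_gate: float, n_gates: int) -> float:
    """Rough overall fidelity, assuming each gate independently keeps
    a fraction (1 - error_per_gate) of the fidelity (an assumption)."""
    return (1.0 - error_per_gate) ** n_gates


# Hypothetical gate counts, for illustration only.
for n_gates in (100, 1000, 5000):
    print(n_gates, round(estimated_fidelity(1e-3, n_gates), 3))
```

Under this assumption the estimate decays exponentially with gate count, which is one reason the choice of truncation parameters matters so much for deep circuits.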

Further work (to think about after merging)

- It will be important to add heuristics to QuantumSimulator to choose when to use TN-based algorithms and when to use statevector, since the points above will qualitatively apply no matter the implementation.
- When using TN-based algorithms, setting the truncation parameters will be a subtle matter, and we should discuss whether this is something that we should expect the users to choose, or if we should try to make our own guesses.
- MPS is a terrible choice for simulating all-to-all connectivity circuits. I am developing an alternative (TreeTN) which I think should do better, but this is still in an early stage. Other TN approaches should also be considered.
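
As a starting point for that discussion, one very simple heuristic could look like the sketch below. It is only an illustration: `choose_simulator`, its return values and the default memory budget are hypothetical and not part of PECOS. The only fact it relies on is that a dense statevector of n qubits in complex128 needs 16 * 2**n bytes.

```python
def statevector_bytes(n_qubits: int) -> int:
    """Memory needed for a dense complex128 statevector (16 bytes per amplitude)."""
    return 16 * 2**n_qubits


def choose_simulator(n_qubits: int, memory_budget_bytes: int = 16 * 2**30) -> str:
    """Hypothetical heuristic: prefer statevector whenever it fits in memory,
    since MPS only pays off once its overheads are amortized."""
    if statevector_bytes(n_qubits) <= memory_budget_bytes:
        return "statevector"
    # Beyond the budget a dense statevector is not an option; fall back to a
    # TN-based method. Connectivity and truncation are deliberately ignored here.
    return "mps"


print(choose_simulator(4))   # small circuit
print(choose_simulator(56))  # too large for a dense statevector
```

A real heuristic would also need to account for connectivity (MPS does poorly on all-to-all circuits) and for the truncation parameters, neither of which this sketch attempts.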

ciaranra commented 3 months ago

Everything looks great.

The one thing that makes me wonder is the inclusion of pytket-cutensornet in the simulators target, as that will cause systems that don't have access to cuquantum to pull down extra packages. I am wondering whether this sort of thing, coupled with the fact that cuquantum can't be installed via pip, means we should rethink installation extras, or even get creative and consider modern package management tools like pixi or rye (or maybe poetry) to handle OS-dependent installation...

To keep things simple for now... maybe we should have extras like simulators, simulators-linux, all, all-linux, or something...
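
Whatever the extras end up being called, the code that uses the optional backend would typically guard its import so that installs without CUDA keep working. A minimal sketch of that pattern follows; the helper `is_importable` and the flag `HAS_CUTENSORNET` are hypothetical names, and `pytket.extensions.cutensornet` is assumed to be the import path provided by pytket-cutensornet.

```python
import importlib


def is_importable(module_name: str) -> bool:
    """Return True if the module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Assumed import path for pytket-cutensornet; False on systems without it.
HAS_CUTENSORNET = is_importable("pytket.extensions.cutensornet")

if not HAS_CUTENSORNET:
    print("MPS backend unavailable; install the CUDA extras to enable it.")
```

With a guard like this, the package can register the MPS simulator only when the flag is set, independently of how the extras are grouped.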

ciaranra commented 3 months ago

Additionally, as far as lowering to C goes, I have been considering lowering the circuits to Rust given I am working on Rust simulators... then I can run everything in pure Rust. Perhaps that is related to your comments above. But whether things are lowered to Rust or C... we could make use of running things using the C ABI.

PabloAndresCQ commented 3 months ago

> The one thing that makes me wonder is the inclusion of pytket-cutensornet in the simulators target as that will cause systems that don't have access to cuquantum to pull down extra packages. [...] To keep things simple for now... maybe we should have extras like simulators, simulators-linux, all, all-linux, or something...

Fair point! I've moved pytket-cutensornet into the [cuda] group, since its dependencies are there anyway.

Regarding your point about lowering to C or Rust. Most likely we will want to lower to C just because cuTensorNet is a C/CUDA library. But, indeed, cooperation between Rust and C shouldn't be problematic afaik.

PECOS-packages / PECOS

[feature] Adding support for MPS simulation #63
