Closed PabloAndresCQ closed 3 months ago
Everything looks great.
The one thing that makes me wonder is the inclusion of `pytket-cutensornet` in the `simulators` target, as that will cause systems that don't have access to `cuquantum` to pull down extra packages. Given this, coupled with the fact that `cuquantum` can't be installed via `pip`, I am wondering if we should rethink installation extras or even get creative and consider modern package management tools like `pixi` or `rye` (or maybe `poetry`) to handle OS-dependent installation...
To keep things simple for now... maybe we should have extras like `simulators`, `simulators-linux`, `all`, `all-linux`, or something...
Additionally, as far as lowering to C goes, I have been considering lowering the circuits to Rust, given I am working on Rust simulators... then I can run everything in pure Rust. Perhaps that is related to your comments above. But whether things are lowered to Rust or C... we could run things through the C ABI.
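For illustration, here is a minimal sketch of what calling through the C ABI looks like from Python, assuming a POSIX system. It uses the standard `ctypes` module, with libc's `abs` standing in for a lowered simulator routine; `c_abs` is a hypothetical helper name and nothing here is PECOS code.

```python
import ctypes
import ctypes.util

# Locate the C standard library. On POSIX, if find_library returns None,
# CDLL(None) exposes the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature `int abs(int)` so ctypes converts values correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def c_abs(x: int) -> int:
    """Call the C function abs() through its C ABI entry point."""
    return libc.abs(x)
```

A Rust library built as a `cdylib` that exports `extern "C"` functions can be loaded the same way, so the calling side would not need to know which language the routine was written in.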
> The one thing that makes me wonder is the inclusion of pytket-cutensornet in the simulators target as that will cause systems that don't have access to cuquantum to pull down extra packages. [...] To keep things simple for now... maybe we should have extras like simulators, simulators-linux, all, all-linux, or something...
Fair point! I've moved `pytket-cutensornet` into the `[cuda]` group, since its dependencies are there anyway.
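With the backend behind an optional extra, code that uses it usually needs to cope with it being absent. A minimal sketch of such a guard, using only the standard library: `backend_available` and `HAS_CUTENSORNET` are hypothetical names, and the import path `pytket.extensions.cutensornet` is assumed from the usual pytket extension layout rather than taken from this PR.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if `module_name` can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package of a dotted name is not installed.
        return False


# Assumed import path for pytket-cutensornet (hypothetical usage).
HAS_CUTENSORNET = backend_available("pytket.extensions.cutensornet")
```

Callers could then skip GPU-only tests, or raise a clear error that names the missing extra, when `HAS_CUTENSORNET` is false.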
Regarding your point about lowering to C or Rust: most likely we will want to lower to C, just because cuTensorNet is a C/CUDA library. But, indeed, cooperation between Rust and C shouldn't be problematic afaik.
This PR adds support for MPS simulation on PECOS via `pytket-cutensornet>=0.7.0`. Support is added for:

- `QuantumCircuit`.
- `HybridEngine`, so it accepts PHIR input. It works with error models.

This will remain a draft PR for a while until I polish up the following:

- `canonicalise` and `sample`. Jake is considering the possibility of lowering these primitives to C to reduce the bottleneck.

What to expect in terms of runtime
See dropdown details below:
MPS is slower than statevector methods for small circuits (and it will always be)
> Circuits such as the one in `tests/integration/phir/example1_no_wasm.json`, which has 4 qubits, 21 gates (2 of them 2-qubit gates) and 16 measurements, take approximately 85ms to run per shot. So, for 1000 shots it takes about 90 seconds, which is considerably more than `cuStateVec` (around 5 seconds) or `ProjectQ` (around 3 seconds). I believe the main factors are the following:
>
> - GPUs are slower than CPUs for small circuits. This is a trend that can be consistently seen across statevector and TN simulators: you only get an advantage from GPUs once you have amortized their overheads, which can't happen if the circuits are too easy.
> - Similarly, the MPS algorithm is considerably more complex than statevector and, hence, it has its own overheads that are only amortized if the simulation is "challenging". Moreover, in these small circuits the overheads of the Python implementation of pytket-cutensornet clearly show, since most of the time is spent in `canonicalise`, which is the most Python-heavy part of the algorithm.

MPS can run circuits with high qubit number
> I have been able to run examples from a suite of 56-qubit circuits for Hamiltonian simulation on square lattices. Runtime matches the results reported in these Confluence pages: [vs CPU libraries](https://cqc.atlassian.net/wiki/spaces/TKET/pages/2791374886/Benchmarking+MPS+against+ITensors+and+Quimb), [vs GPU libraries](https://cqc.atlassian.net/wiki/spaces/TKET/pages/2973958195/Benchmarking+MPS+against+ITensorGPU+and+cuTensorNet).
>
> **Note**: Runtime strongly depends on the chosen truncation parameters for MPS and varies across circuits. In the circuit suite discussed above there are some circuits that could be simulated with acceptable fidelity (~1e-3 error per gate) in 300 seconds (single shot), and others that took two hours and returned a completely noisy state.

Further work (to think about after merging)
`QuantumSimulator` to choose when to use TN-based algorithms and when to use statevector, since the points above will qualitatively apply no matter the implementation.
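As a starting point for that discussion, here is a rough sketch of one possible selection rule based only on memory: a dense statevector of `n` qubits stored as complex128 needs `16 * 2**n` bytes. The function names, the default 16 GiB budget and the backend labels are all hypothetical, and this is not how `QuantumSimulator` currently behaves. A real heuristic would also have to account for entanglement growth and truncation parameters, which the runtime notes above show can dominate.

```python
BYTES_PER_AMPLITUDE = 16  # one complex128 amplitude


def statevector_bytes(n_qubits: int) -> int:
    """Memory needed to hold a dense statevector of n_qubits."""
    return BYTES_PER_AMPLITUDE * 2**n_qubits


def choose_backend(n_qubits: int, memory_budget_bytes: int = 16 * 2**30) -> str:
    """Pick 'statevector' while the dense state fits in the budget, else 'mps'."""
    if statevector_bytes(n_qubits) <= memory_budget_bytes:
        return "statevector"
    return "mps"
```

Under these assumptions the 4-qubit example circuit stays on statevector (256 bytes), while a 56-qubit circuit would need `2**60` bytes and is routed to MPS.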