Renmusxd / RustQIP

Quantum computing using rust. Efficient and a borrow-checked no cloning theorem!
https://docs.rs/qip/
MIT License
223 stars 18 forks source link

Faer rs Backend #47

Closed mert-kurttutan closed 10 months ago

mert-kurttutan commented 1 year ago

Hi,

Are you aware of Faer-rs crate ? It has very strong performance, on par with openblas, intel-mkl, without dealing with c/c++ bindings. It should provide great performance boost for appropriate fundamental operations, e.g. complex gemm (conjugate or not) One can start with including it slowly as an optional backend for some ops.

Renmusxd commented 1 year ago

Thanks for the interest. I actually don't use raw matrix multiplication for the bulk of the backend calculations. That's because most linalg libraries don't know how to exploit the tensor product structure of quantum circuits. By knowing how the matrices are constructed, as a tensor product of smaller matrices, we never need to construct the full matrix and can skip entries we know will be 0. This is faster even than sparse representations.

I should set up some benchmarks to explicitly compare this though.

mert-kurttutan commented 1 year ago

Yeah, I have not checked the codebase in super detail. Just tossing around this idea in case some of the ops can be formulated in a way that is relevant to faer rs

Renmusxd commented 1 year ago

Added some benchmarks across a few backends. Here I'm just testing a single 2x2 matrix applied to qubit 0 and identity applied elsewhere -- most benchmarks are on 12 qubits, the "large" ones are on 15 qubits. I test 4 backends: qip (this library), ndarray, sprs (sparse matrices), and faer.

test tests::bench_ones_faer_reuse_arena      ... bench:   6,446,267 ns/iter (+/- 47,841)
test tests::bench_ones_ndarray_build_each    ... bench: 142,738,247 ns/iter (+/- 980,752)
test tests::bench_ones_ndarray_reuse         ... bench:   4,600,275 ns/iter (+/- 404,325)
test tests::bench_ones_ndarray_reuse_arena   ... bench:   4,604,027 ns/iter (+/- 562,828)
test tests::bench_ones_qip                   ... bench:      58,430 ns/iter (+/- 1,756)
test tests::bench_ones_qip_large             ... bench:   1,289,297 ns/iter (+/- 23,239)
test tests::bench_ones_sprs_build_each       ... bench:     103,473 ns/iter (+/- 637)
test tests::bench_ones_sprs_build_each_large ... bench:  38,890,543 ns/iter (+/- 616,850)
test tests::bench_ones_sprs_reuse            ... bench:      17,702 ns/iter (+/- 205)
test tests::bench_ones_sprs_reuse_large      ... bench:   5,318,135 ns/iter (+/- 47,856)

It looks like faer isn't performing very well compared to the others so I'm worried I didn't write the benchmark properly, if you're familiar with the library I'm open to changes.

Renmusxd commented 10 months ago

Followup: After talking with the faer devs on their discord it seems it isn't really the best choice for backend. Their matrix on vector support is not prioritized and performs no better than using blas with 1 thread. I'll close this for now but keep the bench around.