This PR fixes a few significant bugs and drastically improves the performance of the code.

Fixed bugs:

- `zero_worker`: introduced in https://github.com/QE-Lab/qx-simulator/commit/923b194e616e2924393b3c8b8bddec2b57e64854#diff-71447ae75943cb660ee4aac230d56c16ff7ab3066dd4dbc9da302a36c99aad7cR2955. Two branches of the if-statement had to be swapped.
- `p1_worker`: disclosed in https://github.com/QE-Lab/qx-simulator/commit/1f06044e82ed742104b85e756db579eeafe75afd#diff-71447ae75943cb660ee4aac230d56c16ff7ab3066dd4dbc9da302a36c99aad7cR2842. The serial and parallel implementations of the `p` value calculation differ substantially: https://github.com/QE-Lab/qx-simulator/blob/a131a883d1e324c404e63eec4d4094a2d1bb9045/src/core/gate.h#L3007-L3015 vs. https://github.com/QE-Lab/qx-simulator/blob/a131a883d1e324c404e63eec4d4094a2d1bb9045/src/core/gate.h#L2736-L2743. These differences made it impossible for the code to work correctly with more than 10 qubits. Note that before 1f06044e82ed742104b85e756db579eeafe75afd, the parallel version of `p1_worker` was never executed, because it was placed in the branch that should be invoked only for more than 64 qubits.
- A bug that prevented some gates from working with more than 32 qubits and resulted in segfaults. It was caused by incorrect type casting between 32-bit and 64-bit integers during bit-wise operations. Basically, every bit shift by a `uint64_t x;` was changed from `(1 << x)` to `(1UL << x)`.

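
To illustrate why the literal matters in the shift fix, here is a minimal sketch. The helper name `bit_mask` is hypothetical and does not appear in qx-simulator. The sketch spells the literal as `std::uint64_t{1}` rather than `1UL`, on the assumption that portability matters: `unsigned long` is 64 bits wide on typical 64-bit Linux targets, but only 32 bits on LLP64 platforms such as 64-bit Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The type of a shift expression is the promoted type of its LEFT operand;
// the width of the shift count plays no role. So (1 << x) is computed as a
// 32-bit int on common platforms, and shifting it by 32 or more is
// undefined behavior.
static_assert(std::is_same<decltype(1 << std::uint64_t{3}), int>::value,
              "1 << x has type int even when x is 64 bits wide");
static_assert(std::is_same<decltype(std::uint64_t{1} << std::uint64_t{3}),
                           std::uint64_t>::value,
              "uint64_t{1} << x is computed in 64 bits");

// Hypothetical helper (an assumption for illustration, not PR code):
// returns a mask with only bit x set, computed in 64-bit arithmetic.
inline std::uint64_t bit_mask(std::uint64_t x) {
    assert(x < 64);  // a shift by 64 or more would still be undefined
    return std::uint64_t{1} << x;
}
```

With this helper, `bit_mask(40)` evaluates to 1099511627776 (2^40), a value that a 32-bit `int` cannot hold.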

Performance improvements:

Performance was measured on Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz, 16 cores (multithreading is off), 93GB RAM: