Strilanc closed this issue 7 years ago
That's weird. Did you run export OMP_NUM_THREADS=#cores?
Depending on the compiler (icc vs gcc, also version-dependent), threads are not kept alive and that causes a large slowdown.
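For anyone who prefers to apply the workaround from inside the script rather than the shell, here is a minimal sketch in Python 3. The helper name ensure_omp_num_threads is hypothetical, and it rests on two assumptions: that the OpenMP runtime reads the variable when the compiled simulator is first loaded, and that os.cpu_count() (which counts logical CPUs, not physical cores) is an acceptable stand-in for #cores.

```python
import os


def ensure_omp_num_threads(environ=os.environ, fallback=None):
    """Set OMP_NUM_THREADS if it is missing and return its value.

    Hypothetical helper: an explicit user setting is never overwritten.
    """
    if "OMP_NUM_THREADS" not in environ:
        # os.cpu_count() reports logical CPUs and may return None.
        count = fallback if fallback is not None else (os.cpu_count() or 1)
        environ["OMP_NUM_THREADS"] = str(count)
    return environ["OMP_NUM_THREADS"]


# Call this before importing the compiled simulator (assumption: the
# OpenMP runtime picks the value up when the extension is loaded).
ensure_omp_num_threads()
```

Setting the variable in the shell, as suggested above, remains the safer option because it does not depend on import order.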
This is a short speed test on my notebook (on battery power). With my compiler it doesn't matter much whether OMP_NUM_THREADS=4 or OMP_NUM_THREADS=8
Damians-MacBook-Pro:code Damian$ export OMP_NUM_THREADS=4
Damians-MacBook-Pro:code Damian$ python2.7 speed_test.py
13742 gates/sec @ 4 qubits
15134 gates/sec @ 5 qubits
15053 gates/sec @ 6 qubits
14618 gates/sec @ 7 qubits
14910 gates/sec @ 8 qubits
15114 gates/sec @ 9 qubits
15018 gates/sec @ 10 qubits
14857 gates/sec @ 11 qubits
14525 gates/sec @ 12 qubits
14234 gates/sec @ 13 qubits
13361 gates/sec @ 14 qubits
11900 gates/sec @ 15 qubits
10137 gates/sec @ 16 qubits
7969 gates/sec @ 17 qubits
5181 gates/sec @ 18 qubits
3055 gates/sec @ 19 qubits
export OMP_NUM_THREADS=4 makes a huge difference:
21677 gates/sec @ 4 qubits
21287 gates/sec @ 5 qubits
22056 gates/sec @ 6 qubits
19170 gates/sec @ 7 qubits
13135 gates/sec @ 8 qubits
18900 gates/sec @ 9 qubits
21962 gates/sec @ 10 qubits
21605 gates/sec @ 11 qubits
20964 gates/sec @ 12 qubits
20316 gates/sec @ 13 qubits
18732 gates/sec @ 14 qubits
16289 gates/sec @ 15 qubits
13201 gates/sec @ 16 qubits
6460 gates/sec @ 17 qubits
4014 gates/sec @ 18 qubits
2927 gates/sec @ 19 qubits
Given the huge difference in performance, is there a reason this isn't the default?
The OpenMP default is implementation-defined and is typically the number of available hardware threads; I don't know why this is the case.
If it's possible to detect this kind of misconfiguration and fix it, we might want to consider doing that. But the export workaround does solve my particular issue with testing performance.
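A detection step along those lines could look roughly like the sketch below (Python 3). This is not ProjectQ code: check_openmp_threads is a hypothetical helper, and treating an unset variable as a misconfiguration is an assumption based only on the slowdown reported in this thread.

```python
import os
import warnings


def check_openmp_threads(environ=None):
    """Return the configured thread count, or None (with a warning).

    Hypothetical check: it only inspects the environment and does not
    query the OpenMP runtime itself.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("OMP_NUM_THREADS")
    if value is None:
        warnings.warn(
            "OMP_NUM_THREADS is not set; the simulator may run slowly. "
            "Try: export OMP_NUM_THREADS=<number of cores>"
        )
        return None
    # The variable may hold a comma-separated list (one entry per
    # nesting level); only the first entry is examined here.
    first = value.split(",")[0].strip()
    try:
        return int(first)
    except ValueError:
        warnings.warn("OMP_NUM_THREADS=%r is not an integer" % value)
        return None
```

Whether a warning is enough, or the value should be set automatically, is a design choice left open here.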
You may also want to use export OMP_PROC_BIND=SPREAD to increase the simulator performance even more:
Damians-MacBook-Pro:code Damian$ export OMP_NUM_THREADS=4
Damians-MacBook-Pro:code Damian$ export OMP_PROC_BIND=SPREAD
Damians-MacBook-Pro:code Damian$ python2.7 speed_test.py
15714 gates/sec @ 4 qubits
15804 gates/sec @ 5 qubits
15121 gates/sec @ 6 qubits
15463 gates/sec @ 7 qubits
14696 gates/sec @ 8 qubits
14942 gates/sec @ 9 qubits
15318 gates/sec @ 10 qubits
14667 gates/sec @ 11 qubits
14639 gates/sec @ 12 qubits
13932 gates/sec @ 13 qubits
12428 gates/sec @ 14 qubits
11225 gates/sec @ 15 qubits
10220 gates/sec @ 16 qubits
8007 gates/sec @ 17 qubits
5537 gates/sec @ 18 qubits
3215 gates/sec @ 19 qubits
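To compare settings like these without re-exporting variables by hand, the benchmark can be launched as a child process with an explicit environment. The sketch below (Python 3.7+) is only an illustration; openmp_env and run_benchmark are hypothetical helper names, and the path to speed_test.py in the usage comment is assumed.

```python
import os
import subprocess
import sys


def openmp_env(num_threads, proc_bind=None, base_env=None):
    """Copy the environment and add the OpenMP settings to the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    if proc_bind is not None:
        env["OMP_PROC_BIND"] = proc_bind
    return env


def run_benchmark(argv, num_threads, proc_bind=None):
    """Run argv in a child process with the given settings; return stdout."""
    result = subprocess.run(
        argv,
        env=openmp_env(num_threads, proc_bind),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example usage (assumes speed_test.py is in the current directory):
# print(run_benchmark([sys.executable, "speed_test.py"], 4, "SPREAD"))
```

Running each configuration in a fresh process avoids the question of when the OpenMP runtime reads its environment.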
By the way, I think it should be qubits.extend(...) rather than append; not that it makes a difference :)
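For readers unfamiliar with the distinction, here is a generic Python illustration. The strings are placeholders, not real qubit objects, and this is not the code from speed_test.py.

```python
# Stand-in for a call that returns a list of items.
new_items = ["q0", "q1"]

appended = []
appended.append(new_items)   # adds the list itself as a single element

extended = []
extended.extend(new_items)   # adds each item of the list individually

print(appended)  # [['q0', 'q1']]
print(extended)  # ['q0', 'q1']
```

With append the result is a list of lists, so code that iterates over it sees nested lists rather than individual items.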
I'm having trouble measuring ProjectQ's performance. Something is causing a serious slowdown.
For example, I ran these commands from my terminal:
Here are the contents of speed_test.py:
And got these results:
Those rates are terrible. I get higher performance with the python simulator up to 14 qubits:
Last month when I speed-tested ProjectQ, it was getting numbers similar to Quirk: 8000 gates/sec at 16 qubits. I'm not sure what would have changed in the meantime, but performance seems to have dropped by 50x.
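For context on how a gates/sec figure can be obtained, here is a generic timing sketch in Python 3. It is not the speed_test.py from this issue: gates_per_second and stand_in_gate are hypothetical, and the stand-in workload does no quantum simulation at all.

```python
import time


def gates_per_second(apply_gate, num_gates):
    """Call apply_gate num_gates times and return calls per second."""
    start = time.perf_counter()
    for _ in range(num_gates):
        apply_gate()
    elapsed = time.perf_counter() - start
    return num_gates / elapsed if elapsed > 0 else float("inf")


def stand_in_gate():
    # Placeholder work; a real test would apply a gate in the simulator.
    sum(range(100))


rate = gates_per_second(stand_in_gate, 10000)
print("%d gates/sec (stand-in workload)" % rate)
```

For scale, a 50x drop from 8000 gates/sec corresponds to roughly 160 gates/sec.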
I have confirmed in my own debugging that the line self._simulator.apply_controlled_gate seems to be the big offender, but I haven't figured out much more than that.
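One way to confirm a hot spot like this is the standard-library profiler. The sketch below (Python 3) profiles a stand-in workload; apply_gate_stand_in and workload are hypothetical placeholders for the real simulator calls, which are not reproduced here.

```python
import cProfile
import io
import pstats


def apply_gate_stand_in(n):
    # Placeholder for an expensive call such as a gate application.
    return sum(i * i for i in range(n))


def workload():
    for _ in range(50):
        apply_gate_stand_in(1000)


def profile_top(func, limit=5):
    """Profile func and return a report of the top cumulative entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


print(profile_top(workload))
```

Note that cProfile only sees Python-level calls; time spent inside a compiled extension is attributed to the call into it, so this narrows down which call is slow but not why.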