thisac closed this 4 years ago
@thisac, @nquesada: It looks like the build is failing because llvmlite no longer provides wheels for Python 3.5 on the latest version. You could try pinning llvmlite and numba to the previous versions that still supported Python 3.5, but it probably makes sense to deprecate Python 3.5 support instead.
Do you want to make another PR that officially removes Python 3.5 support? This would involve updating the setup.py, the readme, installation instructions, and travis/circle/appveyor configs.
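For reference, a minimal sketch of what the setup.py side of that change could look like (the package metadata below is illustrative, not the actual thewalrus setup.py):

```python
# Sketch only: dropping Python 3.5 support via python_requires and classifiers.
from setuptools import setup, find_packages

setup(
    name="thewalrus",            # illustrative metadata
    version="0.0.0",
    packages=find_packages(),
    python_requires=">=3.6",     # refuses installation on Python 3.5
    classifiers=[
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
    ],
)
```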
I think we should stop supporting python 3.5 as @josh146 suggests.
Merging #161 into master will not change coverage. The diff coverage is 100.00%.
@@            Coverage Diff             @@
##             master      #161   +/-   ##
==========================================
  Coverage    100.00%   100.00%
==========================================
  Files            14        14
  Lines          1002      1080    +78
==========================================
+ Hits           1002      1080    +78
Impacted Files | Coverage Δ |
---|---|
thewalrus/quantum.py | 100.00% <100.00%> (ø) |
thewalrus/samples.py | 100.00% <0.00%> (ø) |
thewalrus/csamples.py | 100.00% <0.00%> (ø) |
thewalrus/symplectic.py | 100.00% <0.00%> (ø) |
thewalrus/fock_gradients.py | 100.00% <0.00%> (ø) |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf7a19e...8a519ad.
Hey @thisac: Could you add some screenshots of the improvements you found when tweaking OMP_NUM_THREADS here? Also, maybe it is worth mentioning that in the docstring? For example, explain to the user when it is worth allowing parallel=True and how to do it by exporting environment variables. Other than that I think this is ready to be merged.
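Something along these lines could go in the docstring (a sketch only; the signature shown is just the part exercised in this thread, not the full one):

```python
def probabilities(mu, cov, cutoff, parallel=False):
    """Generate the Fock-space probabilities of a Gaussian state.

    Args:
        ...
        parallel (bool): if True, the individual probabilities are computed
            in parallel with Dask. The low-level code already parallelises
            with OpenMP, so when setting parallel=True it is usually worth
            limiting OpenMP to a single thread first, e.g. with
            ``export OMP_NUM_THREADS=1`` in the shell (or
            ``%set_env OMP_NUM_THREADS=1`` in a notebook).
    """
```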
Below are the results from some simple benchmarks comparing how the use of parallelisation can speed things up (or slow things down). OpenMP uses parallelisation, which can be turned off by setting the environment variable OMP_NUM_THREADS=1, meaning that only a single thread will be used. By default all threads are used (8 in this case).
Dask can use either the "threads" scheduler (multiple threads in the same process) or the "processes" scheduler (sends data to separate processes). "threads" is bound by the GIL and is thus best used with non-Python objects, while "processes" works best with pure Python code (with a slight overhead). See https://docs.dask.org/en/latest/setup/single-machine.html for more information.
As seen below, the fastest run seems to be when just using the Dask parallelisation in probabilities with the "processes" scheduler while turning off OpenMP parallelisation.
from thewalrus.quantum import probabilities as p
import numpy as np

n = 4  # number of modes

# random means vector and symmetrised matrix, used purely for benchmarking
mu = np.random.random(2 * n)
cov = np.random.random((2 * n, 2 * n))
cov += cov.conj().T

print("\nNo parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=False)
print("\nWith parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=True)
No parallel execution with Dask
419 ms ± 74.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With parallel execution with Dask
2.72 s ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%set_env OMP_NUM_THREADS=1
print("\nNo parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=False)
print("\nWith parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=True)
env: OMP_NUM_THREADS=1
No parallel execution with Dask
632 ms ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With parallel execution with Dask
748 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%set_env OMP_NUM_THREADS=1
print("\nNo parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=False)
print("\nWith parallel execution with Dask")
%timeit p(mu, cov, cutoff=4, parallel=True)
env: OMP_NUM_THREADS=1
No parallel execution with Dask
605 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With parallel execution with Dask
302 ms ± 6.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That's great @thisac! It might also be interesting to see how this scales as the number of modes/cutoff increases.
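For what it's worth, a quick sketch of how such a scaling check could look (the state and cutoff range below are arbitrary choices, not part of this PR):

```python
import time
import numpy as np
from thewalrus.quantum import probabilities

n = 4                      # number of modes
mu = np.zeros(2 * n)
cov = np.identity(2 * n)   # vacuum state, only used for timing

for cutoff in (2, 4, 6, 8):
    start = time.perf_counter()
    probabilities(mu, cov, cutoff=cutoff, parallel=True)
    print(f"cutoff={cutoff}: {time.perf_counter() - start:.2f} s")
```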
I've moved the environment variable changes to the tests (so no changes are being made in the probabilities function itself), utilising monkeypatch.setenv (thanks @josh146 :1st_place_medal:). I've also checked locally that OMP_NUM_THREADS=1 whenever parallel=True during the testing. The docstring is also slightly altered.
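For anyone following along, a minimal sketch of what such a test could look like (the test name and the state used are illustrative, not the actual test suite):

```python
import numpy as np
from thewalrus.quantum import probabilities

def test_parallel_matches_serial(monkeypatch):
    # restrict OpenMP to a single thread so Dask provides the parallelism
    monkeypatch.setenv("OMP_NUM_THREADS", "1")

    n = 2
    mu = np.zeros(2 * n)
    cov = np.identity(2 * n)  # vacuum state

    probs_serial = probabilities(mu, cov, cutoff=3, parallel=False)
    probs_parallel = probabilities(mu, cov, cutoff=3, parallel=True)
    assert np.allclose(probs_serial, probs_parallel)
```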
One can set the number of OMP threads from within Python by doing os.environ["OMP_NUM_THREADS"] = "1".
Context: The probabilities function could be easily parallelised, potentially providing nice speed-ups.
Description of the Change: Adds a parallelisation option to the probabilities function so that the probabilities can be calculated simultaneously. Tests are also adapted to include running the probabilities function with parallel=True and parallel=False.
Benefits: The probability calculations can be parallelised and thus possibly done much more quickly.
Possible Drawbacks: Since OpenMP is already utilising parallelisation in the background, the implementation in this PR does not necessarily provide any speedup, but rather gives the option to parallelise differently.
Related GitHub Issues: N/A