Closed: kain88-de closed this pull request 6 years ago.
@kain88-de I might have approved a bit prematurely but in principle I think this is very good and I trust that you know best what else needs to be done.
The docs do not mention get anywhere because a while back we switched to scheduler for the user API.
Yeah we had a crystal ball last year when we started with the word scheduler. I don’t remember ever using get.
On Fri 21. Sep 2018 at 01:41, Oliver Beckstein notifications@github.com wrote:
The docs do not mention get anywhere because a while back we switched to scheduler for the user API.
Dask introduced some more changes that require larger changes within pmda. It should make the code clearer at the end. @VOD555 can you look into this? The relevant documentation is linked below.
The main issue is that dask now uses a global value to look up the appropriate scheduler. For the tests we now have to remove the session scope from fixtures to properly set the global values. I haven't experimented with how to handle two clusters in the same session.
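To illustrate the clash (a made-up sketch, not code from this PR; the cluster and client names are hypothetical): creating a Client registers it as dask's global default, so with two session-scoped clusters the most recently created one silently takes over.

from dask import delayed
from dask.distributed import Client, LocalCluster

cluster_one = LocalCluster(n_workers=1, processes=True)
cluster_two = LocalCluster(n_workers=2, processes=True)
client_one = Client(cluster_one)
client_two = Client(cluster_two)   # now the global default scheduler

delayed(sum)([1, 2, 3]).compute()  # runs via client_two, not client_one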
With these changes, all tests pass locally (with dask 0.20)
(pmda) yngvi:pmda oliver$ pytest -n 4 --disable-warnings --pep8 pmda
======================================================= test session starts ========================================================
platform darwin -- Python 3.6.5, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: ~/MDAnalysis/pmda, inifile: setup.cfg
plugins: xdist-1.22.2, pep8-1.0.6, forked-0.2, cov-2.5.1, hypothesis-3.66.1
gw0 [1000] / gw1 [1000] / gw2 [1000] / gw3 [1000]
scheduling tests via LoadScheduling
.........................................x.................................................................................. [ 12%]
............................................................................................................................ [ 24%]
............................................................................................................................ [ 37%]
............................................................................................................................. [ 49%]
............................................................................................................................ [ 62%]
............................................................................................................................. [ 74%]
............................................................................................................................ [ 87%]
............................................................................................................................ [ 99%]
...... [100%]
Future exception was never retrieved
future: <Future finished exception=TimeoutError('Timeout',)>
tornado.util.TimeoutError: Timeout
======================================= 999 passed, 1 xfailed, 24 warnings in 44.79 seconds ========================================
@kain88-de in https://github.com/MDAnalysis/pmda/pull/66#issuecomment-423519229 you commented on a new global state in dask. How would that manifest itself as a problem for the tests?
My understanding of
import distributed
import pytest

@pytest.fixture(scope="session", params=(1, 2))
def client(tmpdir_factory, request):
    # one LocalCluster per parameter (1 or 2 workers), shared across the session
    with tmpdir_factory.mktemp("dask_cluster").as_cwd():
        lc = distributed.LocalCluster(n_workers=request.param, processes=True)
        client = distributed.Client(lc)
        yield client
        client.close()
        lc.close()
is that we set up a single distributed cluster for all tests (actually, two clusters, one with 1 and one with 2 workers) and tests that use it get scheduled as workers become available.
Can you explain how the session scope is a problem for new dask?
Merging #66 into master will decrease coverage by 3.67%. The diff coverage is 54.34%.
@@            Coverage Diff             @@
##           master      #66      +/-   ##
==========================================
- Coverage   98.09%   94.41%   -3.68%
==========================================
  Files           8        8
  Lines         419      448      +29
  Branches       58       61       +3
==========================================
+ Hits          411      423      +12
- Misses          4       18      +14
- Partials        4        7       +3
Impacted Files   | Coverage Δ                            |
-----------------|---------------------------------------|
pmda/leaflet.py  | 86.5% <34.78%> (-8.14%) :arrow_down:  |
pmda/parallel.py | 95.12% <73.91%> (-4.88%) :arrow_down: |
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6aa6c61...dd88867.
@kain88-de @richardjgowers @VOD555 The tests for upgrade to dask 0.20 pass now; the coverage dropped for reasons that I do not understand.
I had reviewed and approved this PR before it was passing Travis but now that I edited it, I'd appreciate some additional eyes, please.
I get locally
pytest --pep8 -n 4 --cov pmda
all passing
============== 1001 passed, 1 xfailed, 24 warnings in 96.34 seconds ==============
but different coverage changes:
---------- coverage: platform darwin, python 3.6.5-final-0 -----------
Name                Stmts   Miss  Branch  BrPart   Cover
----------------------------------------------------------
pmda/__init__.py        5      0       0       0    100%
pmda/contacts.py       53      1      14       1     97%
pmda/custom.py         34      0       6       0    100%
pmda/leaflet.py       112     37      44       3     60%
pmda/parallel.py      108      0      34       0    100%
pmda/rdf.py            53      4       6       1     92%
pmda/rms.py            17      1       0       0     94%
pmda/util.py           37      0      14       0    100%
----------------------------------------------------------
TOTAL                 419     43     118       5     87%
Using
coverage html
open htmlcov/pmda_leaflet_py.html
shows that njobs == -1 is not tested and that the _find_connected_components() code is not run (?!?!?!), even though it is used in https://github.com/MDAnalysis/pmda/blob/6aa6c616f4387d2b0ef47a704647e5bcf9825aa4/pmda/leaflet.py#L203-L204 where parAtoms is a dask bag. Perhaps coverage has a hard time seeing what's been covered when something is run under dask, under certain circumstances??
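If coverage is indeed losing track under dask, one plausible reason is that functions mapped over a dask bag run in worker processes rather than in the pytest process that coverage traces. A made-up illustration (not pmda code; work() and the bag are hypothetical):

import dask.bag as db

def work(x):
    return 2 * x  # executed inside a worker process, not the main process

bag = db.from_sequence(range(8), npartitions=4)
result = bag.map(work).compute(scheduler='processes')
# coverage started in the parent pytest process may not trace work() here
# unless subprocess coverage is set up (e.g. via pytest-cov's subprocess support)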
I wanted to add tests for leafletfinder with distributed (by using the new parametrized scheduler fixture) but as described in #76 this opened a whole can of worms so this has to wait.
In my local tests, the things under _single_frame() in rdf.py and rms.py are not covered. That's weird...
I haven't looked at the code changes yet! But I did stop working on this because dask 0.18 changed the idiomatic style for selecting the scheduler (see the scheduler docs). The new idiom is to set the scheduler in a global variable:
dask.config.set(scheduler='threads')
or with a context manager
with dask.config.set(scheduler='threads'):
    x.compute()
The distributed scheduler now overwrites these defaults on creation:
from dask.distributed import Client
client = Client(...) # Connect to distributed cluster and override default
df.x.sum().compute() # This now runs on the distributed system
The correct solution seems rather to be that we remove the scheduler and get keyword arguments completely. In the tests I guess we can work with something like
@pytest.fixture(params=['multiprocessing', ClientIP])
def scheduler(request):
    with dask.config.set(scheduler=request.param):
        yield
I assume I have the API wrong but the general idea is to start a context manager in the fixture and yield to release it at the end. How well this works I don't know.
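For illustration, a fixture along those lines could be used from a test roughly like this (a hypothetical sketch, not an actual pmda test):

import dask

def test_runs_under_each_scheduler(scheduler):
    # whatever the scheduler fixture set via dask.config.set is picked up here
    result = dask.delayed(sum)([1, 2, 3]).compute()
    assert result == 6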
From my reading, setting the scheduler on compute(), i.e.,
x.compute(scheduler='threads')
is still supported. I think as long as all our compute() calls also contain the scheduler, we should be ok, even though
client = Client(...) # Connect to distributed cluster and override default
will set the global defaults.
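As a minimal sketch of that precedence (assumes dask and dask.distributed; the array is made up):

import dask.array as da
from dask.distributed import Client

client = Client(processes=False)      # becomes the global default scheduler
x = da.ones((1000, 1000), chunks=(100, 100))

x.sum().compute()                     # uses the distributed client (the default)
x.sum().compute(scheduler='threads')  # the per-call keyword still wins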
Or do I misunderstand how this is working now?
The correct solution seems to rather be we remove the scheduler and get keyword arguments completely.
I think you're right that this is the medium term correct solution so that using PMDA conforms to how people use Dask. In the short term (i.e., for this PR at least!) I'd like to move ahead with our current scheduler argument because it is still correct.
Or do you see a problem?
(Alternatively, if someone manages to get the new Dask paradigm working I am also happy... I just only have limited time for this right now.)
The problem is that code suddenly behaves unexpectedly. Take the following example
client = Client()  # yeah let's use dask.distributed
pmda.contacts.Contacts(...).run()  # This uses multiprocessing!
I would be surprised to see here that the distributed workers don't receive any jobs. Without knowledge of how dask used to work this is also hard to debug.
I would be surprised to see here that the distributed workers don't receive any jobs.
... because PMDA defaults to 'multiprocessing'? Yes, I agree with you.
Thanks for working on it!
@kain88-de are you sure you wanted to push 114a2b0? It looks as if it undoes some of the changes that I pushed and breaks the tests again. If it was intentional and you're working on it then just ignore me ;-).
I don't want to depend on distributed for such simple checks. The code now does a trial import when necessary. We can still check the distributed scheduler in our tests; it is not needed for pmda itself, though.
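The trial import follows the usual optional-dependency pattern, roughly like this (a sketch, not necessarily the exact pmda code):

try:
    import distributed
    HAS_DISTRIBUTED = True
except ImportError:
    distributed = None
    HAS_DISTRIBUTED = False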
Fine with me, although I am pretty sure that anyone using dask will also install distributed or not mind having it installed, especially as http://docs.dask.org/en/latest/scheduling.html says "we currently recommend the distributed scheduler on a local machine", i.e., pretty much in all cases.
@kain88-de please check. I'd like to get this merged so that we can move forward, and I'd like to get 0.2.0 out asap.
I removed my changes again. Let's go with the easier version.
I don't know where the reduced coverage comes from right now.
I'll merge it regardless and then we need to dig a bit more into how coverage works with the different schedulers.
Thanks for pushing forward here!!!
I'll merge it regardless and then we need to dig a bit more into how coverage works with the different schedulers.
Thanks for pushing forward here!!!
Stupid GitHub web interface does not work on my slightly outdated mobile. Can you please do a squash merge? Thanks!
This will allow @VOD555 to continue.
@orbeckst I've merged this PR.
Thanks.
The drop in coverage is due to leaflet.py because we do not currently test all schedulers. This should be addressed as part of #76.
Fix #48 and #17
Changes made in this Pull Request:
PR Checklist