MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/

update to dask 0.18.0 #66

Closed · kain88-de closed this 6 years ago

kain88-de commented 6 years ago

Fix #48 and #17

Changes made in this Pull Request:

PR Checklist

orbeckst commented 6 years ago

@kain88-de I might have approved a bit prematurely but in principle I think this is very good and I trust that you know best what else needs to be done.

orbeckst commented 6 years ago

The docs do not mention get anywhere because a while back we switched to scheduler for the user API.
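
(For context, a minimal sketch of the corresponding change on the dask side, not code from this PR: the per-call get= keyword was superseded by scheduler=.)

import dask.bag as db

b = db.from_sequence(range(10))

# old dask API: pass a "get" function with every compute() call
# b.sum().compute(get=dask.multiprocessing.get)

# newer dask API: name the scheduler instead
b.sum().compute(scheduler='processes')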

kain88-de commented 6 years ago

Yeah we had a crystal ball last year when we started with the word scheduler. I don’t remember ever using get.

kain88-de commented 6 years ago

Dask has introduced some more changes that require larger changes within pmda; in the end they should make the code clearer. @VOD555 can you look into this? The relevant documentation is linked below.

https://dask.pydata.org/en/latest/configuration.html
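
(For readers following along, a minimal sketch of the global configuration mechanism that the linked page describes; the values are illustrative.)

import dask

# no global scheduler configured yet
print(dask.config.get('scheduler', None))    # -> None

# pick a global default; compute() calls without an explicit scheduler use it
dask.config.set(scheduler='processes')
print(dask.config.get('scheduler', None))    # -> 'processes'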

kain88-de commented 6 years ago

The main issue is that dask now uses a global value to look up the appropriate scheduler. For the tests we now have to remove the session scope from the fixtures to properly set the global values. I haven't experimented with how to handle two clusters in the same session.
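
(To spell out the concern, a minimal sketch, assuming distributed's documented behaviour that the most recently created Client registers itself as the global default scheduler:)

import distributed
from dask import delayed

lc1 = distributed.LocalCluster(n_workers=1, processes=True)
lc2 = distributed.LocalCluster(n_workers=2, processes=True)

client1 = distributed.Client(lc1)   # becomes the global default scheduler
client2 = distributed.Client(lc2)   # ...and now this one is the default

# without an explicit scheduler argument this runs on client2's cluster,
# regardless of which cluster fixture a given test asked for
print(delayed(sum)([1, 2, 3]).compute())

client1.close(); lc1.close()
client2.close(); lc2.close()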

orbeckst commented 6 years ago

With these changes, all tests pass locally (with dask 0.20):

(pmda) yngvi:pmda oliver$ pytest -n 4 --disable-warnings --pep8 pmda
======================================================= test session starts ========================================================
platform darwin -- Python 3.6.5, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: ~/MDAnalysis/pmda, inifile: setup.cfg
plugins: xdist-1.22.2, pep8-1.0.6, forked-0.2, cov-2.5.1, hypothesis-3.66.1
gw0 [1000] / gw1 [1000] / gw2 [1000] / gw3 [1000]
scheduling tests via LoadScheduling
.........................................x.................................................................................. [ 12%]
............................................................................................................................ [ 24%]
............................................................................................................................ [ 37%]
............................................................................................................................. [ 49%]
............................................................................................................................ [ 62%]
............................................................................................................................. [ 74%]
............................................................................................................................ [ 87%]
............................................................................................................................ [ 99%]
......                                                                                                                       [100%]Future exception was never retrieved
future: <Future finished exception=TimeoutError('Timeout',)>
tornado.util.TimeoutError: Timeout

======================================= 999 passed, 1 xfailed, 24 warnings in 44.79 seconds ========================================

orbeckst commented 6 years ago

@kain88-de in https://github.com/MDAnalysis/pmda/pull/66#issuecomment-423519229 you commented on a new global state in dask. How would that manifest itself as a problem for the tests?

My understanding of

@pytest.fixture(scope="session", params=(1, 2))
def client(tmpdir_factory, request):
    with tmpdir_factory.mktemp("dask_cluster").as_cwd():
        lc = distributed.LocalCluster(n_workers=request.param, processes=True)
        client = distributed.Client(lc)

        yield client

        client.close()
        lc.close()

is that we set up a single distributed cluster for all tests (actually, two clusters: one with 1 worker and one with 2 workers), and tests that use it get scheduled as workers become available.

Can you explain how the session scope is a problem for new dask?

codecov[bot] commented 6 years ago

Codecov Report

Merging #66 into master will decrease coverage by 3.67%. The diff coverage is 54.34%.

@@            Coverage Diff             @@
##           master      #66      +/-   ##
==========================================
- Coverage   98.09%   94.41%   -3.68%     
==========================================
  Files           8        8              
  Lines         419      448      +29     
  Branches       58       61       +3     
==========================================
+ Hits          411      423      +12     
- Misses          4       18      +14     
- Partials        4        7       +3
Impacted Files       Coverage Δ
pmda/leaflet.py      86.5%  <34.78%> (-8.14%) ↓
pmda/parallel.py     95.12% <73.91%> (-4.88%) ↓

Continue to review full report at Codecov.


orbeckst commented 6 years ago

@kain88-de @richardjgowers @VOD555 The tests for the upgrade to dask 0.20 pass now; the coverage dropped for reasons that I do not understand.

I had reviewed and approved this PR before it was passing Travis, but now that I have edited it I'd appreciate some additional eyes, please.

orbeckst commented 6 years ago

I get locally

 pytest --pep8 -n 4 --cov pmda

all passing

============== 1001 passed, 1 xfailed, 24 warnings in 96.34 seconds ==============

but different coverage changes:

---------- coverage: platform darwin, python 3.6.5-final-0 -----------
Name               Stmts   Miss Branch BrPart  Cover
----------------------------------------------------
pmda/__init__.py       5      0      0      0   100%
pmda/contacts.py      53      1     14      1    97%
pmda/custom.py        34      0      6      0   100%
pmda/leaflet.py      112     37     44      3    60%
pmda/parallel.py     108      0     34      0   100%
pmda/rdf.py           53      4      6      1    92%
pmda/rms.py           17      1      0      0    94%
pmda/util.py          37      0     14      0   100%
----------------------------------------------------
TOTAL                419     43    118      5    87%

Using

coverage html
open htmlcov/pmda_leaflet_py.html

shows the lines in pmda/leaflet.py that were not covered.

Perhaps coverage has a hard time seeing what has been covered when something is run under dask, at least in certain circumstances?

orbeckst commented 6 years ago

I wanted to add tests for the LeafletFinder with distributed (by using the new parametrized scheduler fixture), but as described in #76 this opened a whole can of worms, so it has to wait.

VOD555 commented 6 years ago

In my local tests, the things under _single_frame() in rdf.py and rms.py are not covered. That's weird...

kain88-de commented 6 years ago

I haven't looked at the code changes yet! But I did stop working on this because dask 0.18 changed the idiomatic way of selecting the scheduler (see the configuration docs linked above). The new idiom is to set the scheduler in a global configuration value

dask.config.set(scheduler='threads')

or with a context manager

with dask.config.set(scheduler='threads'):
    x.compute()

The distributed scheduler now overrides these defaults when a Client is created:

from dask.distributed import Client
client = Client(...)  # Connect to distributed cluster and override default
df.x.sum().compute()  # This now runs on the distributed system

The correct solution rather seems to be that we remove the scheduler and get keyword arguments completely. In the tests I guess we can work with something like

import dask
import pytest

@pytest.fixture(params=['multiprocessing', ClientIP])  # ClientIP: placeholder from the original sketch
def scheduler(request):
    with dask.config.set(scheduler=request.param):
        yield

I may have the API slightly wrong, but the general idea is to open a context manager in the fixture and yield inside it so that it is released at the end of the test. How well this works I don't know.
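
(Hypothetical usage of such a fixture, just to illustrate the idea: the test body never names a scheduler and simply picks up whatever the fixture configured.)

import dask.bag as db

def test_sum_uses_configured_scheduler(scheduler):
    # requesting the 'scheduler' fixture sets dask's global scheduler
    # for the duration of this test
    assert db.from_sequence(range(5)).sum().compute() == 10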

orbeckst commented 6 years ago

From my reading, setting the scheduler on compute()

x.compute(scheduler='threads')

is still supported. I think as long as all our compute() calls also contain the scheduler, we should be ok, even though

client = Client(...)  # Connect to distributed cluster and override default

will set the global defaults.

Or do I misunderstand how this is working now?

The correct solution rather seems to be that we remove the scheduler and get keyword arguments completely.

I think you're right that this is the correct medium-term solution, so that using PMDA conforms to how people use Dask. In the short term (i.e., for this PR at least!) I'd like to move ahead with our current scheduler argument because it is still correct.

Or do you see a problem?

orbeckst commented 6 years ago

(Alternatively, if someone manages to get the new Dask paradigm working I am also happy... I just only have limited time for this right now.)

kain88-de commented 6 years ago

The problem is that code suddenly behaves unexpectedly. Take the following example:

from dask.distributed import Client
import pmda.contacts
client = Client()                  # yeah, let's use dask.distributed
pmda.contacts.Contacts(...).run()  # this still uses multiprocessing!

In this example I would be surprised to see that the distributed workers don't receive any jobs. Without knowledge of how dask used to work, this is also hard to debug.

orbeckst commented 6 years ago

In this example I would be surprised to see that the distributed workers don't receive any jobs.

... because PMDA defaults to 'multiprocessing'? Yes, I agree with you.

Thanks for working on it!

orbeckst commented 6 years ago

@kain88-de are you sure you wanted to push 114a2b0? It looks as if it undoes some of the changes that I pushed and breaks the tests again. If it was intentional and you're working on it, then just ignore me ;-).

kain88-de commented 6 years ago

I don't want to depend on distributed for such simple checks. The code now does a trial import when necessary. We can still check the distributed scheduler in our tests; it is not needed for pmda itself, though.
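
(A minimal sketch of the trial-import pattern being described; the helper name is made up and is not pmda's actual code.)

def _is_distributed_client(scheduler):
    """Return True if *scheduler* is a distributed.Client, without requiring
    distributed to be installed."""
    try:
        import distributed
    except ImportError:
        return False
    return isinstance(scheduler, distributed.Client)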

orbeckst commented 6 years ago

Fine with me, although I am pretty sure that anyone using dask will also install distributed, or at least not mind having it installed, especially as http://docs.dask.org/en/latest/scheduling.html says

we currently recommend the distributed scheduler on a local machine

i.e., pretty much in all cases.

orbeckst commented 6 years ago

@kain88-de please check – I'd like to get this merged so that we can move forward, and I'd like to get 0.2.0 out asap.

kain88-de commented 6 years ago

I removed my changes again. Let's go with the easier version.

kain88-de commented 6 years ago

I don't know where the reduced coverage comes from right now.

orbeckst commented 6 years ago

I'll merge it regardless and then we need to dig a bit more into how coverage works with the different schedulers.

Thanks for pushing forward here!!!

orbeckst commented 6 years ago

Stupid GitHub web interface does not work on my slightly outdated mobile. Can you please do a squash merge? Thanks!

This will allow @VOD555 to continue.

VOD555 commented 6 years ago

@orbeckst I've merged this PR.

orbeckst commented 6 years ago

Thanks.

The drop in coverage is due to leaflet.py because we do not currently test all schedulers. This should be addressed as part of #76.