Closed iparask closed 5 years ago
@iparask sklearn is not getting installed. https://travis-ci.org/MDAnalysis/pmda/jobs/435384534#L1101
For MDAnalysis, sklearn is optional. Is BallTree required for your code or can you make it optional?
The most efficient one uses BallTrees. It also reduces the memory footprint a lot. Let me include KDTrees instead, which are part of Scipy. Otherwise we will have to go with cdist or something similar. @orbeckst, I missed the travis output. Sorry about that
Is BallTree a lot better?
MDAnalysis also recently got capped_distance which could perhaps be used instead. We now use this for neighborsearching. Importantly, it takes periodicity into account.
@richardjgowers can you comment on capped_distance()
vs BallTree()
– did this come up during @ayushsuhane 's benchmarking during GSOC?
@orbeckst a BallTree is just a KDTree (I think...). grid benched faster than KDTree nearly all of the time, so if we can use that it would be better
@orbeckst @richardjgowers I have a wiki page at least with some information for KD-Trees and BallTrees. Also, I remember @drelu had created a plot comparing different trees, I will put it here as soon as I find it.
@iparask we had a student (@ayushsuhane) putting in a lot of work in the summer to efficiently calculate distances. Have a look at his blog. He implemented several algorithms in MDAnalysis to efficiently calculate distances with a cutoff. He has found that the cell-list algorithm yielded the best results. It's available in MDAnalysis as lib.distances.capped_distance
in the develop branch. Please have look at these functions. All functions included in MDAnalysis also take care of periodicity for triclinic boxes.
Currently only as simple PEP8 check fails.
You can run the PEP8 checks locally, too (together with coverage):
pytest --pep8 -v --cov pmda pmda
(needs pytest-pep8 and pytest-cov).
@iparask there's still an import of BallTree somewhere:
==================================== ERRORS ====================================
__________________ ERROR collecting pmda/test/test_leaflet.py __________________
ImportError while importing test module '/home/travis/build/MDAnalysis/pmda/pmda/test/test_leaflet.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
pmda/test/test_leaflet.py:10: in <module>
from pmda import leaflet
pmda/leaflet.py:28: in <module>
from sklearn.neighbors import BallTree
E ModuleNotFoundError: No module named 'sklearn'
Once the tests pass and you need a final review, please ping me. Thanks!
Merging #69 into master will decrease coverage by
1.34%
. The diff coverage is94.64%
.
@@ Coverage Diff @@
## master #69 +/- ##
==========================================
- Coverage 99.27% 97.92% -1.35%
==========================================
Files 7 8 +1
Lines 274 386 +112
Branches 26 48 +22
==========================================
+ Hits 272 378 +106
- Misses 1 4 +3
- Partials 1 4 +3
Impacted Files | Coverage Δ | |
---|---|---|
pmda/leaflet.py | 94.64% <94.64%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 3d78fc7...ce5d471. Read the comment docs.
@orbeckst, this is a request for a final review. I have added all parts as asked. I believe it is as close as I can see it getting. Travis fails with warnings mainly because I have overridden and changed the methods of the base class in terms of input parameters.
We'll have to fix the pylint warnings or exclude them; if we merge it like this, all future CI will fail.
Run
pylint pmda
locally while testing. I get:
Using config file /Volumes/Data/oliver/Biop/Projects/Methods/MDAnalysis/pmda/.pylintrc
************* Module pmda.leaflet
pmda/leaflet.py:73: [W0231(super-init-not-called), LeafletFinder.__init__] __init__ method from base class 'ParallelAnalysisBase' is not called
pmda/leaflet.py:115: [E1102(not-callable), LeafletFinder._find_connected_components] cKDTree is not callable
pmda/leaflet.py:159: [W0221(arguments-differ), LeafletFinder._single_frame] Parameters differ from overridden '_single_frame' method
pmda/leaflet.py:192: [W1619(old-division), LeafletFinder._single_frame] division w/o __future__ statement
pmda/leaflet.py:229: [W0221(arguments-differ), LeafletFinder.run] Parameters differ from overridden 'run' method
-----------------------------------
Your code has been rated at 9.77/10
Most of these warnings look fixable to me
//
_single_frame()
could and just ignore them.Disable the other pylint warning in code by bracketing the code with # pylint: disable=...
/# pylint: enable=...
blocks; see for example https://github.com/MDAnalysis/pmda/blob/3d78fc7c90e19cdc72932412b05c4c5d2d567fe7/pmda/rdf.py#L90
On a second look, the mismatching call signatures might be a bit more difficult.... start with disabling the warnings in the code. I'll then comment in a code review.
(One way or another, the linter has to pass.)
Looking good – is there a way to increase test coverage? It will decrease our 99% to 96%...
Actually, just checked the diff – there's not much you can do for coverage.
Can you help me read the output of the codecov bot you included? I think that if we can test using the distributed scheduler the coverage will increase.
~@ianmkenney~ @iparask in PR #66 we will automatically run all possible parallelizations so no need to do this manually here.
Just add yourself as an author and we're done here.
EDIT: sorry, autocompleted the wrong username
Thank you @iparask , congratulations to your first contribution!
Fixes #47
Changes made in this Pull Request:
This PR can be used as a place holder. I would like to initiate a discussion as to as the written code is based on the requirements. I would like to report two major differences from other cases. This code uses
Dask Bags
to execute instead ofdelayed
. In addition, parallelization is done on a frame basis. This was the most efficient implementation we had in the paper (code linked in the respective issue).Please review/ comment as necessary and code can grow.
Giannis
PR Checklist