MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/

Parallel RMSF #97

Closed nawtrey closed 5 years ago

nawtrey commented 5 years ago

Fixes #90

Changes made in this Pull Request:

PR Checklist

nawtrey commented 5 years ago

Current issues:

nawtrey commented 5 years ago

Currently it does not pass the tests because _dask_helper() gives _reduce() indices of 0 to n-1, so the weighting of the means is off. Oddly enough, adding n = n - 1 to the else: branch of _reduce() produces results similar to those of MDAnalysis.
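
To illustrate the weighting problem with toy numbers (an editor's sketch, not the PR code): if each block's mean is weighted with n - 1 instead of its true frame count n, the combined mean drifts away from the trajectory mean.

```python
import numpy as np

def combine_means(mean_a, n_a, mean_b, n_b):
    """Weighted mean of two blocks holding n_a and n_b frames."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

frames = np.arange(10.0)        # toy per-frame values
a, b = frames[:6], frames[6:]   # two "blocks" of 6 and 4 frames

correct = combine_means(a.mean(), len(a), b.mean(), len(b))
off_by_one = combine_means(a.mean(), len(a) - 1, b.mean(), len(b) - 1)
print(correct, frames.mean())   # 4.5, 4.5 -- agree
print(off_by_one)               # 4.375 -- skewed weighting
```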

nawtrey commented 5 years ago

Currently it produces stable results on 1 core, and the differences between this function and the MDAnalysis RMSF results are generally on the order of 1e-6.

orbeckst commented 5 years ago

That's encouraging. Is 1e-6 the worst case?

What is the error for 1, 2, 4, 8 frames in the trajectory, i.e. r.run(stop=1) etc.?
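
A sketch of that check (topology/trajectory file names are placeholders; it assumes the serial MDAnalysis.analysis.rms.RMSF as the reference):

```python
import numpy as np
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSF as SerialRMSF
from pmda.rmsf import RMSF as ParallelRMSF

u = mda.Universe("topol.tpr", "traj.xtc")  # placeholder files

# Compare parallel vs. serial RMSF on truncated trajectories.
for stop in (1, 2, 4, 8):
    serial = SerialRMSF(u.atoms).run(stop=stop)
    parallel = ParallelRMSF(u.atoms).run(stop=stop, n_blocks=1, n_jobs=1)
    err = np.abs(parallel.rmsf - serial.rmsf).max()
    print(f"stop={stop}: max abs error = {err:.2e}")
```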

nawtrey commented 5 years ago

Here are the worst-case differences for 10 atoms over n frames:

orbeckst commented 5 years ago

This is still on 1 block, so in principle it should give the same answer: it is the same algorithm run in serial, just written in a different way. Can you find out whether this is rounding or a real difference somewhere?

Does it change if you change your sum arrays to, say float32 (they should be float64) or float16? I.e. is there an obvious effect of the precision?

Normally I would say 1e-6 is fine, but it is on the border of machine precision for float32 (which is the precision of the coordinates), while 1e-5 is within float32 precision, so checking carefully now is a good investment of time. Maybe there's a subtle error, or you'll find a way to make the calculation numerically more stable.
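
To make the precision effect concrete, here is a toy comparison (illustrative only, not the PR code) of accumulating a sum of squares in float32 versus float64; the relative error of the float32 accumulation grows with the number of terms:

```python
import numpy as np

rng = np.random.default_rng(42)
# ~1e5 "coordinates" of magnitude ~10, stored in float32
coords = rng.normal(loc=10.0, scale=0.5, size=100_000).astype(np.float32)

sq32 = np.float32(0.0)
for x in coords:                 # naive float32 accumulation
    sq32 += x * x
sq64 = np.sum(coords.astype(np.float64) ** 2)

print(abs(sq32 - sq64) / sq64)   # relative error of the float32 sum
```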

codecov[bot] commented 5 years ago

Codecov Report

Merging #97 into master will decrease coverage by 0.95%. The diff coverage is 84.44%.


@@            Coverage Diff             @@
##           master      #97      +/-   ##
==========================================
- Coverage   97.58%   96.63%   -0.96%     
==========================================
  Files           9       10       +1     
  Lines         580      624      +44     
  Branches       74       76       +2     
==========================================
+ Hits          566      603      +37     
- Misses          8       14       +6     
- Partials        6        7       +1
Impacted Files Coverage Δ
pmda/parallel.py 98.48% <100%> (ø) ↑
pmda/util.py 88.37% <16.66%> (-11.63%) ↓
pmda/rmsf.py 94.73% <94.73%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 943ed2b...3ddd004.

codecov[bot] commented 5 years ago

Codecov Report

Merging #97 into master will increase coverage by 0.21%. The diff coverage is 100%.


@@            Coverage Diff            @@
##           master     #97      +/-   ##
=========================================
+ Coverage   97.58%   97.8%   +0.21%     
=========================================
  Files           9      10       +1     
  Lines         580     637      +57     
  Branches       74      78       +4     
=========================================
+ Hits          566     623      +57     
  Misses          8       8              
  Partials        6       6
Impacted Files Coverage Δ
pmda/parallel.py 98.48% <100%> (ø) ↑
pmda/rmsf.py 100% <100%> (ø)
pmda/util.py 100% <100%> (ø) ↑

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 34d465b...c9cbb8f.

orbeckst commented 5 years ago

Did using float64 help?

nawtrey commented 5 years ago

Using float64 fixed the accuracy issue entirely. The current version can use any number of jobs, but there is an issue with the means calculated on multiple blocks. If it is run with n_jobs=2, each block produces a mean position for each atom, but when these values are averaged they do not give the same mean position that MDAnalysis gives for the same trajectory. I have thoroughly tested _conclude(), and it appears that the issue lies in the calculation of the mean positions inside _reduce() (for n_jobs > 1). It is close, but the RMSF errors can be on the order of 1e-1. Further investigation will be required.
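
For reference, the standard pairwise update (Chan et al.) merges per-block counts, means, and sums of squared deviations without the naive-averaging pitfall described above. A sketch of the general technique, not necessarily the exact code this PR ends up with:

```python
import numpy as np

def fold(n_a, mean_a, M2_a, n_b, mean_b, M2_b):
    """Merge count, mean, and sum of squared deviations of two blocks."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n          # count-weighted mean
    M2 = M2_a + M2_b + delta**2 * n_a * n_b / n
    return n, mean, M2

x = np.arange(12.0)
a, b = x[:7], x[7:]                          # two unequal blocks
n, mean, M2 = fold(len(a), a.mean(), ((a - a.mean())**2).sum(),
                   len(b), b.mean(), ((b - b.mean())**2).sum())
print(mean, x.mean())                        # identical means
print(np.sqrt(M2 / n), x.std())              # RMSF-style fluctuation
```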

orbeckst commented 5 years ago

> but the rmsf errors can be on the order of 1e-1.

That's a bug then. Happy bug-hunting!

nawtrey commented 5 years ago

Currently test_reduce() in test_util.py has the assert_almost_equal decimal set to 5, so it checks that the results agree to 5 decimal places. The test passes consistently with this setting, but it does not pass with decimal=7 for 1e5 frames. It appears that the accuracy decreases with increasing frame count, though it is still quite good. The relative error for the sum of squares is still on the order of 1e-14, which is unlikely to be an issue. Running the calculations with np.float64 and np.float128 did not make any noticeable difference in the relative and absolute errors.

orbeckst commented 5 years ago

Don't use float128; for a start, it's only float96 anyway and it's non-standard.

decimal=5 is still ok-ish. Add a comment to the tests explaining why the threshold is lowered.
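
A hypothetical shape of the requested comment in test_util.py (the fixture names are placeholders, not the actual test):

```python
from numpy.testing import assert_almost_equal

def test_reduce(reference, folded):   # placeholder fixtures
    # decimal=5 instead of the default 7: coordinates are float32, and
    # rounding accumulates with frame count, so the 7th decimal already
    # fails around 1e5 frames even though the relative error of the
    # summed squares stays on the order of 1e-14.
    assert_almost_equal(folded, reference, decimal=5)
```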

nawtrey commented 5 years ago

Currently test_rmsf.py is failing on Travis even though all tests pass on my local machine; otherwise everything is there. The only thing left is to update the CHANGELOG.

orbeckst commented 5 years ago

Travis has to pass. From a quick look it's not just a simple "off by the 7th decimal". Have a look at the Travis output (click on the "Details" link); maybe it will become clearer why it fails.

orbeckst commented 5 years ago

Btw, your log messages contain good bullet points, but please do not just use "Change:" as the subject line; instead, summarize what the changes are. When I look at the commit listing (as on this page or with git log) I typically only see the subject line, so that line should already convey the essential information about what the commit does.

nawtrey commented 5 years ago

As far as I can tell, test_rmsf.py is failing because it runs differently on Travis. When it runs through the different numbers of blocks and jobs, it gives the same results as if the number of jobs were not changing. For example, on Travis the n_cores=2 case gives the same result that my local machine gives if I run pmda.rmsf.RMSF(u.atoms).run(n_blocks=2, n_jobs=1), but that is not what it is being instructed to run; it should be running n_blocks=n_jobs=2. The same happens for the n_cores=3 and n_cores=4 cases, where it gives the result as if it were running with n_jobs=1.
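
A sketch of the kind of parametrized check being described (the fixtures universe and serial_rmsf are assumptions, not the actual test_rmsf.py):

```python
import pytest
from numpy.testing import assert_almost_equal
from pmda.rmsf import RMSF

@pytest.mark.parametrize("n_cores", [1, 2, 3, 4])
def test_rmsf_parallel(universe, serial_rmsf, n_cores):
    # Each case should actually run with n_blocks = n_jobs = n_cores.
    parallel = RMSF(universe.atoms).run(n_blocks=n_cores, n_jobs=n_cores)
    assert_almost_equal(parallel.rmsf, serial_rmsf, decimal=5)
```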

orbeckst commented 5 years ago

Is the test run under distributed or multiprocessing? distributed does its own thing and ignores n_jobs.

Are there other tests where this might show up?
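
One way to pin this down (a sketch; it assumes pmda picks up the active dask scheduler from dask.config, and u is a placeholder Universe from the test fixtures): run the same analysis explicitly under multiprocessing and under a distributed Client and compare.

```python
import dask
from dask.distributed import Client
from pmda.rmsf import RMSF

# Explicit multiprocessing scheduler: n_jobs controls the worker count.
with dask.config.set(scheduler="processes"):
    r_mp = RMSF(u.atoms).run(n_blocks=2, n_jobs=2)

# Distributed scheduler: the Client's workers determine parallelism,
# and n_jobs is ignored.
client = Client(n_workers=2)
r_dist = RMSF(u.atoms).run(n_blocks=2)
client.close()
```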

orbeckst commented 5 years ago

Just in time for the end of the REU! Well done!