MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.26k stars 641 forks source link

Cache creation of compound masks #4612

Open PicoCentauri opened 3 weeks ago

PicoCentauri commented 3 weeks ago

Changes made in this Pull Request:

Add a cache for creating the compound masks in a group. This speeds up the calculations of accumulate other methods that use _split_by_compound_indices:

u = mda.Universe(TPR, XTC)

Running a simple timing test with caching gives on my machine 397 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit 
u.atoms.accumulate("masses", compound="residues")

and with "disabled" cache 1.16 ms ± 11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit 
u.atoms.accumulate("masses", compound="residues")
u.atoms._cache.pop("residues_masks")

PR Checklist

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4612.org.readthedocs.build/en/4612/

pep8speaks commented 3 weeks ago

Hello @PicoCentauri! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2024-06-05 17:36:21 UTC
github-actions[bot] commented 3 weeks ago

Linter Bot Results:

Hi @PicoCentauri! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code. Code Location Outcome
main package ⚠️ Possible failure
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/9388722067/job/25854613962


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

PicoCentauri commented 3 weeks ago

This is related to #3169 but does different caching as proposed in #3169.

orbeckst commented 2 weeks ago

@richardjgowers could you please have a look at this PR, looks like your wheelhouse.

If you don't have the bandwidth please un-assign yourself and let me know. Thanks!

richardjgowers commented 6 days ago

this failure here:

Mismatched elements: 2 / 9 (22.2%)
  Max absolute difference: 55.42303448
  Max relative difference: 26.25439738
   x: array([[21.286, 41.664, 40.465],
         [44.528, 43.426, 23.248],
         [57.534, 27.871, 53.767]])
   y: array([[21.286, 41.664, 40.465],
         [44.528, 43.426, 78.671],
         [ 2.111, 27.871, 53.767]], dtype=float32)

the x coordinate is off by 55.423 where the boxlength on that dimension is 55.423, so that just looks like a rounding error. Those two x coordinates are virtually identical, and this test doesn't seem to be trying to establish which image gets picked, it's just bad luck that this regression test could fall into either. (arguably there could be some sort of periodic assert coordinates equal function for things like this...)

The moment of inertia error is a little more tricky to pin down in a similar way, but I'd be suspicious that it is also a rounding error

I got these tests to fail locally, and then they started passing locally, so it looks a bit twitchy...

PicoCentauri commented 5 days ago

Thanks for looking into this @richardjgowers. I can also try to debug why the .center_of_mass() doens't seem to create the cache entry.