MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

MDAnalysis.lib.distances needs rework #2046

Open zemanj opened 6 years ago

zemanj commented 6 years ago

The documentation of the module MDAnalysis.lib.distances has several issues, one of which has already been addressed by @xiki-tempula in issue #2004. I also see some places where code duplication could be reduced to make the code more DRY.

Documentation issues:

Code issues:

TODO suggestions:

That's quite a lot of things to do, but I've already started working on most of the points. There are still issues to be discussed, especially the module's title ~and description and how to proceed with the requirements for boxes~.

Current version of MDAnalysis:

~0.18.1-dev~ 0.19.1-dev

richardjgowers commented 5 years ago

@zemanj so with the OpenMP loops, I thought it was preferable to have as few fork/joins as possible, so ideally one for this double loop. Maybe it would be a good idea to ensure that the numref loop is larger than the numconf loop. It would be interesting to see if this changes performance (i.e. 50 vs 1000 and 1000 vs 50).
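The loop-ordering idea can be sketched in plain Python. This is only an illustration, not the code in calc_distances.h: the function name `distance_array_longest_outer` is hypothetical, periodic boundaries are ignored, and the outer loop merely stands in for the loop an OpenMP parallel region would split across threads.

```python
import math


def distance_array_longest_outer(ref, conf):
    """Return result[i][j] = |ref[i] - conf[j]| using one double loop.

    The longer of the two coordinate sets is always iterated in the
    outer loop, i.e. the loop that would be divided among threads.
    """
    n_ref, n_conf = len(ref), len(conf)
    result = [[0.0] * n_conf for _ in range(n_ref)]

    # Swap roles if conf is the longer set; the output layout stays the same.
    swap = n_conf > n_ref
    outer, inner = (conf, ref) if swap else (ref, conf)

    for o, p in enumerate(outer):        # candidate for the single parallel loop
        for k, q in enumerate(inner):
            d = math.dist(p, q)          # Euclidean distance (Python >= 3.8)
            if swap:
                result[k][o] = d         # column-wise (strided) write
            else:
                result[o][k] = d         # row-wise (contiguous) write
    return result
```

Note the trade-off this makes visible: when the roles are swapped, each outer iteration writes down a column of the output instead of along a row, so any gain from a longer parallel loop may be offset by less favorable memory access. Timing 50 vs 1000 against 1000 vs 50 would show which effect dominates.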

WRT false sharing, can we just set the schedule's chunk size to the width of however many floats we can fetch at once? I would hope that the default scheduler is already doing that.
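As a rough sketch of that arithmetic, assuming a 64-byte cache line (typical, but hardware dependent) and an output array that starts on a cache-line boundary: the helper names below are hypothetical, and `static_chunks` only mimics the round-robin chunk assignment of OpenMP's `schedule(static, chunk)`.

```python
def elements_per_cache_line(itemsize, cache_line_bytes=64):
    """Number of array elements that fit into one cache line.

    cache_line_bytes=64 is an assumption, not a queried hardware value.
    """
    return max(1, cache_line_bytes // itemsize)


def static_chunks(n_items, n_threads, chunk):
    """Assign index ranges to threads round-robin, chunk by chunk.

    Returns one list of (start, stop) ranges per thread.
    """
    assignment = [[] for _ in range(n_threads)]
    for c, start in enumerate(range(0, n_items, chunk)):
        stop = min(start + chunk, n_items)
        assignment[c % n_threads].append((start, stop))
    return assignment


# With 8-byte doubles, one cache line holds 8 elements, so a chunk size
# of 8 (or a multiple of it) keeps two threads from writing to the same line.
chunk = elements_per_cache_line(itemsize=8)
plan = static_chunks(n_items=20, n_threads=2, chunk=chunk)
```

If the array is not aligned to a cache-line boundary, neighboring chunks can still share a line at their edges, so this only reduces false sharing rather than ruling it out.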

richardjgowers commented 5 years ago

I added tidying the namespace to the list. We currently dump everything there, which isn't useful to users (e.g. when doing tab completion).

zemanj commented 5 years ago

@richardjgowers Yes, we need something like lib.pbctools or so. We should also deprecate the direct access to lib functions in the analysis module.

By "make C functions invisible", do you mean adding a couple of del statements at the end of c_distances.pyx and c_distances_openmp.pyx?
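For illustration, here is how a trailing `del` hides a name from tab completion in a pure-Python module. The module name `demo_distances` and its contents are hypothetical, and the module is built in memory only so that the example is self-contained; whether the same approach is sufficient for the .pyx files depends on how their functions are declared.

```python
import types

MODULE_SOURCE = """
import math as _math_impl   # private alias, hidden by the underscore convention
import math                 # leaks 'math' into the module namespace

def norm(v):
    # Uses the private alias, so it keeps working after 'del math'.
    return _math_impl.sqrt(sum(x * x for x in v))

del math                    # remove the leaked name at the end of the module
"""

# Build a throwaway module object and run the source in its namespace.
demo = types.ModuleType("demo_distances")
exec(MODULE_SOURCE, demo.__dict__)

# What a user would see when tab-completing on the module.
public_names = [name for name in dir(demo) if not name.startswith("_")]
```

One caveat: Python functions look up globals at call time, so a name may only be deleted if nothing in the module still needs it when called. Here `norm` works because it relies on `_math_impl` rather than the deleted `math`.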

Regarding parallel performance, I already have that under control (dynamically reacting to array lengths etc.), I just haven't had the time to wrap it up yet. Contrary to my first intuition, false sharing is either not a big deal or rather hard to tackle, for example when the threads in self_distance_array() dump their results into the output array (IIRC). Keep in mind that false sharing is only relevant for write operations, and it might sometimes compete with memory locality when reading data. I also played around with OpenMP schedules, to no avail. The default schedule seems to be pretty well suited.
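To see why the output of self_distance_array() is awkward in this respect, consider a sketch of the index mapping, assuming the result is stored as a flattened upper triangle of length n*(n-1)/2 with pairs ordered by i < j. The helper name `condensed_index` is hypothetical.

```python
def condensed_index(i, j, n):
    """Position of pair (i, j), with i < j, in the flattened upper triangle."""
    if not 0 <= i < j < n:
        raise ValueError("require 0 <= i < j < n")
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) entries.
    row_offset = i * n - i * (i + 1) // 2
    return row_offset + (j - i - 1)


n = 4
# The last entry of row i and the first entry of row i + 1 are adjacent in
# memory, so threads working on neighboring rows write next to each other.
last_of_row_0 = condensed_index(0, n - 1, n)
first_of_row_1 = condensed_index(1, 2, n)
```

Because the rows have different lengths, row boundaries do not line up with fixed-size blocks of the output, which may be part of why chunk-size tuning is hard to apply here.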

tanmaymunjal commented 2 years ago

Hey, just trying to take a shot at the lib.distances.distance_array() issue. This seems to be the relevant code the issue is referring to:

https://imgur.com/7sgIP8B

Could anybody point me to the loop the issue is referring to, or tell me whether the issue is already solved in the latest version and should be closed?

hmacdope commented 2 years ago

Hi @tanmaymunjal, the loops in question are in the C header package/MDAnalysis/lib/include/calc_distances.h, at lines 383, 406, and 431. I would make links but I'm on my phone.

tanmaymunjal commented 2 years ago

Thanks! Will take a look