MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.3k stars 647 forks source link

Clarify and potentially optimize self-RDF #3460

Open scal444 opened 2 years ago

scal444 commented 2 years ago

Is your feature request related to a problem?

For InterRDF, when sel1 and sel2 contain overlapping atoms, the resulting counts (if within the distance cutoff) give hits for all combinations of particles. For example, with 4 particles in each selection, there are 16 hits (4x4). This includes 4 zero distance hits for (0,0), (1,1), and so on. It also doubles the same distance count for (1,2), (2,1).

Describe the solution you'd like

I'm not sure if there's one correct way to address this, but I think it makes more sense not to count self 0 distance hits. I'm less certain on the (2,1), (1,2) double-counting problem. For one data point, Gromacs will only count one hit, so for the example above there would be 6 total entries in counts - (0,1), (0,2), (0,3), (1,2), (1,3), and (2, 3).

If that approach were to be adapted, we could also use the self-search optimization in NSGrid for the common case of the two input selections being identical (in my research, that was pretty much exclusively what I did), which cuts the runtime of the analysis about in half.

Describe alternatives you've considered

There could instead be a self-rdf class if we wanted both behaviors to exist, or a flag to switch between them in InterRDF.

Additional context

richardjgowers commented 2 years ago

I thought there was already self-RDF but apparently not. I think this is probably a good idea, and making it generalisable to allow any width of diagonal band to be excluded sounds useful for e.g. polymers where you don't want to exclude the same chain, but some sort of length.

One sticky point in implementing this is currently numpy doesn't really offer a nice way of implementing a set of pairs of ints (i.e. std::map<std::pair<int, int>>). This would also be useful in our bond algorithms, so having a nice implementation in lib._cutil would be super handy.

orbeckst commented 2 years ago

Could one use a similar approach as used for exclusions to get single counts?