MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 652 forks source link

Handling AtomGroup arguments in analysis functions/classes #1074

Open kain88-de opened 7 years ago

kain88-de commented 7 years ago

See PR #1073 Issues: #888, #719 Connected #1029

@orbeckst asked if a AtomGroup is a valid argument to have for an analysis. I would say yes and I write most of my own analysis in terms of aomgroups. But there is also old code and some newer that works with a universe and selection strings.

Some example classes Accepting Universe: AlignTraj, Contacts (That one is a mixed bag with the reference though). Accepting Atomgroup: InterRDF

I'm personally in favor of giving using an atomgroup as a parameter. I'm actually writing all my own analysis classes like this. This also makes it easy to use atomgroups of the same system from different simulations in another universe. It also feels like more natural way to use a python library like this. The universe and selection string version is more along using a CLI tool.

I've actually written a small argument parsing function in #888 that would be able to allow both approaches. See #888 but that is connected to #1029 with the new system.

The question is how should we handle atomgroup arguments in the analysis classes. Do we want to allow them or should it be a strict universe/selection-string option?

orbeckst commented 7 years ago

Hbond analysis needs selection strings so that the selection can be recomputed for each frame.

I agree that using AtomGroups instead of selection strings feels more pythonic.

In principle the Universe can always be obtained from an AtomGroup so there is not really any need to provide a Universe except in order to make clear that an explicit selection is needed.

Not sure yet what the cleanest approach is or if it is even sensible to try a one size fits all approach.

kain88-de commented 7 years ago

one size fits all approach

How about a one size fits 90% approach? That's what my argument parsing in #888 is trying to do. For the other 10% like Hbond analysis we can still use explicit selection strings as they are the only viable option and throw a sensible error message explaining that a string is needed for this analysis in contrast to the common case.

This goes well along with having a consistent API for analysis classes #719.

richardjgowers commented 7 years ago

We also had some discussion in #175 about this, mostly about creating an automatically updating AtomSelection. I think the tl;dr of that idea is:

# needs some kwarg/different method to define that we don't want a regular AtomGroup
atom_sel = u.select_atoms('around 5.0 protein and resname SOL', dynamic_selection=True)

# find how many water atoms within 5.0 of protein 
for ts in u.trajectory:
    print len(atom_sel)

It's maybe a little complicated, and could just be replaced by calling select_atoms manually with a selection string.

We should probably just accept either strings or AtomGroup interchangeably.

mnmelo commented 7 years ago

Hey guys, I'm still working on this dynamic selection idea (it got held back because it depended heavily on the new topology system).

I should have it out soon.

richardjgowers commented 7 years ago

So with UpdatingAtomGroups now a thing, is there anything a string argument can do that an AtomGroup (possibly updating) can't do? If not, is it a good idea to try and push towards going fully OO and recommend using AG as arguments rather than strings?