MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.29k stars 647 forks source link

align.alignto default behavior can be counterintuitive when mobile and reference come from the same Universe #1825

Open mnmelo opened 6 years ago

mnmelo commented 6 years ago
# With 0.17.0
u = mda.Universe("lipids.gro")
# In my case, two selections of a CG lipid
sel1 = u.atoms.residues[0].atoms
sel2 = u.atoms.residues[1].atoms

np.linalg.norm((sel2.positions - sel1.positions)[0])
Out: 11.033123

MDAnalysis.analysis.align.alignto(sel2, sel1)
Out: (5.387410817321917, 4.8535472523788785)

np.linalg.norm((sel2.positions - sel1.positions)[0])
Out: 11.033121 # ???

The align.alignto can take either Universes or directly AtomGroups as its mobile and reference structures. Atoms to actually fit are selected by optional selection strings, set with the select keyword and defaulting to 'all'. After alignment, by default, the entire Universe of mobile will be rotated/shifted.

The default behavior is fine for the usual case where you have, say, a Universe with a protein as mobile and another with a protein as reference. In my case above I had a single Universe with two lipids and wanted to superimpose both. I passed one lipid's AtomGroup as mobile and the other as reference and ended up with a rotated Universe but no superposition since both groups come from the same Universe that was rotated.

Upon understanding the puzzling result, I figured I could use the subselection keyword to specify 'all' instead of the default None. This will force a calculation of mobile.select_atoms("all") that will be rotated, instead of the default mobile.universe:

align.alignto(sel2, sel1, subselection='all')
Out: (5.3874108864475705, 4.853547267248602)

np.linalg.norm((sel2.positions - sel1.positions)[0])
Out: 3.670187 # That's more like it

This is clearly couterintuitive, and stems from the Universe + selection_string vs. AtomGroup duality we have in the analysis API.

I think in this case we could be clever and either: 1- be conservative and simply emit a warning when both mobile and reference come from the same Universe and no subselection is passed; 2- be slightly API-breaking and default to subselection='all' when both mobile and reference come from the same Universe.

Even though it's API-breaking I vote for option 2, since I can see no use case for the default behavior when both groups come from the same Universe.

orbeckst commented 6 years ago

I think I can be convinced to go with option 2 but it would be good to have a broader understanding of all the combinations of options that we allow.

Also, we must have discussed this previously, but does a good reason remain not to switch over to passing AtomGroups around and abandon passing selection strings?

mnmelo commented 6 years ago

Right now some pretty extravagant things are allowed, such as fitting the c-alphas of a protein in universe1 to those of the protein in universe2 but the perform the rotation on some entirely separate group, such as the solvent of universe3. Option 2 would retain these possibilities.

On the switching to only AtomGroups: I have been little involved with this part of the code. My guess as to why Universes are still accepted is historical reasons? In any case, there's little specific code to handle Universes vs. AtomGroups. It just works because a Universe can mostly duck-type as an AtomGroup for this purpose. IMO it'd be unpythonic to block that. Switching to AtomGroups would then be mostly a matter of updating the docs.

That said, if we assume the user is passing AtomGroups we can also assume that by default they'll want to rotate said AtomGroups, and make subselection default to 'all' for any case.

kain88-de commented 6 years ago

https://github.com/bio-phys/pydiffusion/blob/12f50f23a74a21fd7abbbb2df8a1b5ba3472c6a9/pydiffusion/util/mda.py#L48

This function deals elegantly with this corner case by creating a copy of the reference to fit against. I never had the time to include it into MDAnalysis. But the code has a good amount of tests written as well. It could allow for @mnmelo use case without breaking the API