Add molecule unwrapping / nopbc utilities

richardjgowers commented 7 years ago

A commonly asked question is how to make molecules whole (ie undo PBC transformations.)

The only thing we currently have is make_whole which is pretty much hidden away anyway.

So things that need doing is:

~~add undo PBC support for orthogonal and triclinic boxes (make_whole only does ortho) as some sort of mini library (mda.lib.pbc_tools ?)~~
support unwrapping without bonds (can use distance as proportion of box length, not always ideal for v large molecules, but works 90% of time)
~~AtomGroup method that unwraps (will center atoms around the COM by default?) AtomGroup.unwrap ?~~
add unwrap kwarg to various methods as standard (similar to how we have pbc as kwarg) (aka #1004)

Have I missed anything?

jbarnoud commented 7 years ago

I do not know if it should be in mda.lib as it deals with atom groups and we tend to avoid dependencies to other pieces of the library in lib. make_whole is already an alien on that respect. I would be file with some mda.pbc module.

I am all in favor of AtomGroup.unwrap. One of its use case would be to deal with odd boxes that gets transformed into triclinic boxes. The unwrap method would be an interesting replacement to gmx trjconv -ur compact im my workflow.

I would like a AtomGroup.make_whole method that would make each fragment whole. It may get tricky with molecules that are only partially in the AtomGroup, though.

It would be awesome to have such library if we get to implement #786.

jbarnoud commented 7 years ago

(just mentioning AtomGroup.wrap for the sake of completeness.)

kain88-de commented 7 years ago

I would be for mda.core.pbc and AtomGroup.pbc.unwrap. That gives PBC the importance it deserves and having a special .pbc in the Atomgroup makes it easy to collect all functions with unwrapping/wrapping and should make them easier to discover for people new to MD simulations.

jbarnoud commented 7 years ago

@kain88-de sounds good to me.

orbeckst commented 7 years ago

I would like to have the following functionality in this (?) module that I have not seen anywhere else: superimpose by RMSD (align) and then cut out a centered and compact unitcell out of the inifititely periodic (rotated) lattice so that, e.g., density calculations always show a perfectly aligned unitcell (maybe changing in size due to pressure coupling).

(Also, if we were able to do the equivalent of trjconv -center -pbc mol -ur compact && trjconv -fit trans+rot in one go then that would be nice...)

richardjgowers commented 7 years ago

@kain88-de are you saying you want the method to be ag.pbc.wrap and ag.pbc.unwrap? What would the pbc object be in that case? I'm not against it, I was just thinking through how we would implement it, and what the pros/cons are

@orbeckst if the transformations applied were just a list, there's no reason why you couldn't chain them, so inside Reader somewhere:

# in Reader
ts = self._read_next_timestep()  # get the raw data from our traj file
for transf in self.transformations:  # and manipulate it
    transf.apply(ts)

# in user code
u = mda.Universe()
u.trajectory.append_transformation(RMSD(stuff))
u.trajectory.append_transformation(Cutout(stuff))

for ts in u.trajectory:
    # magic happens

orbeckst commented 7 years ago

The transformation approach should be workable and looks very neat. Is this already available??

jbarnoud commented 7 years ago

Not yet. Le 27 janv. 2017 8:53 PM, Oliver Beckstein notifications@github.com a écrit :The transformation approach should be workable and looks very neat. Is this already available??

—You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread.

kain88-de commented 7 years ago

re you saying you want the method to be ag.pbc.wrap and ag.pbc.unwrap? What would the pbc object be in that case? I'm not against it, I was just thinking through how we would implement it, and what the pros/cons are

My thought was to use this as a namespace rather then an object itself. I would group the pbc operations nicely. There we could also add a ag.pbc.type to guess the box type. We don't need to do that. I just thought it would be nicer for students who start using md simulations and the ag object itself appears less crowded for the user. Then we should also think about adding distance functions as well like ag.pbc.centroid. Thinking about it again I'm not 100% convinced myself if this would be such a good idea.

jbarnoud commented 7 years ago

I agree with @kain88-de idea. The name spaces of groups and universes start to be rather crowded, splitting them in categories would make attributes and methods easier to discover. It would also make the documentation easier to both write and read.

richardjgowers commented 7 years ago

So the tidy namespace is nice, but is there any precedent for having an attribute whose methods affect the original instance? Ie

a = Something()
b = a.thing
b.c()  # operates on a???

I don't want to start doing weird things with the language that aren't normally done

orbeckst commented 7 years ago

I find the namespace a bit weird because of https://github.com/MDAnalysis/mdanalysis/issues/1185#issuecomment-279347827 and because it mostly seems to be the equivalent of setting pbc=True on methods: Are we going to have ag.centroid() and ag.pbc.centroid() where the latter is just ag.centroid(pbc=True)?

Is there precedent elsewhere for this approach?

numpy.ndarray is a prime example of being really full of methods and attributes. I actually don't overly mind that and rather have all the methods associated with the object. The alternative is to use more of a functional approach and have functions operate on AtomGroup (networkx and mdtraj seem to be doing that). Personally, I like the object oriented approach.

jbarnoud commented 7 years ago

When it comes to an object having side effects on its parents, we have the precedent of our own Universe and its Universe.trajectory.

I do not see the problem with having ag.pbc.centroid being a shortcut to ag.centroid(pbc=True). The way I see it, the main point of having the namespace it that a user can start typing ag.pbc. abd discover everything we have about periodic boxes by triggering autocomplete.

kain88-de commented 7 years ago

About the namespace. It's only about documentation and discoverability of the different PBC options. For someone new to MD simulations wrap won't have much meaning, placing it in a namespace called pbc will make the meaning clearer. But yes this isn't a clean approach in python since I can assign pbc = a.pbc where it isn't clear how to propagate changes, this is to much cause for confusion and likely not worth the extra clarity we would get.

About the unwrap as a keyword option to various functions of the atomgroup and transformations in the reader. Suppose I have a AnalysisModule that calculates the mean distance between two atom groups. How could we add the unwrap kwarg to that Analysis without having to change the analysis? Assuming a simple implementation would just to ag1.centroid() - ag2.centroid().

Maybe adding PBC as transformations for a trajectory might work better for iterations and using PBC utilities with existing MDAnalysis code.

ag.universe.trajectory.add_transformation(mda.transformations.pbc.unwrap())
distane(ag1, ag2).run()

But then the user has to call a ag.universe.trajectory.remove() if he wants to iterate over the trajectory as it is saved in the file. How would we specify the transformation to remove uniquely? Would it be better to create a new universe object when we add a transformation to keep the original?

transformed_u = mda.transformations.pbc.unwrap(u, ...)
t_ag1, t_ag2 = ... # repeat selections 
# a shorthand t_ag1 = transformed_u.select_atoms(ag1) would be nice
distance(t_ag1, t_ag2).run()

Of course this isn't good for memory usage with large topologies and adding lots of transformations. But It makes them more composable and PBC related transformations can be in a namespace at the module level. From a pure usage standpoint that would make it easy to use transformations with existing analysis classes and keeping easy to add/remove transformations to a trajectory, important to support easy explorative analysis in a notebook. But for implementation it's a possible nightmare to find/fix all edge-cases.

richardjgowers commented 7 years ago

We can do the traj transformation on the fly as an auxiliary

On Sun, 19 Feb 2017, 1:24 p.m. Max Linke, notifications@github.com wrote:

About the namespace. It's only about documentation and discoverability of the different PBC options. For someone new to MD simulations wrap won't have much meaning, placing it in a namespace called pbc will make the meaning clearer. But yes this isn't a clean approach in python since I can assign pbc = a.pbc where it isn't clear how to propagate changes, this is to much cause for confusion and likely not worth the extra clarity we would get.

About the unwrap as a keyword option to various functions of the atomgroup and transformations in the reader. Suppose I have a AnalysisModule that calculates the mean distance between two atom groups. How could we add the unwrap kwarg to that Analysis without having to change the analysis? Assuming a simple implementation would just to ag1.centroid()

ag2.centroid().

Maybe adding PBC as transformations for a trajectory might work better for iterations and using PBC utilities with existing MDAnalysis code.

ag.universe.trajectory.add_transformation(mda.transformations.pbc.unwrap()) distane(ag1, ag2).run()

But then the user has to call a ag.universe.trajectory.remove() if he wants to iterate over the trajectory as it is saved in the file. How would we specify the transformation to remove uniquely? Would it be better to create a new universe object when we add a transformation to keep the original?

transformed_u = mda.transformations.pbc.unwrap(u, ...) t_ag1, t_ag2 = ... # repeat selections # a shorthand t_ag1 = transformed_u.select_atoms(ag1) would be nice distance(t_ag1, t_ag2).run()

Of course this isn't good for memory usage with large topologies and adding lots of transformations. But It makes them more composable and PBC related transformations can be in a namespace at the module level. From a pure usage standpoint that would make it easy to use transformations with existing analysis classes and keeping easy to add/remove transformations to a trajectory, important to support easy explorative analysis in a notebook. But for implementation it's a possible nightmare to find/fix all edge-cases.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/1185#issuecomment-280918921, or mute the thread https://github.com/notifications/unsubscribe-auth/AI0jB5pXQVD3g4xfCi24HSJAtcI-9K3Dks5reEJ5gaJpZM4Lq37x .

jbarnoud commented 7 years ago

@richardjgowers We need to be careful not to apply the transformations several times when the trajectory is on memory, though.

richardjgowers commented 7 years ago

@jbarnoud thinking about it, MemoryReader is a very strange case to the Reader model, as any modifications to positions are permanent (by design). Every other reader you can manipulate the positions then "undo" it by loading the frame again.

orbeckst commented 7 years ago

In principle it would be convenient if we could add transformations to change how a universe reads a traj because then the user can easily move 'virtual' (on the fly generated) trajectories around and use them in the standard analysis framework. The MemoryReader is already a step in this direction. This could open up many new and concise ways to do analysis.

A downside seems to be the additional state that gets added and which will have to be accounted for for parallelization.

richardjgowers commented 6 years ago

So based on #1760, it seems like pbc=True is ambiguous, it just mentions that the calculation will use pbc in some way, but it's not clear to what effect.

Would deprecating pbc and replacing it with both unwrap (make molecules whole, even if they extend beyond periodic boundaries) and wrap (make all atoms within unit cell, ie current behaviour of pbc) solve this?

kain88-de commented 6 years ago

I wouldn't replace pbc=True since it used a lot and would break user code. I would rather remove it from functions where it doesn't have the indented effect which is, give the correct results after any periodic boundary correction (either wrap or unwrap). In the current internal interpretation that means this is only true for function using wrap=True

We can additionally add wrap and unwrap keywords to be specific.

I think we also need to update the center_* functions to allow unwrap to make molecules whole. Simple mapping in the unit cell isn't likely what we want either.

richardjgowers commented 6 years ago

https://github.com/blakeaw/PyBILT/blob/master/pybilt/mda_tools/mda_unwrap.py

richardjgowers commented 5 years ago

Ok so AG.unwrap exists thanks to @zemanj. Next is adding wrap/unwrap kwargs to relevant methods. Should be easy* with the cool test cases added already

*-ha!

MDAnalysis / mdanalysis

Add molecule unwrapping / nopbc utilities #1185