Closed jbarnoud closed 4 years ago
https://github.com/richardjgowers/MDAnalysis-coarsegraining
I've already hacked together something like this for coarse-graining. The difference to other transforms mentioned is this changes the number of particles in the system as you're condensing many positions into one.
Adding a pbc/undo pbc would be a cool transform. I think some subroutines along these lines exist (lib.mdamath.make_whole
and lib.distances.apply_PBC
), but it would be good to make these more accessible.
@jbarnoud I see we're analysing the same system lately! :P
@richardjgowers I've seen your CGUniverse class. I forgot to link it in the proposal. I am still not sure if I want this proposal to be able to change the topology, perhaps in as a second step.
When I added the virtual particle I mention above, I really would have like to visualize them with nglview.
Adding a pbc/undo pbc would be a cool transform. I think some subroutines along these lines exist (
lib.mdamath.make_whole
andlib.distances.apply_PBC
), but it would be good to make these more accessible.
The tranforms exist already. But they have to be called on a per-frame basis. They will not be called by nglview for instance.
I see we're analysing the same system lately! :P
I hope you have fun with the system. I sure do :D
I'd love to have fit+pbc and remap in such a way that the unit cell is not rotating around the fitted group. I don't know of any tool that can do this even though it is clear that it's possible: you just have to carve the same Wigner-Seitz unitcell out of the inifinitely repeated (and rotated) system.
@orbeckst Since the transformations get to alter the TimeStep
, and since the unit cell is stored there, it should be possible.
So I like this, I'd just change it so that the object which accepts transformations is the Reader, so all your u.
calls become u.trajectory.
.
We could store these just as a list in the Reader, then inside the Reader we'd have code like..
def _read_next_timestep(self):
ts = self._read_next()
for transform in self.transformations:
transform(self)
return ts
So each transform
could be a partial function, or maybe a class that defines __call__
. The class is useful for the case of making molecules whole, which needs bond information.. So you could do something like
from MDAnalysis.transformations import MakeWhole
u.trajectory.transforms.append(MakeWhole(u.bonds))
@richardjgowers This is exactly what I had in mind.
I will try to come up with a working prototype in the next few days (with one or two transformations to play with). But I will be rather busy so do not hold your breath.
_read_next_timestep
seems to be the best place to implement the transformations. Yet, it is an abstract method of ProtoReader
that get overload by each reader. I am thinking of wrapping the method so the readers overload a method that get called by _read_next_timestep
. Maybe I am missing an obvious solution to that?
@jbarnoud so essentially adding another layer between next
and _read_next_timestep
?
I might have a play with where's best for this... part of me thinks that you could use a decorator like
@with_transforms
def next(self):
return self._read_next_timestep()
so essentially adding another layer between next and _read_next_timestep?
Basically yes. But the decorator may be overkill. I was thinking about something like:
def _read_next_timestep(self):
ts = self._read_next() # Overloaded by the reader
for transformation in self._transformations:
ts = transformation(ts)
return ts
Yes @jbarnoud solution looks good. Though I would put it directly into the next
function. That way you won't have to change any reader code.
@kain88-de Except that _read_next_timestep
is used in various other places. It is used directly in __iter__
, and in _read_frame
at least.
The change in the readers would be small, just a matter of changing the name of a method where it is defined. Not even where it is called.
So with how #868 has worked out, I don't really see why this can't be implemented using the aux mechanisms. Maybe we subclass aux's to get them to be called transforms everywhere, but I think the mechanics look sound.
@richardjgowers I have a clear idea on how it can get implemented alongside aux, and a rough idea on how it can interact with aux. Could you elaborate on how it could get implemented uning aux?
def _read_frame_with_aux(self, frame):
"""Move to *frame*, updating ts with trajectory and auxiliary data."""
ts = self._read_frame(frame)
for aux in self.aux_list:
ts = self._auxs[aux].read_ts(ts)
return ts
With the snippet above, there's no reason with aux
couldn't modify the positions in ts
. A lot of the nomenclature is a little confusing (read_ts
should be modify_ts
) but all the hooks are in the right place.
@davidercruz this is all relevant to you
Hey there. After talking with @jbarnoud, he proposed that I look at the reader codes and make a scheme of what happens when we iterate over a trajectory. Here's what I found:
When the user loads a trajectory through the MDAnalysis Universe class eg. u = mda.Universe(PSF, DCD)
load_new()
method of the Universe
class is called;load_new()
the get_reader_for()
method of the _get_readers
module, which guesses the format and returns the appropriate reader (DCDReader
for the example above);load_new()
call, an instance of the "Chosen"Reader
class is created. The 1st frame is read and converted to a Timestep
. Finally native units are converted to MDAnalysis base, and a Timestep
instance is returned by the reader;trajectory
property from the Universe
class becomes a reference to the trajectory reader object, and the user can access the methods from the ProtoReader
and all the subclasses.When the user requests the another frame of the trajectory eg. u.trajectory.next()
next()
from the ProtoReader
creates an instance of Timestep
, ts
, by calling the _read_next_timestep()
method, which is overloaded by the _read_next_timestep()
of the reader class for that format;_read_next_timestep()
of the reader class, the next frame is read from the file and converted to Timestep
as before. Units are also converted as before.next()
returns ts
@richardjgowers, I saw #1215 . From what I understand, the transformations would be applied every time a frame is loaded from the file. But what happens when the trajectory is loaded into the memory? Is the frame actually modified permanently? @jbarnoud was telling me that the MemoryReader
would be problematic due to this, as we would need to keep track of whether the frame was already modified before (eg. after reaching the EOF and rewinding). Otherwise the auxiliaries look like a good way to apply on-the-fly transformations.
A solution would be to apply the transformations when the trajectory is transferred to the memory. What do you think?
@davidercruz yep memoryreader will be a headache. I agree doing it all up front is the best solution, so each time a transformation is added, the (in-memory) trajectory is iterated through and the transformation applied. (You could maybe come up with some solution which lazily applies a transformation the first time a frame was requested, but that just sounds like asking for trouble.)
Your schematic of what happens when iterating looks solid, there's also __getitem__
and sliced iteration but they work in pretty much the same way. So really here is ideally the only place you're "allowed" to edit Readers to make your transformations happen on any frame access.
You could redefine MemoryReader._read_frame_with_aux()
to avoid the double transformation problem, but it's a little ugly and introduces an extra burden as you then have to maintain two implementations of the Reader API. An interesting idea would be for MemoryReader.add_transformation()
(which would override BaseReader.add_transformation()
) to apply then neuter any transformations a user adds, so that when they're later called in BaseReader._read_frame_with_aux()
they're inert.
@richardjgowers yes, that place avoids changes to the format readers. My only issue is that it's not what the function was initially designed to due. But code evolves :D
What about changing things at the reader level? For example, the _read_next_timestep
method of the format reader classes seems to be always called (except when the trajectory is first read). So before returning ts
the transformations could be applied. The advantage is that I think it would give us more freedom with our implementation, and could make it easier to read the code. The disadvantage is the same lines of code on multiple modules.
I'll try it with simple function later.
This proposal is now implemented.
On-the-fly coordinate transformations
So far, the only transformation MDAnalysis does to coordinates read from a trajectory is a unit conversion. Any other transformation must be done by th user on a frame per frame basis. Yet, in some use case, the user does not directly access the frame, and therefore cannot apply the transformations. This is especially the case with analyses and visualizations. Hereby, I propose a general mechanism to declare coordinate transformations that will be applied by the reader.
Use cases
The main use cases for the proposal are analyses and visualizations that require structure alignements or periodic boundary corrections.
Analyses such as RMSD calculations require the structures to be aligned. So far, the RMSD class control the fit with a keyword, and only implement a single way of doing that fit. This means that if the user wants to do a different fit, he must save an intermediate trajectory. It also means an other analysis that require a fit needs to implement the way to handle it; such implementation being redundant and potentially inconsistent with the other analyses.
@dotsdl posted recently a blog post on nglview. This library allows to visualize a trajectory from MDAnalysis in a jupyter notebook. Like most (if not all) visualization software, nglview does not fix the periodic boundary conditions, which can lead to ugly artifact with bond crossing the box. MDAnalysis could fix the periodic box in a way transparent for the visualization library.
Also, some transformations are painful to do with the tools embedded in the simulation packages, and can require multiple call to tools like
gmx trjconv
with several intermediate trajectory. With the proposed mechanism it would be possible to declare a workflow of transformations in MDAnalysis and apply them frame by frame, without intermediate files.New APIs
User facing API
Only few changes are needed in the user facing API: a minima a method should be added to
Universe
to register a transformation, few methods could also be added to to inspect and modify the transformation workflow.A transformation is implemented as a callback that takes a
TimeStep
as argument and modify in on place. Using a callback means that a transformation can be implemented in the simple way possible for that transformation. The simplest transformations can be implemented as a function, so as transformations that require a constant argument:The more complicated transformations can be implemented as classes with a
__call__
method.Ideally, transformations that require access to the topology (e.g. making molecule whole) can access it via the
universe
attribute of theTimeStep
for a minimum burden of the user.It would be nice to be able to add several transformation in one go as a workflow:
Internal API
Internally, the
Universe
class must allow to register transformations and must pass them to thecoordinate.base.ProtoReader
class that apply them, in order, when reading a new frame.This proposal share problems with proposal #785 by @richardjgowers. Both proposal need a way to declare method, and to execute them when reading a new frame. Auxiliary data should be read before executing the coordinate transformations so the transformation can use the auxiliary data.
Optionally, it could be useful to allow a coordinate transformation to add auxiliary (e.g. rotation matrix used for the fitting). This should be trivial if auxiliary data are attached the
TimeStep
.Collection of transformations
For this proposal to be really useful, MDAnalysis must have a collection of transformations ready to use. The more obvious ones are the already implemented structure fitting, and PBC correction methods.
Limitations
Some exiting transformation require to change the topology. It is the case to any transformation that adds virtual particle, or any transformation that coarse grain or back map a trajectory.
For exemple, I recently wrote an script that adds virtual particle to each frame of a trajectory to test the possible bias of an analysis. Also, several of us work with coarse-grained models and need to analyse atomistic simulation as if they were coarse-grained.
These transformations require to adapt the topology of the system, and I do not see a clean way of doing it. Also, it is not clear if such transformations are in the scope of this proposal (even though I would love to be able to use them).