MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 652 forks source link

add kdtree option for spherical selection keywords #1706

Open kain88-de opened 6 years ago

kain88-de commented 6 years ago

See https://github.com/MDAnalysis/mdanalysis/pull/1692/commits/8bad42896a16e4ac8662b4886c8b02edc075afeb

The other selections can already choose to use the kdtree option that works also with periodic boundary conditions. See #1692

navyakhare commented 6 years ago

kdtree option seems to be in place for both Spherical Layer Selection and Spherical Zone Selection,

        sel = self.sel.apply(group)
        [box](url) = self.validate_dimensions(group.dimensions)
        ref = sel.center_of_geometry(pbc=self.periodic)
        if box is None:
            kdtree = KDTree(dim=3, bucket_size=10)
        else:
            kdtree = PeriodicKDTree(box, bucket_size=10)
        kdtree.set_coords(group.positions)
        kdtree.search(ref, self.exRadius)
        found_ExtIndices = kdtree.get_indices()
        kdtree.search(ref, self.inRadius)
        found_IntIndices = [kdtree.get_indices()]([url](url))
        found_indices = list(set(found_ExtIndices) - set(found_IntIndices))
        return group[found_indices].unique
        sel = self.sel.apply(group)
        box = self.validate_dimensions(group.dimensions)
        ref = sel.center_of_geometry(pbc=self.periodic)
        if box is None:
            kdtree = KDTree(dim=3, bucket_size=10)
        else:
            kdtree = PeriodicKDTree(box, bucket_size=10)
        kdtree.set_coords(group.positions)
        kdtree.search(ref, self.cutoff)
        found_indices = kdtree.get_indices()
        return group[found_indices].unique

respectively.

what seems to be left is applying kdtree option for Cylindrical Selection and I would like to work on that issue.

kain88-de commented 6 years ago

@navyakhare sure you can work on this. Do you have a working development environment of MDAnalysis? It will help to run tests locally.

Are there specific things you would like to know about this issue or did you already research where in the code you have to make changes?

@jmborr do you have any recommendations for the cylindrical selections and ideas for good tests?

jmborr commented 6 years ago

In the case of periodic boundary conditions, shouldn't the group positions and center of geometry be first translated to the unit cell before this subtraction? Maybe I'm misunderstanding CylindricalSelection.apply, I have to check.

For a spherical selection one can replace the whole sphere with a point (the sphere center) and then check its distance to each atom in the group, with the group atoms encapsulated in a kdtree. However, I can't imagine how to replace the whole cylinder with a finite set of points in order to repeat the trick done in the spherical selection.

navyakhare commented 6 years ago

@kain88-de I have a working development environment of MDAnalysis. I think I know where the changes need to me made in the code but I'll need guidance for testing.

@jmborr what about replacing the cylinder with an axis, the central axis of the cylinder, and then checking distance from this axis to each atom in the group. This can be done by defining the axis in terms of points on the axis and then adding a method in https://github.com/MDAnalysis/mdanalysis/blob/2703b38b5fa7f90857d1dd5a3d2a6020861913b2/package/MDAnalysis/lib/pkdtree.py#L146 to calculate distances from an axis and not just points. Though I am not really sure about the effect of sending multiple points on other functions.

jmborr commented 6 years ago

@kain88-de the distance of a query point to the cylinder axis is given by the line that is normal to the axis and passing through the the query point and the axis, right? If there happen to be no point at the intersection between this normal and the axis, you will not be able to calculate its distance, but rely on a nearby point along the axis. That will give you an estimate of the distance but not the distance itself. Of course, the estimate will get better the more points you lie down along the axis.

I think one way could be to first find out the subset of query points with Z coordinate within Zmin and Zmax of cylinder, and then use a 2D-kdtree with only X and Y coordinates to check the distances of the points in the subset to the cylinder axis. If you restrict yourself to the XY plane, then we can substitute the cylinder axis with just one point.

kain88-de commented 6 years ago

@kain88-de the distance of a query point to the cylinder axis is given by the line that is normal to the axis and passing through the query point and the axis, right?

Yes

The approximation of the cylinder axes by points will not work reliably. It depends on the number of points along the axis and requires a lot of computation power because we will get the same atoms multiple times along an axis.

I think one way could be to first find out the subset of query points with Z coordinate within Zmin and Zmax of the cylinder and then use a 2D-kdtree with only X and Y coordinates to check the distances of the points in the subset to the cylinder axis. If you restrict yourself to the XY plane, then we can substitute the cylinder axis with just one point.

Yes, this is a good solution. In fact, we are currently using the projection into the XY-plane when using cylinder selections without periodic boundary condition.

        # First deal with Z dimension criteria
        mask = (vecs[:, 2] > self.zmin) & (vecs[:, 2] < self.zmax)
        # Mask out based on height to reduce number of radii comparisons
        vecs = vecs[mask]
        group = group[mask]

Such a solution, of course, means that we do not apply the PBC condition on the Z-axis for cylindrical selections. This should then be added to the documentation.

@MDAnalysis/coredevs Anything against not treating PBC on the Z-axis for cylindrical selections?

jmborr commented 6 years ago

In my opinion I would support PBC up the monoclinic system and only when the Z-axis is the axis perperdicular to the other two. Any membrane system would be monoclinic, for instance.

In the general triclinic system, the axes of the lattice do not have to be aligned with any of the X-, Y-, or Z- axis, so one can't split the treatment of PBC between the Z-axis and the XY-plane :disappointed:

kain88-de commented 6 years ago

@jmborr MD simulations are only monoclinic I think. At least the ones supported by GROMACS. But yeah the spherical selection should make a guess if the z-axis restriction can be meaningfully applied.

@navyakhare if you are not familiar with periodic boundary condition and unit-cells. A good resource is this PhD Thesis and the GROMACS manual chapter 3.2. Please also ask if you do not understand our discussion.

navyakhare commented 6 years ago

@kain88-de I referred the resources. Since MD simulations are only monoclinic ( which I don't understand completely as in the reference it's written to be triclinic in general?) The z vector of the cell can be defined to be aligned to the cylindrical axis which will now make it act like an infinite cylinder and hence even without periodic boundary condition in z axis this will correspond to an isolated infinite cylinder. If neighbouring cylinders do not overlap (which I think is the case) shouldn't this work ?

kain88-de commented 6 years ago

Here is a sketch of the PBC problem in 2D https://drive.google.com/file/d/1Lyw0ScF0w_tF8kznPIo_Q5Q02OX75MSF/view?usp=sharing

Triclinic C is the problem case that @jmborr mentioned.

For a cubic cell, everything is fine. Also when an axis of the box is aligned with the z-axis. If this isn't the case we have a projection error.

The periodic distance algorithm in pkdtree is checking if an atom is close to the box boundary. If this is detected in generates appropriate copies of the atoms. This helps to keep the algorithm usable for large systems because we minimize the copies that are needed.

Our original idea for the spherical selection was to cut out atoms in the selected Z-range and project all points into XY plane. For the triclinic cell, this scheme will only if a box axis is aligned with the Z-axis. In all other cases, the Z-axis encodes information about the distance to the boundary. This is hopefully illustrated by the sketch.

Another possible idea here is that we generate copies for all atoms close to the boundary before we do the XY projection. To keep the memory requirements minimal this would define close as all atoms in a distance r_cutoff. This cutoff could be the cylinder radius.

https://drive.google.com/file/d/14nrLQlz62L-RQpSm9mgPab4lVOAY3-j_/view?usp=sharing

mnmelo commented 6 years ago

Just came across this thread. This is probably a naïve suggestion, but what about repacking coordinates in the compact octahedron or dodecahedron representation, centered on the selection point, and then go from there? Is the cost of repacking larger than the extra-image steps one needs to take?

ayushsuhane commented 6 years ago

Hi, This looks like a really interesting problem. if no-one is currently attempting this problem, I would like to have a shot at it. I saw the code for spherical zone selection and spherical layer selection, cylindrical selection seems particularly interesting. Although, I would need some help in understanding the small pieces independently, but I think the idea of XY intersection can be extended for cylinders as well. Based on the angles of the unit cell(triclinic or monoclinic), we shall be able to know the orientation of cylinder with respect to the box. Additionally, every intersection of a plane with a cylinder forms an ellipse. Given the center point of the axis, it should be feasible to calculate the distance of point from the axis and whether it is inside or outside the defined region of cylinder. Does it seem like a feasible solution?

jmborr commented 6 years ago

@ayushsuhane You're very welcome to try, then we can take a look to the code once you have a first write up.

ayushsuhane commented 6 years ago

@jmborr @kain88-de I am not sure whether I understand the problem this issue is addressing. From the title it seems like adding KDtree options only for spherical selections which are already in place, as mentioned above as well. The commit in the issue header is points to skipped tests related to spherical selections, which are also covered for the periodic case. Just to get a clarify, is it only the cylindrical zone selection which remains for this issue.

class CylindricalZoneSelection(CylindricalSelection):
    token = 'cyzone'
    precedence = 1

    def __init__(self, parser, tokens):
        super(CylindricalZoneSelection, self).__init__()
        self.exRadius = float(tokens.popleft())
        self.zmax = float(tokens.popleft())
        self.zmin = float(tokens.popleft())
        self.sel = parser.parse_expression(self.precedence)

class CylindricalLayerSelection(CylindricalSelection):
    token = 'cylayer'
    precedence = 1

    def __init__(self, parser, tokens):
        super(CylindricalLayerSelection, self).__init__()
        self.inRadius = float(tokens.popleft())
        self.exRadius = float(tokens.popleft())
        self.zmax = float(tokens.popleft())
        self.zmin = float(tokens.popleft())
        self.sel = parser.parse_expression(self.precedence)

However, there is no distmat or KDtree in place for these classes.

Given that this is the problem, I referred to the references you discussed above. While I am aware about the periodic boundary conditions, I am not quite sure if I understand the triclinic case with PBC. Will it be possible for you to describe a 3D unit cell and the projections in respective planes.

I saw the Gromacs reference, where it says that effectively the simuations are done in a cubic box but the trajectory can be converted into other unit reference cells. Though I donot get the proposed projection on one of the plane, can you provide what would be a good input in terms of geometry for the selection.

kain88-de commented 6 years ago

@jmborr when did we fix the spherical selections not being tested?

@ayushsuhane I'm busy right now and can't do a lot of 3d drawings. Maybe one of the others can give you some hints about the possible solution that we discussed.

jmborr commented 6 years ago

@kain88-de PR #1733 added tests for the spherical layer and zones in test_atomselections.py

ayushsuhane commented 6 years ago

Hi, I gave this a little thought. Following @jmborr 's idea about filtering the z-dimension which can be followed by projecting into the XY plane and considering the axis as a point. Here is the code snippet for implementation:

vecs = group.positions - sel.center_of_geometry()
if self.periodic:
    if np.all(group.dimensions[3:] == 90.):
                # Orthogonal version
        vecs -= box[:3] * np.rint(vecs / box[:3])
        else:
                # Triclinic version
         tribox = group.universe.trajectory.ts.triclinic_dimensions
         vecs -= tribox[2] * np.rint(vecs[:, 2] / tribox[2][2])[:, None]
         vecs -= tribox[1] * np.rint(vecs[:, 1] / tribox[1][1])[:, None]
         vecs -= tribox[0] * np.rint(vecs[:, 0] / tribox[0][0])[:, None]
mask = (vecs[:, 2] >= self.zmin) & (vecs[:, 2] <= self.zmax)
vecs = vecs[mask]
group = group[mask]

kdtree = PeriodicKDTree(box=box)
cog = sel.center_of_geometry()
z_coord = cog[2]
coords_xy = np.c_[group.positions[:, :2], z_coord*np.ones(len(group.positions))]
cog_xy = np.append(cog[:2], z_coord)

cutoff = self.exRadius if box is not None else None
kdtree.set_coords(coords_xy, cutoff=cutoff)
indices = kdtree.search(cog_xy, self.exRadius)
try:
    In_indices = kdtree.search(cog_xy, self.inRadius)
    indices = np.setdiff1d(indices, In_indices)
except AttributeError:
    pass
vecs = group.positions[indices]
group = group[indices]

This has following problems: