KDTree.NeighborSearch.list_search with periodic boundary conditions

GoogleCodeExporter commented 9 years ago

The KDTree-based distance search does not take periodic boundary conditions 
into account. Christopher Ing suggested [1] to use Patrick Varilly's "periodic 
KD-tree" code" available at https://github.com/patvarilly/periodic_kdtree .

I am filing this report so that his good idea is not forgotten. If anyone wants 
to try it I will make them owner of this issue report. Questions to be answered:

- Can the PeriodicCKDTree code be used as a drop-in replacement for 
KDTRee._CKDTree?

- What is the difference in performance? Should we keep two separate code 
paths, one with, the other without PBC?

- Does it work with arbitrary (triclinic) boxes? (See also Issue 136).

[1] 
https://groups.google.com/d/msg/mdnalysis-discussion/XHG8qf1FMz8/XxzDQ5zwmVgJ

Original issue reported on code.google.com by orbeckst on 1 May 2013 at 12:31

GoogleCodeExporter commented 9 years ago

I'll have a go at this.  The github link looks good, but the benchmarks for 
periodic compared to regular seem slower than they should be, so I might look 
at extending the current solution instead.

With the above github link, it says it's licensed under a scipy license.  Am I 
allowed to copy that solution into MDA and reference it?

Original comment by richardjgowers on 3 Feb 2014 at 9:24

GoogleCodeExporter commented 9 years ago

The scipy license http://www.scipy.org/scipylib/license.html is a BSD-3-clause, 
which is compatible with GPL2 
http://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses . We just 
have to properly attribute an include the licence file.

Original comment by orbeckst on 3 Feb 2014 at 9:38

GoogleCodeExporter commented 9 years ago

Original comment by orbeckst on 4 Feb 2014 at 12:15

GoogleCodeExporter commented 9 years ago

I've got an experimental branch up for this, feature-periodicKDTree

It's a fairly crude solution, if a box is also given to the function, it copies 
the coordinates 1/2 box length in all directions, so you make a KDTree for a 
box 8x the size.  (Looking any further than 1/2 box length is pointless as you 
will have a nearer image in the opposite direction).  I keep track of which 
coordinates I copied where for later...

You do the regular KD search on the augmented list of coordinates, then when it 
get the indices back, if any are higher than the number of coordinates (because 
they come from a "false" coordinate"), it translates these back to their proper 
index.

Because I'm just making an oversized KDTree, the scaling isn't amazing, it 
takes me about 12x as long to build a tree with full periodic boundaries.  From 
timing I did myself, distance_array is quicker unless you start doing lots of 
searches.  I was using a system of about 10k particles, maybe it will be better 
for larger systems.

To improve on the performance, I added a width keyword.  This lets you choose 
how much periodic space to consider, in each dimension. (width = np.array([+x, 
-x, +y, -y, +z, -z]) )

Eg if you were doing something with a bilayer you could choose to only extend 
the KDTree into the Z coordinate, meaning that you would only then have ~2x the 
amount of coordinates rather than 8x.

The width keyword works according to distance too, so you can specify that you 
want 5 Angstroms in the +Z and -Z direction.  Again, I think this would be good 
for larger systems, and when you know the length of your search radius when 
setting up the Tree.

If someone could do some performance testing with an obnoxiously large system, 
I'd be interested to see if this is useful!

I did also find this solution which uses the same base KDTree as MDA.

http://prody.csb.pitt.edu/_modules/prody/kdtree/kdtree.html#KDTree

This method seems to not duplicate the coordinates, but instead run each search 
27 times (moving reference coords each time), and pick the minimum result.  I'd 
think this would be much faster to build, but then possibly slower to use.

Original comment by richardjgowers on 8 Feb 2014 at 2:44

GoogleCodeExporter commented 9 years ago

The test trajectory MDAnalysis.tests.datafiles.GRO and 
MDAnalysis.tests.datafiles.XTC contain 47681 atoms. Try those for a start.

An alternative to KDTree would be to just do a grid search using minimum image. 
There's actually some code in MDAnalysis.analysis.gnm.generate_grid() that 
demonstrates the idea of grid searching, which could be used as a starting 
point. Grid search seems conceptually simpler than KDTrees.

Original comment by orbeckst on 10 Feb 2014 at 4:48

djay0529 / mdanalysis

KDTree.NeighborSearch.list_search with periodic boundary conditions #137