glotzerlab / freud

Powerful, efficient particle trajectory analysis in scientific Python.
https://freud.readthedocs.io
BSD 3-Clause "New" or "Revised" License
270 stars 49 forks source link

3D Voronoi Diagram : convex hull instead of cubic box ? #839

Closed Optimox closed 3 years ago

Optimox commented 3 years ago

Hi,

I'm wondering if it's feasible to bound the 3D space by a dilatation of the convex hull of a set of points instead of a cube?

Indeed, for my problem I have points at the top and bottom surface which are not aligned on two planes. This creates quite big voronoi cells at the top and/or bottom (see an example in 2D bellow)

image

Also I noticed on the example above that cells at the top have a high number of sides, does that mean that they are considered to be neighbors to a lof of other top cells, even if not shown in the plot?

Thanks a lot for your help and congratulations on the repository!

bdice commented 3 years ago

Hi @Optimox! This is an interesting question. The freud library uses a library called voro++ for its Voronoi computations. I recommend looking at the voro++ examples and/or source code to see if the kind of computation you want is supported. I see there are ways to constrain a particle’s Voronoi cell to fit within a predefined polyhedral volume using “initial walls.” See the example of dealing with irregular particle packings. http://math.lbl.gov/voro++/examples/

Pretty much any feature of voro++ could be used in the freud interface / implementation with some work, or perhaps you can call voro++ from C++ directly if that better suits your needs. I’m happy to provide guidance if you want help implementing an existing interface to voro++ in freud.

Other ideas: you might want a geometry library that can perform cuts on the Voronoi cells’ polyhedra after the fact. That partly depends on whether you need the polyhedra or neighbor information as well. You can use freud’s NeighborList.filter method if you can define which bonds should be cut.

Let us know what you think! Hope these ideas are helpful for your analysis.

bdice commented 3 years ago

To the second part of your question:

Also I noticed on the example above that cells at the top have a high number of sides, does that mean that they are considered to be neighbors to a lof of other top cells, even if not shown in the plot?

Every side of a polytope (polygon/polyhedron) in the Voronoi diagram defines a neighbor bond in freud. Because the box is treated as periodic, there are neighbor bonds that will cross the periodic boundary. You can see an example of how to draw the neighbor bonds here at the bottom: https://freud.readthedocs.io/en/latest/gettingstarted/examples/module_intros/locality.Voronoi.html

Optimox commented 3 years ago

hello @bdice,

Thanks for your quick and detailed answer.

Maybe I can explain a bit more in details what I'm trying to do and see if there's an elegant solution.

I have points detected automatically within a 3D image, those points represent the center of nuclei (medical images). I know each point has an entire cell surrounding it. I also know that a cell cannot be larger than a certain size and that there are cells in only a small fraction of my initial 3D image.

Currently I tried to use freud.box to avoid having very large voronoi cells on top and at the bottom layers but the fact that top cells can be at different height makes the cube not very appropriate and if I compute the volumes of each cell some will be huge because they are far from the box frontier. So I thought I could replace the cube with the convex hull of all my detected points, which I would dilate a bit (the size of typical cell) so that the cells on the border will have a decent space for their voronoi cell (but not too big). I just had a look at voro++ and I think walls is exactly what I need in order to create something like this : http://math.lbl.gov/voro++/examples/frustum/

However I don't know anything in C++ so I think I'll have a hard time using it. If you already know where/how it could be implemented inside freud maybe I can try working on it with some help on where to start.

About my second question, I would like to use the Voronoi neighbours to create a graph connecting adjacent cells in 3D. However I'd like to make a difference between "real" neighbor bonds and "periodic" bonds as they do not reflect a physical connection. But I think I can detect the "periodic" bonds by computing distances between the two nodes so it should be ok right?

bdice commented 3 years ago

@Optimox Interesting! I think I understand your goals better now.

If you’re dealing with real data, freud’s handling of periodic boxes is probably not what you want. Periodic boxes are common in simulations, our usual focus. SciPy has a non-periodic Voronoi implementation from QHull that could be helpful as well: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.Voronoi.html

Is the key thing you want out of this the Voronoi bonds? Do the polyhedra matter at all, e.g. for determining cell volume?

If only the bonds matter, this idea might work without needing to alter freud or voro++.

  1. Find all Voronoi bonds (just use freud’s Voronoi as-is).
  2. Find neighbors within a cutoff radius (freud calls this a ball query). See the docs for info about neighbor querying.
  3. Take the set intersection of the Voronoi neighbor pairs and the neighbor pairs within a cutoff distance. (There’s no existing code for this function but it should be relatively straightforward; ask if you need help with this.)

The result should contain the neighbor pairs that share a side in the Voronoi diagram AND are closer than the cutoff distance (to eliminate bonds between separate clusters or across boundaries).

If you need local cell volumes, perhaps you can approximate it by computing the Voronoi cell volumes and then take min(volume, max_cell_volume) where you plug in a reasonable maximum value that will be used for the particles on the edge of a cluster (which have large Voronoi cells extending into the empty space).

Does that help?

Optimox commented 3 years ago

Thank you very much, yes I think I can go with this solution.

Maybe I'll ask you one last question before thanking you very much for your precious help.

I'd like to define bottom and top cells from my voronoi graph. Is there a way to detect that a cell has neighbor across a specific plane of the cube (top plane and bottom plane)? So that I would consider a cell to be on top of the voronoi graph if it does not have any real physical neighbor on top of it?

bdice commented 3 years ago

@Optimox I’m not sure if I understand. You can try something like this to see if the bond crosses a periodic boundary:

# Given a NeighborList called nlist, a Box object called box, and a NumPy array of particle positions, of shape (N, 3), called points

unwrapped_bonds = points[nlist.query_point_indices] + box.wrap(points[nlist.point_indices] - points[nlist.query_point_indices])
box.contains(unwrapped_bonds)

The result should be a boolean array telling you if each bond crosses a periodic boundary. This boolean array can be passed to nlist.filter to drop bonds that cross the boundary.

Optimox commented 3 years ago

Thanks very much @bdice.

Just to be sure I understand:

Optimox commented 3 years ago

I think I don't fully understand what box.wrap does.

Here is a minimal code that shows something I don't get:

import numpy as np
import freud

Lx = 5
Ly = 10
N_points = 100

random_points = np.zeros((N_points, 3))

random_points[:, 0] = np.random.uniform(-Lx/2, Lx/2, N_points)
random_points[:, 1] = np.random.uniform(-Ly/2, Ly/2, N_points)
box = freud.box.Box(Lx=Lx, Ly=Ly, is2D=True)
wrapped_points = box.wrap(random_points)

print("This should be true but I get : ", (random_points==wrapped_points).all())

Since all the points I created are already inside the main box, how can it be that wrapping them changes their positions?

EDIT

Ok I think it's just a matter of precision if I actually try this I get the correct answer:

print("This should be true but I get : ", (np.allclose(random_points, wrapped_points, atol=1e-05)))
bdice commented 3 years ago

@Optimox Correct, this is an issue of precision. The freud library uses single precision (float32) for all calculations. NumPy defaults to float64.

bdice commented 3 years ago
* `points` here are already wrapped point inside the box?

Yes, the array of particle positions (which we call points) is supposed to be contained in the box. See the tutorial on Periodic Boundary Conditions for some more information on this topic.

* you start from a query point, follow the link to a neighbor and then look if this point is indeed inside the box (not outside)?

Yes, exactly. It's checking cases like the following. The two red points are neighbors under periodic boundaries, and their "ghost" images are shown in blue. The periodic image points cross periodic boundaries, and you could use that criterion to reject specific bonds. You might also try setting box.periodic = False to disable finding neighbors across periodic boundaries but I am unsure if that behavior is fully implemented for all classes (e.g. Voronoi may not support it?).

image

* why do you need to wrap this part : `box.wrap(points[nlist.point_indices] - points[nlist.query_point_indices])`?

This ensures that you get the short neighbor vectors shown in gray in the above image, and not the "long vector" (not shown) between the two red points. If you don't wrap, then you're effectively computing r_i + (r_j - r_i) = r_j. That's just the location of the neighboring point. Wrapping gives you r_i + wrap(r_j - r_i) = r_j', which is the location of the nearest ghost/image of point r_j to the point r_i.

* what's the difference between point_indices and query_point_indices?

See the NeighborList docs and Query API docs.

Optimox commented 3 years ago

Thanks @bdice, it's very clear and helpful.

I have around 20k points inside my box and 200K bounds, which leads box.contains(unwrapped_bonds) to take over a minute to run.

Using something like this would be much faster but I don't get the same results:

def check_inside_box(points, Lx, Ly, Lz):
    is_x = (points[:, 0] <= Lx/2) & (points[:, 0] >= -Lx/2)
    is_y = (points[:, 1] <= Ly/2) & (points[:, 1] >= -Ly/2)
    is_z = (points[:, 2] <= Lz/2) & (points[:, 2] >= -Lz/2)
    return (is_x & is_x & is_z)

I don't understand why it's not the case, since any "crossing" bond should end up oustide the original box (as shown in your image above). What's the difference? Any idea how to speed up the final computation?

bdice commented 3 years ago

Your simplified check in NumPy will only work for orthorhombic boxes (tilt factors xy, xz, yz equal to zero). freud supports triclinic boxes, which requires significant additional math. You can see the box documentation and C++ code in Box.h for more information. Also be warned that you have a typo is_x twice in the last line.

box.contains uses parallelized C++ internally so I am a bit surprised to see it take a minute — but it’s plausible if your CPU is slower, especially with many bonds. A few suggestions:

Perhaps you can produce a synthetic benchmark using the freud.data.make_random_system function. If you want further performance advice, I’ll need a sample data set and your script and/or the output of cProfile. Try using snakeviz to visualize the time for each call.

Optimox commented 3 years ago

my boxes are orthorhombic so I don't understand how I end up with different results. I'll come back to you with a simple reproducible example so that you can see exactly what I mean.

Sorry I read too quickly your response and did not see my typo, thanks for pointing it! I do have the same results now!

Optimox commented 3 years ago

Hello again @bdice,

Thanks again for your help, very much appreciated.

I just wanted to show you the problem I saw with the computational time using box.contains so that maybe this will help improve the library.

You can see bellow a code snippet to reproduce the experiment.

Here are the results in seconds (red for box.contains, green for numpy version), as you can see the freud implementation does not scale well at all. image

Moreover I wanted to point out something that I would consider a bug in the library (or at least something that could be improved): if you ask for a voronoi computation with one duplicated point in your set of points the code crashes without giving any error. I think it would be better to perform a duplicated check inside the function and throw a clear error. The same behavior happens if one point is outside the box, a simple check and clear error would be very useful for the users. I can create a specific issue about this if you prefer.


import numpy as np
import freud
import time
from matplotlib import pyplot as plt

def check_inside_box(points, freud_box):

    Lx, Ly, Lz = freud_box.Lx, freud_box.Ly, freud_box.Lz
    if Lz == 0:
        # for 2D don't take Lz into account
        Lz = 1000
    is_x = (points[:, 0] <= Lx/2) & (points[:, 0] >= -Lx/2)
    is_y = (points[:, 1] <= Ly/2) & (points[:, 1] >= -Ly/2)
    is_z = (points[:, 2] <= Lz/2) & (points[:, 2] >= -Lz/2)
    return (is_x & is_y & is_z)

L = 100

freud_timings = []
numpy_timings = []
N_points = range(100, 6000, 1000)

for n_points in N_points:
    print(f"Num points : {n_points}")

    points = np.random.randint(2, L-2, (n_points, 3))
    # center the points
    points = points - np.array([L/2, L/2, L/2])

    # MAKE SURE THERE ARE NO DUPLICATED POINTS
    # I consider this a bug of the library
    # without this if you have duplicates the code will crash with no error
    # it simply kills the kernel
    points = np.unique(points, axis=0)

#     points = points.astype(np.float32)

    box = freud.box.Box(Lx=L, Ly=L, Lz=L, is2D=False)

    voro = freud.locality.Voronoi()

    voro.compute((box, points));

    start = time.time()
    nlist = voro.nlist
    print(f"Computation for neighbors list took {time.time()-start:.5f} seconds")

    start = time.time()
    unwrapped_bonds = points[nlist.query_point_indices] \
                    + box.wrap(points[nlist.point_indices] \
                    - points[nlist.query_point_indices])
    print(f"Computation for wrapped points took {time.time()-start:.5f} seconds")

    start = time.time()
    real_connections = check_inside_box(unwrapped_bonds, box)
    numpy_timings.append(time.time()-start)
    print(f"Computation for numpy box containing took {numpy_timings[-1]:.5f} seconds")

    start = time.time()
    slow_real_connections = box.contains(unwrapped_bonds)
    freud_timings.append(time.time()-start) 
    print(f"Computation for freud box containing took {freud_timings[-1]:.5f} seconds")
    assert(np.all(slow_real_connections==real_connections))

fig = plt.Figure()
plt.plot(N_points, freud_timings, color="red")
plt.plot(N_points, numpy_timings, color="green")
plt.show()
bdice commented 3 years ago

@Optimox Good observation. freud's Box.contains method had an error that I have corrected in #840.

Voronoi crashing silently is also a bug! It should return an error because it is not possible to compute a Voronoi diagram of points that overlap one another. The computation is undefined. However, this should raise an error instead of crashing the program! This may be an issue in the library we use, voro++. Unfortunately I won't have time to look at this very soon. Would you be able to file a separate issue for that problem? Just copy-paste your script into the new issue.

Optimox commented 3 years ago

Yes sure I'll close this issue and create a few new issues if you don't mind :)