chr1shr / voro

Voro++: a three-dimensional Voronoi cell library in C++
Other
154 stars 44 forks source link

Determining whether a Voronoi cell is at the edge of the box #20

Closed shoubhikraj closed 1 year ago

shoubhikraj commented 2 years ago

I am using voro++ in a program for analysis of molecular dynamics trajectories (i.e. 3D coordinates). The main part involves calculating Voronoi cells for each atom, and then determining the density of that cell.

However, the problem is that in my case, there is no periodic boundary condition, so the particles that are at the edge should have infinitely large voronoi cells, if I understand correctly. This is fine, because I can then set the density to zero. But, I am not sure what would be the best way to do this.

One of the things I could do is to make a box large enough to contain all the particles (atoms) and then loop through each cell, finding if more than two of their vertices lie on any one of the faces of the box (which would mean that the cell would be infinite if the box was removed). But this seems too complicated, and unnecessary.

Is there any function in voro++ which can do this in a simpler way? The only thing to find out is whether the cell would be infinite when the box is removed. Any help is greatly appreciated.

chr1shr commented 2 years ago

This is a good question. Currently, Voro++ does not have any support for doing computations in infinite domains, since the basic data type is the voronoicell class representing a Voronoi cell as a convex irregular polyhedron.

One strategy is to initialize a large container (e.g. ten times larger than the extent of your particles) and then treat Voronoi cells that extend to the edges of the container as being infinite. You can do that using the method you suggest, where you look for Voronoi vertices on the container boundary. Another way is to use the voronoicell_neighbor class, which also stores which neighbor created each face. Any faces that are created by the container walls are assigned numbers from -1 to -6. It would be easy to scan this list to find any such faces.

While a large box will give you a good approximation of infinite Voronoi cells, it won't be perfect. For example, in 2D, consider the points (1,0), (-1,0), (0,-1), and (0,-ε) where ε is a small number. The Voronoi cell for (0,-ε) will look like a triangle, and as ε tends to zero, that triangle will extend out arbitrarily far from the original points. Hence there is no finite-sized container that would guarantee that such a triangle would be recognized as finite.

It will still be a very good approximation, though, which may be sufficient for your molecular dynamics application.

shoubhikraj commented 2 years ago

@chr1shr Thanks a lot! That is very helpful.

So, would something like this work then?

const auto n_heavy_atoms = atom_list.size() // total number of heavy atoms
voro::container mybox(x_min - 100.0, x_max + 100.0, y_min - 100.0, y_max + 100.0, z_min - 100.0, z_max + 100.0, 1, 1, 1, false, false, false, n_heavy_atoms);
            for (size_t k = 0; k < n_heavy_atoms; ++k) {
                x = positions_this_frame[heavy_atom_matches[k]][0]; // these return the atom positions
                y = positions_this_frame[heavy_atom_matches[k]][1];
                z = positions_this_frame[heavy_atom_matches[k]][2];
                mybox.put(k, x, y, z); // put heavy atoms into the box with k ID
            }

            voro::c_loop_all loop_cursor(mybox);
            voro::voronoicell_neighbor this_cell;
            std::vector<int> neighbour_list;
            if (loop_cursor.start()) do if (mybox.compute_cell(this_cell, loop_cursor)) {
                auto k = loop_cursor.pid(); // returns the ID of the particle
                this_cell.neighbors(neighbour_list);
                if (is_cell_at_edge(neighbour_list)) {
                    rhov_val.at(k) = 0.0; // vector that holds the Voronoi density
                }
                else {
                    rhov_val.at(k) = 1.0 / this_cell.volume(); // density is in number per Angstrom^3
                }
            } while (loop_cursor.inc());
bool is_cell_at_edge(const std::vector<int>& neighbour_list)
{
    if (neighbour_list.size() == 0) {
        cout << "Neighbour list has 0 size, exiting.\n"<<std::endl;
        exit(1);
    }
    for (const auto z : neighbour_list) {
        if (z < 0) {
            return true;
        }
        else {
            return false;
        }
    }
}

The neighbour list should only contain negative numbers when one face is made of the boundary, is that right?

Is there any loss of speed of the program if the container size is increased a lot? Or does it only depend on the number of particles?

I also wanted to ask if there is any benefit to increasing the number of blocks in each container box? In the voro++ example, it is mentioned that having blocks increases computational efficiency, but if I understand correctly, voro++ does not use threading or SIMD, so how does it increase efficiency? (I am a chemist mostly, so I don't really understand how the math of Voronoi polyhedra works, I searched the source code for threading or vectorization directives but couldn't find any, sorry if the question sounds rude)

chr1shr commented 2 years ago

Your code looks correct. I assume that all of your particles have IDs that are ≥ 0. In that case the only negative numbers in the neighbor list will be from the boundary.

Internally, Voro++ sorts the particles into a grid of blocks, which greatly aids in speeding up the computation. When you are building the Voronoi cell for particular particle, you need to rapidly find nearby particles, and this can be done efficiently by looking at the particles in the nearby blocks.

There is a tradeoff in the block size. If the blocks are too large (e.g. 100 particles per block) then it's still not very efficient because you still need to consider a large number of particles that are probably not so relevant. But if the blocks are too small (e.g. 1 particle per block) then it also becomes inefficient because you have scan through a very large number of blocks. In my tests of randomly distributed particles in a region, I typically find that ~5 particles per block is a good number.

I think good analogy is storing cooking equipment in kitchen cupboards. You wouldn't want to design a kitchen with just one giant cupboard, because you'd spend all your time rummaging around in it to find what you need. But equally you wouldn't design a separate cupboard for every different utensil because you'd spend all your time opening cupboard doors to get what you need. Thus a happy medium (which is how most kitchens are designed) would be most effective.

If you pad the domain, I am not quite sure what the optimal block number will work out as. You could run some timing tests to look at this. There may be a small performance hit, but I expect that it won't affect the performance too much. I think this is a good starting point for choosing: suppose you had a domain of size L × L × L and you were using N × N × N blocks for optimal calculations. Then if you created a padded domain of size 3L × 3L × 3L, you could try 3N × 3N × 3N blocks. That would mean that in the part of the domain with the particles, the number of particles per block would be roughly close to the optimal.

(By the way we currently have a version in development that does multithreading, and we aim to release this in a few months. Voro++ is very well structured for multithreading.)

shoubhikraj commented 2 years ago

@chr1shr Thanks! I set up the block size so that each block contains roughly 5 particles as you mentioned, and my execution time was cut by half, so it was a massive improvement. I will use the multithreaded version when it is released, that should improve performance even more.

I am not sure what to do about the amount of memory reserved for each block however. I set it to 1, because most of the blocks in the outer parts of the box would be empty, and the overhead for allocating memory dynamically doesn't seem to make much of a difference to the execution time.

chr1shr commented 2 years ago

Yes, setting the initial block memory to 1 makes sense so that the amount of memory allocated in the empty regions is small.