chr1shr / voro

Voro++: a three-dimensional Voronoi cell library in C++

Distributed memory (MPI) computation #34

Closed · Pulpas closed this issue 1 year ago

Pulpas commented 1 year ago

Dear Prof. Rycroft,

Before bothering you with my implementation question, I would like to thank you for developing such a nice library. This library is on the verge of becoming one of our central tools for computing the forces that transport particles in CFD/DEM simulations, as explained in our recent paper (https://arxiv.org/abs/2308.13299).

In the first part of our work we mostly focused on post-processing with the Voro++ library to study the structure of an arrangement of particles, so numerical efficiency was not a priority. Now that we derive a model from the microstructure information provided by Voro++, we need to couple it with our in-house CFD/DEM solver to perform simulations with millions of monodispersed spherical particles. With an efficient implementation in mind, I read your recently published paper, An extension to Voro++ for multithreaded computation of Voronoi cells, but it is limited to OpenMP. Is there any plan to extend this work to MPI for massive simulations?

chr1shr commented 1 year ago

I'm glad to hear you have found the library useful. We are indeed planning to release an MPI-based version of the library, and we have started work on it, but it is a major project and it will likely be a year or two before we have anything available.

I've thought about this problem a lot. It's relatively easy to get a solution that works well for 95% of practical cases. But our goal is to build a general tool that can work well in all cases, and that brings up a lot of challenges.

If you have a dense particle arrangement, then one approach is to divide the domain up into rectangular blocks that are passed out to the MPI processes, and then have each block grab a strip of width 2W from the neighboring blocks. If the Voronoi cells for particles in the block are each bounded by a sphere of radius W, then any particle whose bisecting plane could cut such a cell lies within a distance 2W, so you can guarantee that the Voronoi computations are complete and that the strips contain sufficient neighboring particles to obtain the complete calculation.

This can be very effective in dense particle arrangements, since the Voronoi cells are usually of similar size and are geometrically compact. The LAMMPS code has a Voro++ interface that uses this principle and it works very well in many cases. For your application, you might also be dealing with dense arrangements. Alternatively, if you are willing to artificially clip your Voronoi cells so that they do not extend beyond some fixed radius R (like this example), then so long as W ≥ R you will get a complete calculation.
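As a rough sketch of what each rank could do once it has its block and the ghost particles from the strip (this is not code from Voro++ itself or from our MPI work in progress; the block bounds, W, and the random particle fill are placeholders), you can build a local container covering the block plus the strip, compute cells only for the locally owned particles, and use `max_radius_squared()` to flag any cell that violates the radius-W assumption:

```cpp
// Per-rank sketch of the "block plus 2W strip" idea. The block bounds, W, and
// the particle fill are placeholders; in practice the ghosts come from MPI.
#include <cstdio>
#include <cstdlib>
#include "voro++.hh"

int main() {
    const double W = 1.0;                                      // assumed cell-radius bound
    const double bx0=0, bx1=10, by0=0, by1=10, bz0=0, bz1=10;  // this rank's block

    // Local container: the block extended by the 2W strip on every side.
    voro::container con(bx0-2*W, bx1+2*W, by0-2*W, by1+2*W, bz0-2*W, bz1+2*W,
                        6, 6, 6, false, false, false, 8);

    // Placeholder fill: owned particles get IDs >= 0, ghosts from the strip get
    // IDs < 0 so that their cells can be skipped below (a convention of this sketch).
    for (int i = 0; i < 500; i++) {
        double x = bx0-2*W + (bx1-bx0+4*W)*rand()/(double)RAND_MAX;
        double y = by0-2*W + (by1-by0+4*W)*rand()/(double)RAND_MAX;
        double z = bz0-2*W + (bz1-bz0+4*W)*rand()/(double)RAND_MAX;
        bool owned = x>=bx0 && x<bx1 && y>=by0 && y<by1 && z>=bz0 && z<bz1;
        con.put(owned ? i : -1-i, x, y, z);
    }

    voro::voronoicell c;
    voro::c_loop_all cl(con);
    if (cl.start()) do {
        if (cl.pid() < 0) continue;                // ghost particle: not this rank's cell
        if (!con.compute_cell(c, cl)) continue;    // cell was completely cut away
        // If a cell pokes out of the sphere of radius W around its particle, the
        // 2W strip may not have contained every particle that influences it.
        if (c.max_radius_squared() > W*W)
            std::printf("cell %d exceeds radius W; result may be incomplete\n", cl.pid());
    } while (cl.inc());
}
```

If the warning ever fires, that rank either needs a wider strip or the clipping described above.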

The difficulty for making a general tool is that for some particle arrangements, the Voronoi cells may extend for a large distance. This could happen, e.g., if you had localized clusters of particles, with large voids of empty space between them. The Voronoi cells for particles on the surface of each cluster would extend a large distance away. In this case, the strip of width 2W may not contain enough particles to fully compute the Voronoi cells.

You can even imagine an extreme case, where you have a 10 by 10 by 10 block of MPI processes, and you have two particles A and B on distant corners of the domain. Then the process containing particle A needs to know about particle B in order to compute a complete Voronoi cell, therefore requiring non-local communication. Coming up with a program that works efficiently for all these cases is challenging.

In your situation, if you are dealing with dense particle arrangements, or if you are fine with clipping the cells beyond some radius, then you could perhaps use the serial or multithreaded Voro++ on each processor, and pass the strips with MPI in order to compute the valid Voronoi cells.
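On the MPI side, the strip exchange itself could look roughly like the following (a sketch only, assuming a one-dimensional processor decomposition along x; the `Particle` struct, W, the block geometry, and the empty particle list are placeholders, and only one direction of the exchange is written out):

```cpp
// Sketch of the ghost-strip exchange along one axis. After the exchange each
// rank appends the received ghosts to its local particle list and runs the
// serial or multithreaded Voro++ on its block plus the strip.
#include <mpi.h>
#include <cmath>
#include <vector>

struct Particle { int id; double x, y, z; };

// Pack the coordinates of the owned particles lying within 2W of an x-plane.
static std::vector<double> pack_strip(const std::vector<Particle>& owned,
                                      double plane_x, double W) {
    std::vector<double> buf;
    for (const Particle& p : owned)
        if (std::abs(p.x - plane_x) < 2*W) {
            buf.push_back(p.x); buf.push_back(p.y); buf.push_back(p.z);
        }
    return buf;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const double W = 1.0, block_len = 10.0;       // placeholders
    double x_lo = rank*block_len, x_hi = x_lo + block_len;
    std::vector<Particle> owned;                  // filled by the solver

    int left  = (rank > 0)      ? rank-1 : MPI_PROC_NULL;
    int right = (rank+1 < size) ? rank+1 : MPI_PROC_NULL;

    // Send my strip near x_lo to the left neighbor; receive from the right
    // neighbor the strip that sits just beyond my x_hi.
    std::vector<double> send_lo = pack_strip(owned, x_lo, W);
    std::vector<double> send_hi = pack_strip(owned, x_hi, W);  // goes right, analogously

    int n_send_lo = (int)send_lo.size(), n_recv_hi = 0;
    MPI_Sendrecv(&n_send_lo, 1, MPI_INT, left, 0,
                 &n_recv_hi, 1, MPI_INT, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::vector<double> recv_hi(n_recv_hi);
    MPI_Sendrecv(send_lo.data(), n_send_lo, MPI_DOUBLE, left, 1,
                 recv_hi.data(), n_recv_hi, MPI_DOUBLE, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // recv_hi now holds ghost coordinates (x,y,z triples); insert them into the
    // local voro::container alongside the owned particles and compute as usual.
    MPI_Finalize();
}
```

In three dimensions the same pattern is repeated for the y and z faces (plus edge and corner ghosts), or a Cartesian communicator with MPI_Cart_shift can supply the neighbor ranks.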

Pulpas commented 1 year ago

Dear Prof. Rycroft,

Thanks a lot for your detailed answer. As a follow-up to your email, I would like to share what I ended up implementing in a few lines.

I use a uniform Cartesian decomposition of my domain that matches the processor layout, so each processor knows its own particles and those of its neighbors. Up to this point this is very classic, and often used to efficiently detect particle collisions. As you previously mentioned, this can be used to compute the Voronoi diagram for a dense case. However, the case that you illustrate with two particles located very far from each other is indeed a limitation of this implementation.

To circumvent it, I adapt the number of neighboring processors to consider in the construction of each processor's Voronoi diagram as a function of the local solid volume fraction (the information in the ghosts is not kept when computing our Minkowski tensors, the expensive part of the solver). Thus, the fewer the particles, the more processors I need to include in my ghost layers.
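Roughly, the heuristic is along these lines (a simplified sketch rather than the actual solver code; the spacing estimate, the names and the safety factor are placeholders):

```cpp
// Simplified sketch of the adaptive ghost-layer heuristic: estimate the mean
// inter-particle spacing from the local solid volume fraction, take the expected
// Voronoi cell radius as a few spacings, and convert it to processor layers.
#include <algorithm>
#include <cmath>

int ghost_layers(double solid_fraction,   // local solid volume fraction of this block
                 double particle_diam,    // particle diameter (monodisperse)
                 double block_width,      // width of one processor block
                 double safety = 3.0) {   // how many mean spacings a cell may span
    const double pi = 3.14159265358979323846;
    const double phi = std::max(solid_fraction, 1e-4);   // guard against empty blocks
    // Number density n = 6*phi/(pi*d^3), so the mean spacing scales as n^(-1/3).
    double spacing = particle_diam * std::cbrt(pi/(6.0*phi));
    double needed_width = 2.0*safety*spacing;             // strip width ~ 2W
    int layers = (int)std::ceil(needed_width/block_width);
    return layers < 1 ? 1 : layers;
}
```

The emptier the block, the larger the estimated spacing and therefore the more layers of neighboring processors end up in the ghost region.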

So far it seems to work, but it does not always reproduce the exact Voronoi diagram that one would obtain on a single processor.

Thanks again for your very detailed answer and I'm looking forward to the MPI release :)

Best,
