Dynamic load balancing - Githubissues

joaander commented 9 years ago

Original report by Michael Howard (Bitbucket: mphoward, GitHub: mphoward).

HOOMD should support dynamic load balancing with respect to either the number of particles per rank, or better, the time spent in the most expensive calculations (i.e., force calculation) in MPI simulations. Load balancing is important for simulations with density gradients (e.g., a vapor-liquid interface), or when the force calculation for certain particles is significantly more expensive than for others (e.g., particle size disparity with clustering).

The simplest way to do load balancing is to maintain a grid of boxes, and adjust the different dividing planes along the Cartesian axes. This implementation is found to some extent in LAMMPS and GROMACS. I have implemented a new BalancedDomainDecomposition class where the user can manually adjust the decomposition before the system is initialized. But, this requires the user to know the best way to adjust the domains without having the particle data available. It would be better for this process to be automated using a load balancing heuristic after the particle data is loaded. An easy extension of this idea is to then resize the domains periodically using an Updater. The user can tune the frequency of this load balancing as needed for their simulations, whether it be once at the start of the simulation or throughout.
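The particle-count heuristic described above can be sketched along a single axis as follows. This is only an illustration under stated assumptions, not the BalancedDomainDecomposition code: `balanced_planes` and `slab_counts` are hypothetical helper names, the coordinates are made up, and no minimum slab width is enforced.

```python
from bisect import bisect_right


def balanced_planes(coords, n_domains, box_lo, box_hi):
    """Interior dividing planes along one axis, chosen so that each slab
    holds roughly the same number of particles (hypothetical helper)."""
    if n_domains < 1:
        raise ValueError("n_domains must be at least 1")
    xs = sorted(coords)
    n = len(xs)
    if n == 0:
        # No particle data: fall back to a uniform decomposition.
        width = (box_hi - box_lo) / n_domains
        return [box_lo + k * width for k in range(1, n_domains)]
    planes = []
    for k in range(1, n_domains):
        idx = k * n // n_domains  # particles that should lie below plane k
        left = xs[idx - 1] if idx > 0 else box_lo
        right = xs[idx] if idx < n else box_hi
        # Put the plane midway between the two neighboring particles.
        planes.append(0.5 * (left + right))
    return planes


def slab_counts(coords, planes):
    """Number of particles in each slab defined by the sorted planes."""
    counts = [0] * (len(planes) + 1)
    for x in coords:
        counts[bisect_right(planes, x)] += 1
    return counts


# Dense "liquid" region near x = 0, sparse "vapor" elsewhere (made-up data).
coords = [0.1, 0.2, 0.3, 0.4, 5.0, 9.0]
planes = balanced_planes(coords, 2, 0.0, 10.0)
print("uniform split: ", slab_counts(coords, [5.0]))
print("balanced split:", slab_counts(coords, planes))
```

Calling a routine like this periodically, in the way an Updater would, moves the planes as the density profile changes.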

I have a big interest in seeing this implemented for a project I'm working on, and I've started a prototype for it. I wanted to post this idea here for feedback, particularly from @jglaser.

joaander commented 9 years ago

It is a lot more work to update the communication algorithms, but the RCB (recursive coordinate bisection) method (see LAMMPS for an example) is pretty powerful for highly nonuniform particle distributions: http://lammps.sandia.gov/doc/balance.html

In general, I'm all for anything that improves hoomd - so starting with the simplest approach is fine.

joaander commented 9 years ago

Original comment by Michael Howard (Bitbucket: mphoward, GitHub: mphoward).

The RCB method described would definitely be a lot of work -- particularly in migrating particles, since there is no longer a well-defined relation linking adjacent boxes and their ghost layers. If you were willing to go to this much trouble, it might be worth considering something like the approach in GROMACS (see Section II in http://pubman.mpdl.mpg.de/pubman/item/escidoc:588952:2/component/escidoc:588951/412029.pdf), which is the method developed by DE Shaw, where pair forces can be calculated on processors that don't necessarily even own either particle.

In a Lennard-Jones vapor-liquid simulation, I was able to get pretty good speedups close to the "predicted" value from the LAMMPS grid imbalance factor (speedup ~ N_max / N_avg) when I adjusted the domain decomposition manually based on a measured density profile. At a certain point, you are of course still wasting processors if you have a really huge empty volume. But, the simple solution will be a good enough improvement for me right now -- I'll leave the harder one for someone else to try. :-)
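The estimate quoted above (speedup ~ N_max / N_avg) can be evaluated directly from per-rank particle counts. A minimal sketch, assuming cost scales with particle number; the function name and the counts are made up for illustration.

```python
def imbalance_factor(counts):
    """Ratio of the largest per-rank particle count to the average count."""
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one rank and one particle")
    average = sum(counts) / len(counts)
    return max(counts) / average


# Made-up per-rank counts for a two-rank run.
print(imbalance_factor([900, 100]))  # strongly imbalanced
print(imbalance_factor([500, 500]))  # perfectly balanced
```

A factor of 1 means there is nothing left to gain by moving the domain boundaries.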

joaander commented 9 years ago

Original comment by Jens Glaser (Bitbucket: jens_glaser, GitHub: jglaser).

The BalancedDomainDecomposition approach sounds like a reasonable generalization. For more powerful schemes, see also J. Grime and G. Voth, “Highly Scalable and Memory Efficient Ultra-Coarse-Grained Molecular Dynamics Simulations,” J. Chem. Theory Comput., vol. 10, pp. 423–431, 2014.

joaander commented 9 years ago

Simple is good. I'm going to see about reviewing the PR today.

I assume you've been using the code for a while and found it to be functioning and useful?

joaander commented 9 years ago

Original comment by Michael Howard (Bitbucket: mphoward, GitHub: mphoward).

I've been using it successfully for a couple weeks now. It works very well (and is very useful) for systems that are capable of being load balanced by particle number (e.g. Lennard-Jones with a vapor-liquid interface). It currently has trouble if:

1. Different particle types have significantly different computational costs, so particle number is a poor proxy for raw timings.
2. The system cannot be load balanced. One way this happens is when the domains want to shrink below the minimum allowed size (2*rghost), which wastes a lot of cycles figuring this out. Another is when the particles are arranged in some pathological way so that they don't have smooth density profiles. For example, if you had a single particle in the box, the domain boundaries would oscillate back and forth so that on average each rank owned the particle a fraction of the time.

A short-term solution to both problems is for the user to be aware of such issues in their system and to set balancing parameters that remedy most of them.

A long-term solution for (1) is to use raw timings instead of particle counts, and for (2) is to upgrade the optimization routine to some sort of quasi-Newton approach that is aware of how the domains were previously adjusted and kills oscillations or takes smaller steps when adjusting the domains.
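As a rough illustration of "taking smaller steps", the sketch below applies plain under-relaxation to the dividing planes and rejects any move that would make a domain thinner than the minimum width. This is not the quasi-Newton approach mentioned above, only a simpler stand-in. The function name, the `alpha` damping parameter, and the numbers are hypothetical, and `min_width` plays the role of 2*rghost.

```python
def damped_update(planes, targets, box_lo, box_hi, min_width, alpha=0.5):
    """Move each plane a fraction alpha of the way toward its target.

    If any resulting domain would be thinner than min_width, the whole
    move is rejected and the old planes are returned unchanged.
    """
    proposed = [p + alpha * (t - p) for p, t in zip(planes, targets)]
    edges = [box_lo] + proposed + [box_hi]
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    if min(widths) < min_width:
        return list(planes)  # reject: a domain would be too thin
    return proposed


# One plane in a box of length 10; the heuristic keeps asking for x = 1.0,
# but domains may not be thinner than 2.0 (made-up numbers).
planes = [5.0]
for step in range(4):
    planes = damped_update(planes, [1.0], 0.0, 10.0, min_width=2.0)
    print(step, planes)
```

With `alpha` below 1 the plane approaches its target gradually instead of jumping, which limits overshoot when the target itself moves between balancing steps.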

joaander commented 9 years ago

Merged in mphoward/hoomd-blue/eigen (pull request #81)

Add Eigen library

refs #87

joaander commented 9 years ago

Merged in mphoward/hoomd-blue/load_balance (pull request #82)

Implement dynamic load balancing

fixes #87 refs #73

glotzerlab / hoomd-blue

Dynamic load balancing #87