glotzerlab / hoomd-blue

Molecular dynamics and Monte Carlo soft matter simulation on GPUs.
http://glotzerlab.engin.umich.edu/hoomd-blue
BSD 3-Clause "New" or "Revised" License

Uninitialized m_hoomd_world with slurm #351

Closed: mphoward closed this issue 5 years ago

mphoward commented 5 years ago

m_hoomd_world is used before it is initialized in ExecutionConfiguration::guessLocalRank when the SLURM_LOCALID environment variable is set. The offending lines are here:

https://github.com/glotzerlab/hoomd-blue/blob/ce50f6ab5a181a0d2221b5e3cefbc46d3c503070/hoomd/ExecutionConfiguration.cc#L752-L753

It looks like an inconsistency was introduced in this method when hoomd_world was added as an argument to the constructor, since the method now mixes m_hoomd_world and MPI_COMM_WORLD. My guess is that lines 126-130 should be moved up before the GPU selection, and MPI_COMM_WORLD replaced by m_hoomd_world. @jglaser and @joaander should double-check.
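For illustration, here is a minimal sketch of the reordering being suggested; this is not the actual HOOMD source. Only the class and member names are taken from the report above, and the simplified constructor and SLURM fallback logic are assumptions. The idea is to assign m_hoomd_world before any code path that reads it, and to have guessLocalRank() reference the member instead of MPI_COMM_WORLD.

```cpp
// Sketch only: simplified signatures, not the real ExecutionConfiguration.
#include <mpi.h>
#include <cstdlib>

class ExecutionConfiguration {
public:
    ExecutionConfiguration(MPI_Comm hoomd_world)
        : m_hoomd_world(hoomd_world) // initialized before GPU selection runs
    {
        int local_rank = guessLocalRank();
        // ... GPU selection based on local_rank would follow here ...
        (void)local_rank;
    }

private:
    int guessLocalRank() {
        // SLURM exports SLURM_LOCALID per task; otherwise fall back to the
        // rank in m_hoomd_world (no reference to MPI_COMM_WORLD).
        if (const char* env = std::getenv("SLURM_LOCALID"))
            return std::atoi(env);

        int rank = 0;
        MPI_Comm_rank(m_hoomd_world, &rank);
        return rank;
    }

    MPI_Comm m_hoomd_world;
};
```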

Reported by @AdamSimpson

bdice commented 5 years ago

This is probably a bug introduced by my addition of SSAGES support. SSAGES manages the world communicator and necessitated this change.

bdice commented 5 years ago

@mphoward @AdamSimpson Would one of you be able to test #352 to see if it solves the problem? I think the remaining instances of MPI_COMM_WORLD are correct. The case I have in mind is this: if SSAGES were to launch multiple multi-rank simulations on the same node, I think the GPU selection must use the rank in the true MPI_COMM_WORLD, not the MPI communicator that the individual simulation is using. In most cases, m_hoomd_world is the same as MPI_COMM_WORLD. If anyone has other thoughts, we can reconsider that behavior.
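To make the scenario concrete, here is a hedged sketch, not HOOMD's actual device-selection code; the function name select_gpu and its round-robin policy are illustrative. It derives a node-local rank from the true MPI_COMM_WORLD, so two simulations sharing a node spread across the node's GPUs instead of both landing on device 0.

```cpp
// Sketch only: node-local rank from the full job, then round-robin devices.
#include <mpi.h>

int select_gpu(int num_gpus_on_node) {
    // Group ranks that share this node, based on the whole job rather than
    // the per-simulation communicator.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_local_rank = 0;
    MPI_Comm_rank(node_comm, &node_local_rank);
    MPI_Comm_free(&node_comm);

    // Ranks 0..N-1 on this node map onto the available devices.
    return node_local_rank % num_gpus_on_node;
}
```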

mphoward commented 5 years ago

If that's the case, do you really want to use m_hoomd_world in this special check? It seems like if you are using MPI_COMM_WORLD to select the GPU, then you should also use MPI_COMM_WORLD here.

As a potential corner case, imagine that someone has 2 nodes with 2 GPUs each. They partition the MPI_COMM_WORLD stupidly into different HOOMD worlds so that the first world gets the first GPU on each node (world0 = node0-gpu0 node1-gpu0, world1 = node0-gpu1 node1-gpu1). The local ranks for world0 are all 0s and the local ranks for world1 are all 1s, but the check will raise an error in world0 because it finds all the ranks are 0 and assumes SLURM hasn't set them. This would be avoided if the sum were taken over the full MPI_COMM_WORLD, since then the nonzero ranks are included.
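A rough sketch of the guard being described, assuming (this is an assumption, not the exact HOOMD code) that it sums the SLURM-provided local IDs over a communicator and treats an all-zero result as "SLURM did not set them":

```cpp
// Assumed behavior of the sanity check, for illustration only.
#include <mpi.h>
#include <cstdlib>

bool slurm_localid_looks_valid(MPI_Comm comm) {
    const char* env = std::getenv("SLURM_LOCALID");
    int local_id = env ? std::atoi(env) : 0;

    int sum = 0;
    MPI_Allreduce(&local_id, &sum, 1, MPI_INT, MPI_SUM, comm);

    // In the 2-node example above, summing over world0 (local IDs 0, 0)
    // gives 0 and the check wrongly rejects SLURM_LOCALID, while summing
    // over MPI_COMM_WORLD (0, 0, 1, 1) gives 2 and the check passes.
    return sum > 0;
}
```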

bdice commented 5 years ago

@mphoward That sounds correct to me, I made that change.

AdamSimpson commented 5 years ago

@bdice #352 looks good to me, thanks for the quick fix!

joaander commented 5 years ago

As a potential corner case, imagine that someone has 2 nodes with 2 GPUs each. They partition the MPI_COMM_WORLD stupidly into different HOOMD worlds so that the first world gets the first GPU on each node (world0 = node0-gpu0 node1-gpu0, world1 = node0-gpu1 node1-gpu1). The local ranks for world0 are all 0s and the local ranks for world1 are all 1s, but the check will raise an error in world0 because it finds all the ranks are 0 and assumes SLURM hasn't set them. This would be avoided if the sum were taken over the full MPI_COMM_WORLD, since then the nonzero ranks are included.

A user might also choose to assign only some ranks to a HOOMD simulation. In this case, guessLocalRank() would deadlock using MPI_COMM_WORLD. I think the only viable solution is to remove SLURM_LOCALID entirely from guessLocalRank(). System configurations that have been relying on SLURM_LOCALID would then fall back on the global rank to select GPUs.
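The deadlock comes from the fact that an MPI collective must be entered by every rank of its communicator. A hypothetical sketch (check_over and hoomd_world are illustrative names, not HOOMD code):

```cpp
// A collective posted on MPI_COMM_WORLD blocks until *all* ranks of the job
// reach it. If only the ranks assigned to the HOOMD simulation ever call
// guessLocalRank(), a reduction over MPI_COMM_WORLD inside it never returns.
#include <mpi.h>

void check_over(MPI_Comm comm, int local_id) {
    int sum = 0;
    MPI_Allreduce(&local_id, &sum, 1, MPI_INT, MPI_SUM, comm); // collective
}

// check_over(MPI_COMM_WORLD, id); // hangs: non-HOOMD ranks never call it
// check_over(hoomd_world, id);    // safe: only the simulation's ranks needed
```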

@AdamSimpson What system configuration are you using that doesn't provide one of MV2_COMM_WORLD_LOCAL_RANK, OMPI_COMM_WORLD_LOCAL_RANK, or JSM_NAMESPACE_LOCAL_RANK? Is there another local rank env var that we should add to our search list?

mphoward commented 5 years ago

That's a good point and would be a much more common case than the one I outlined.

Could it make sense to instead replace MPI_COMM_WORLD with m_hoomd_world in guessLocalRank() and keep the method around? There is no deadlock that way. It may fail in weird cases like the one I cooked up, but the current behavior in that situation is to fall back on global rank selection anyway, which is the same as if guessLocalRank() were removed entirely.

AdamSimpson commented 5 years ago

@joaander This happens when launching applications through SLURM's srun. It appears that when launching Open MPI applications through srun, OMPI_COMM_WORLD_LOCAL_RANK isn't set as one might expect; SLURM_LOCALID should be used instead.
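For context, a sketch of the kind of environment-variable search being discussed; the variable names are the launchers' real ones, but the function and its ordering are illustrative, not the exact HOOMD search list:

```cpp
// Illustrative only: return the first local-rank variable that is set.
#include <cstdlib>
#include <optional>

std::optional<int> guess_local_rank_from_env() {
    const char* candidates[] = {
        "MV2_COMM_WORLD_LOCAL_RANK",  // MVAPICH2
        "OMPI_COMM_WORLD_LOCAL_RANK", // Open MPI via mpirun
        "JSM_NAMESPACE_LOCAL_RANK",   // IBM jsrun
        "SLURM_LOCALID",              // SLURM srun (covers the case above)
    };

    for (const char* name : candidates) {
        if (const char* value = std::getenv(name))
            return std::atoi(value);
    }
    return std::nullopt;
}
```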

joaander commented 5 years ago

@joaander This happens when launching applications through SLURM's srun. It appears that when launching Open MPI applications through srun, OMPI_COMM_WORLD_LOCAL_RANK isn't set as one might expect; SLURM_LOCALID should be used instead.

Thanks! That makes sense.

joaander commented 5 years ago

Fixed in #352