Closed: mphoward closed this issue 5 years ago.
This is probably a bug introduced by my addition of SSAGES support. SSAGES manages the world communicator and necessitated this change.
@mphoward @AdamSimpson Would one of you be able to test #352 to see if this solves the problem? I think the remaining instances of `MPI_COMM_WORLD` are correct. The case I have in mind is this: if SSAGES were to launch multiple multi-rank simulations on the same node, I think the GPU selection must use the rank in the true `MPI_COMM_WORLD`, not the MPI communicator that the individual simulation is using. In most cases, `m_hoomd_world` is the same as `MPI_COMM_WORLD`. If anyone has other thoughts, we can reconsider that behavior.
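A minimal sketch of the scenario described here (the function name `select_gpu` and the round-robin mapping are illustrative, not HOOMD's actual code): if SSAGES launches two 2-rank simulations on one 4-GPU node and each process picks a device by its rank in its own simulation communicator, both simulations collide on GPUs 0 and 1; selecting by the rank in the true `MPI_COMM_WORLD` keeps them apart.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// Illustrative only: map each process to a device using the global rank, not the
// rank inside the per-simulation communicator that SSAGES hands to HOOMD.
void select_gpu(MPI_Comm sim_comm)
{
    int world_rank = 0, sim_rank = 0, num_gpus = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); // unique across *all* simulations
    MPI_Comm_rank(sim_comm, &sim_rank);         // repeats across simulations (0, 1 in each)
    cudaGetDeviceCount(&num_gpus);

    // round-robin on the global rank: two simulations on one node land on different GPUs
    cudaSetDevice(world_rank % num_gpus);
}
```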
If that's the case, do you really want to use `m_hoomd_world` in this special check? It seems like if you are using `MPI_COMM_WORLD` to select the GPU, then you should also use `MPI_COMM_WORLD` here.
As a potential corner case, imagine that someone has 2 nodes with 2 GPUs each. They partition `MPI_COMM_WORLD` stupidly into different HOOMD worlds so that the first world gets the first GPU on each node (`world0` = `node0-gpu0 node1-gpu0`, `world1` = `node0-gpu1 node1-gpu1`). The local ranks for `world0` are all 0s and the local ranks for `world1` are all 1s, but the check will raise an error in `world0` because it finds all the ranks are 0 and assumes SLURM hasn't set them. This would be avoided if the sum were taken over the full `MPI_COMM_WORLD`, since then the nonzero ranks are included.
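A rough reconstruction of the check being discussed (assumed logic, not the exact code in `ExecutionConfiguration.cc`): sum the `SLURM_LOCALID` values across a communicator and treat an all-zero result on a multi-rank run as "SLURM did not set the variable". Run over the `world0` sub-communicator from the example above, every contribution is 0 and the check falsely fails; run over the full `MPI_COMM_WORLD`, the `world1` ranks contribute 1s and it passes.

```cpp
#include <mpi.h>
#include <cstdlib>

// Assumed form of the sanity check: local ranks that sum to zero on a multi-rank
// communicator are taken to mean SLURM_LOCALID was not actually set.
bool slurm_localid_looks_valid(MPI_Comm comm)
{
    const char* env = std::getenv("SLURM_LOCALID");
    int local = env ? std::atoi(env) : 0;

    int sum = 0, nranks = 0;
    MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, comm);
    MPI_Comm_size(comm, &nranks);

    // comm = world0          -> sum == 0 with nranks == 2: false negative
    // comm = MPI_COMM_WORLD  -> sum == 2 (the two world1 ranks): check passes
    return !(nranks > 1 && sum == 0);
}
```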
@mphoward That sounds correct to me; I made that change.
@bdice #352 looks good to me, thanks for the quick fix!
A user might also choose to assign only some ranks to a HOOMD simulation. In this case, `guessLocalRank()` would deadlock using `MPI_COMM_WORLD`. I think the only viable solution is to remove `SLURM_LOCALID` entirely from `guessLocalRank()`. System configurations that had been relying on `SLURM_LOCALID` would then fall back on the global rank to select GPUs.
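A minimal illustration of the deadlock (hypothetical code, not from the repository): a collective over `MPI_COMM_WORLD` returns only when every rank in `MPI_COMM_WORLD` calls it, so if only the ranks assigned to a HOOMD simulation ever reach `guessLocalRank()`, the reduction never completes.

```cpp
#include <mpi.h>

// Hypothetical: only the HOOMD ranks call this, so a collective over
// MPI_COMM_WORLD blocks forever waiting for the ranks that never enter it.
int sum_local_ids_over_world(int my_local_id)
{
    int sum = 0;
    // hangs unless *every* rank in MPI_COMM_WORLD participates
    MPI_Allreduce(&my_local_id, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    return sum;
}
```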
@AdamSimpson What system configuration are you using that doesn't provide one of `MV2_COMM_WORLD_LOCAL_RANK`, `OMPI_COMM_WORLD_LOCAL_RANK`, or `JSM_NAMESPACE_LOCAL_RANK`? Is there another local rank env var that we should add to our search list?
That's a good point and would be a much more common case than the one I outlined. Could it make sense to instead replace `MPI_COMM_WORLD` with `m_hoomd_world` in `guessLocalRank()`, and keep the method around? There is no deadlock that way. It may fail in weird cases like the one I cooked up, but then the current behavior would be to fall back on the global rank selection anyway, which would be the same as if `guessLocalRank` were removed entirely.
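A sketch of how that version might look (my reading of the proposal; the standalone function signature and names are illustrative, not the merged patch): try the launcher-specific local-rank variables first, validate `SLURM_LOCALID` with a reduction over the simulation's own communicator so no deadlock is possible, and fall back to a rank-based guess otherwise.

```cpp
#include <mpi.h>
#include <cstdlib>
#include <initializer_list>

// Illustrative stand-in for guessLocalRank(): hoomd_world plays the role of
// m_hoomd_world, so only ranks that actually run HOOMD enter the collective.
int guess_local_rank(MPI_Comm hoomd_world)
{
    for (const char* var : {"MV2_COMM_WORLD_LOCAL_RANK",
                            "OMPI_COMM_WORLD_LOCAL_RANK",
                            "JSM_NAMESPACE_LOCAL_RANK"})
    {
        if (const char* env = std::getenv(var))
            return std::atoi(env);
    }

    if (const char* env = std::getenv("SLURM_LOCALID"))
    {
        int local = std::atoi(env), sum = 0, nranks = 0;
        // reduction over the simulation communicator: no deadlock when only a
        // subset of MPI_COMM_WORLD runs HOOMD; the all-zeros corner case above
        // simply drops through to the fallback below
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, hoomd_world);
        MPI_Comm_size(hoomd_world, &nranks);
        if (!(nranks > 1 && sum == 0))
            return local;
    }

    // fallback: use the rank in the simulation communicator
    int rank = 0;
    MPI_Comm_rank(hoomd_world, &rank);
    return rank;
}
```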
@joaander This happens when launching applications through SLURM's `srun`. It appears that when launching OMPI applications through `srun`, `OMPI_COMM_WORLD_LOCAL_RANK` isn't set as one might expect, and instead `SLURM_LOCALID` is to be used.
Thanks! That makes sense.
Fixed in #352
`m_hoomd_world` is used before it is initialized in `ExecutionConfiguration::guessLocalRank` when the `SLURM_LOCALID` environment variable is set. The offending lines are here: https://github.com/glotzerlab/hoomd-blue/blob/ce50f6ab5a181a0d2221b5e3cefbc46d3c503070/hoomd/ExecutionConfiguration.cc#L752-L753

It looks like an inconsistency was introduced in this method by whoever added `hoomd_world` as an argument to the constructor, since it uses a mix of `m_hoomd_world` and `MPI_COMM_WORLD`. My guess is that lines 126-130 should probably be moved up before the GPU selection, and `MPI_COMM_WORLD` replaced by `m_hoomd_world`. @jglaser and @joaander should double check.

Reported by @AdamSimpson
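A hedged sketch of the fix suggested above (class layout, helper names, and members other than `m_hoomd_world` are illustrative, not copied from `ExecutionConfiguration.cc`): assign the communicator members before any GPU selection code that may call `guessLocalRank()`, and use `m_hoomd_world` consistently in place of `MPI_COMM_WORLD`.

```cpp
#include <mpi.h>

// Illustrative constructor ordering only: the point is that m_hoomd_world must be
// assigned before guessLocalRank() can read it.
class ExecutionConfigurationSketch
{
public:
    ExecutionConfigurationSketch(MPI_Comm hoomd_world)
    {
        // (1) initialize the communicator members first (the "move lines 126-130 up" step)
        m_hoomd_world = hoomd_world;
        MPI_Comm_rank(m_hoomd_world, &m_rank);

        // (2) only then pick a GPU, which may consult guessLocalRank()
        selectGPU(guessLocalRank());
    }

private:
    MPI_Comm m_hoomd_world;
    int m_rank = 0;

    // reads m_hoomd_world, which is now guaranteed to be initialized
    int guessLocalRank() const
    {
        int rank = 0;
        MPI_Comm_rank(m_hoomd_world, &rank);
        return rank;
    }

    void selectGPU(int /*local_rank*/) { /* device selection elided */ }
};
```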