Open matthiasdiener opened 4 years ago
Running with charmrun, but also with ++local, exhibits the same problem.
I'm seeing the same error when trying to run namd@git-master built against charmpp@6.10.1 with the ucx back-end. (I tried compiling namd@git-master with charm++@git-master, but that failed, so I ended up building it with version 6.10.1.) Adding charmrun in front of the application does not solve the problem:
charmrun +p16 namd2 stmv.namd ++mpiexec ++remote-shell mpiexec --oversubscribe </dev/null &> mpirun_$run
srun --mpi=pmix_v2 -N 2 -n 8 -c 2 charmrun namd2 stmv.namd </dev/null &> srun_$run
Both fail with an error similar to the one above, but the srun case first prints the following statements a couple of times before crashing with the UcxInitEps error:
Running on 1 processors: namd2 stmv.namd
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 1 namd2 stmv.namd
What would the workaround for this case be?
I'm hoping this is the right place for this question, as opposed to the namd mailing list; let me know if that's not the case and I'll post there instead.
Edit: attached namd build dependencies and configuration: namd_build_config.txt
I realized that this could be due to the fact that OpenMPI was built without PMIx support. When I built charm++ with the slurmPMI2 backend, everything worked as expected. Apologies for the unnecessary post on the issue tracker.
It might be good to have a check during charm++ configure that would raise a warning about OpenMPI missing PMIx support when the ompipmix flag is passed.
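For illustration, here is a rough sketch of what such a check could look like as a standalone shell snippet. It is not taken from the charm++ build system. It assumes that ompi_info (the OpenMPI introspection tool) is on PATH and that PMIx support appears as an "MCA pmix" component line in its output, which may not hold for every OpenMPI version. The function names has_pmix_component and check_ompi_pmix are hypothetical.

```shell
#!/bin/sh
# Sketch only: warn when the OpenMPI on PATH does not report a PMIx component.
# Assumption: ompi_info lists PMIx support as an "MCA pmix" line; the exact
# output format can differ between OpenMPI versions.

# Reads ompi_info-style output on stdin.
# Exit status 0 if a PMIx component line is present, non-zero otherwise.
has_pmix_component() {
    grep -qi 'MCA[[:space:]]*pmix'
}

# Hypothetical helper: runs ompi_info and prints a warning when no PMIx
# component is found. Returns 0 (found), 1 (not found), 2 (ompi_info missing).
check_ompi_pmix() {
    if ! command -v ompi_info >/dev/null 2>&1; then
        echo "warning: ompi_info not found; cannot check OpenMPI for PMIx support" >&2
        return 2
    fi
    if ompi_info | has_pmix_component; then
        echo "OpenMPI reports a PMIx component"
        return 0
    fi
    echo "warning: OpenMPI does not appear to have PMIx support; an ompipmix build may fail at startup" >&2
    return 1
}
```

Calling check_ompi_pmix before a build that uses the ompipmix option would surface the problem early instead of at application startup. A quick manual equivalent is to run ompi_info and search its output for "pmix".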
Crash observed on application startup on golub:
Running with Charmrun works fine: