Closed: kianpu34593 closed this issue 3 years ago
The problem can now be solved by installing:
spack install py-gpaw ^openmpi schedulers=slurm pmi=true
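Before installing, spack's spec command can be used to preview the concretized dependency tree and confirm that openmpi picks up the Slurm variants (this uses the same spec syntax as the install line above):
spack spec py-gpaw ^openmpi schedulers=slurm pmi=true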
Note that the installed openmpi only supports srun, not mpiexec/mpirun. Therefore, change the run command to:
srun -n [number of cores] gpaw python script.py
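For batch jobs, the same srun invocation can be embedded in a submission script. A minimal sketch, where the job name, task count, and time limit are placeholders rather than values from this report:
#!/bin/bash
#SBATCH --job-name=gpaw_test   # placeholder job name
#SBATCH --ntasks=16            # total MPI ranks; adjust to your run
#SBATCH --time=01:00:00        # wall-clock limit; adjust as needed
# srun inherits the task count from Slurm through PMI,
# so no explicit -n is required here
srun gpaw python script.py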
If you notice a performance issue on arjuna, please first search the existing issues to ensure that it has not already been reported. If you find a similar issue, please comment on it.
Please provide the following information to help us help you:
Basic Info
Your Name: jiankun pu
Your Andrew ID: jiankunp
If you are not a frequent github user, please also provide us with a contact email here:
Contact Email:
Where it happened
Job Ids: 1465
Node(s) on which the problem occurred: c004
What Happened
Expected Behavior: running
Observed Behavior: stuck but not failed
Log Files
Location of Log File Showing the Error: /home/jiankunp/projects/gpaw_install_benchmark/spack_2nd_install_openmpi/parrallel_test/gpu_multi_core_1465.err
Location of Script Showing Minimum Working Example: /home/jiankunp/projects/gpaw_install_benchmark/spack_2nd_install_openmpi/parrallel_test/gpu_multi_core.sh
Please also attach any logs and the submission script to this issue.
What I've Tried
Please list what you've tried to debug the issue, including commands and the resulting output.
1) I googled and found https://github.com/open-mpi/ompi/issues/5798. Following that issue, I added:
--mca shmem posix
to mpiexec. It seems to work now, but it's very slow.
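For reference, a sketch of how that workaround is applied on the command line (the core count is a placeholder; --mca shmem posix forces Open MPI's POSIX shared-memory backend, as suggested in the linked issue):
mpiexec --mca shmem posix -n [number of cores] gpaw python script.py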