Status: Closed (kianpu34593 closed this issue 3 years ago)
This is the .err file:
The application appears to have been direct launched using "srun", but OMPI was not built with SLURM support. This usually happens when OMPI was not configured --with-slurm and we weren't able to discover a SLURM installation in the usual places.
Please configure as appropriate and try again.
An error occurred in MPI_Init
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job)
[d003:05755] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
So it looks to me like we need to make sure spack knows about slurm. To do that, we first need to tell spack where slurm is, and then make sure that openmpi is built with slurm support.
Here are the two options as I see them:
1) Spack apparently has a "scheduler-aware" variant of openmpi; configure the gpaw install to build against that.
2) We already have a system openmpi build that works fine; use it by following the instructions in the docs on using system packages: https://spack.readthedocs.io/en/latest/build_settings.html
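To make the two options more concrete, here is a minimal sketch in Python that just renders the text involved: the spec string for option 1 and a packages.yaml entry for option 2. It assumes a recent Spack release that uses the `externals:` layout in packages.yaml and that the openmpi package exposes the `schedulers` variant. The openmpi version and install prefix are hypothetical placeholders, not values from this issue; replace them with whatever the cluster's system build actually uses.

```python
from textwrap import dedent

# Hypothetical placeholders: substitute the version and install prefix of the
# cluster's own Open MPI build (neither value comes from this issue).
OPENMPI_VERSION = "4.0.5"
OPENMPI_PREFIX = "/opt/openmpi"


def scheduler_aware_spec(package: str = "gpaw") -> str:
    """Option 1: spec that builds the package against a Slurm-aware Open MPI."""
    return f"{package} ^openmpi schedulers=slurm"


def external_openmpi_yaml(version: str, prefix: str) -> str:
    """Option 2: packages.yaml entry that points Spack at an existing Open MPI.

    `buildable: false` tells Spack not to build its own copy of openmpi.
    """
    return dedent(f"""\
        packages:
          openmpi:
            externals:
            - spec: openmpi@{version}
              prefix: {prefix}
            buildable: false
        """)


if __name__ == "__main__":
    # Option 1: the command one might run (printed, not executed).
    print("spack install " + scheduler_aware_spec())
    # Option 2: text to merge into ~/.spack/packages.yaml.
    print(external_openmpi_yaml(OPENMPI_VERSION, OPENMPI_PREFIX))
```

With option 2 in place, a plain install of gpaw should resolve its MPI dependency to the system build instead of compiling a new one, which is the outcome the error message above is asking for.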
I'm working through both of these options, so will get back on which I like better and why. My first preference would be (1). In the meantime, I'd suggest trying one of the two.
Scratch my previous preference. Use the system openmpi.
@kianpu34593 can I close this?
If you notice a performance issue on arjuna, please first search the existing issues and ensure that it has not been reported. If you notice a similar example, please comment on that issue.
Please provide the following information to help us help you:
Your Name:
Your Andrew ID: jiankunp
Node(s) on which the problem occurred: d003
Expected Behavior: running
Observed Behavior: not working
Location of Log file Showing the Error: /home/jiankunp/gpaw_install_benchmark/spack_1st_install/cpu_1_core.err
Location of Script showing Minimum Working Example: /home/jiankunp/gpaw_install_benchmark/spack_1st_install/cpu_1_core.sh
Please also attach any logs and the submission script to this issue.
If you are not a frequent GitHub user, please also provide us with a contact email here:
Contact Email:
If you do not have any of the above, please explain why and submit the issue anyway. The more information you give us, the better we can help you.
This problem occurs only with the spack-installed gpaw. The conda-installed gpaw runs fine.