easybuilders / easybuild-easyblocks

Collection of easyblocks that implement support for building and installing software with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

use PRRTE MCA environment variable for oversubscription in OpenMPI easyblock #3360

Closed geimer closed 5 months ago

geimer commented 5 months ago

With Open MPI 5.x, PRRTE is used as the run-time environment. This requires setting a different MCA environment variable to allow node oversubscription when running tests. See https://docs.open-mpi.org/en/v5.0.x/mca.html#converting-mapping-parameters
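A minimal sketch of the version switch described here, not the actual easyblock code. The variable names and values are the ones that appear in the sanity check logs later in this thread; the helper names `oversubscribe_env` and `sanity_check_command` are hypothetical, and deciding on the major version number alone is an assumption.

```python
def oversubscribe_env(ompi_version):
    """Return (name, value) of the MCA environment variable that allows oversubscription.

    Hypothetical helper: assumes the major version alone decides which variable applies.
    """
    major = int(ompi_version.split('.')[0])
    if major >= 5:
        # Open MPI 5.x uses PRRTE as run-time environment, which reads PRTE_MCA_* variables
        return 'PRTE_MCA_rmaps_default_mapping_policy', ':oversubscribe'
    # earlier releases (4.1.5 was tested in this thread) use the OMPI_MCA_* variable
    return 'OMPI_MCA_rmaps_base_oversubscribe', '1'


def sanity_check_command(ompi_version, nprocs, binary):
    """Build an mpirun command line prefixed with the oversubscription variable."""
    name, value = oversubscribe_env(ompi_version)
    return "%s=%s mpirun -n %d %s" % (name, value, nprocs, binary)


if __name__ == '__main__':
    print(sanity_check_command('5.0.3', 6, './mpi_test_ring_usempi'))
    print(sanity_check_command('4.1.5', 6, './mpi_test_ring_usempi'))
```

Prefixing the variable to the command keeps the setting local to the test run instead of changing the environment of the whole build.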

bedroge commented 5 months ago

Thanks for this fix! I actually ran into that issue recently when testing the OpenMPI 5.0.3 easyconfig, but I hadn't looked into the cause. This seems to work fine for me (test report coming soon...).

bedroge commented 5 months ago

@boegelbot please test @ jsc-zen3 EB_ARGS="OpenMPI-5.0.3-GCC-13.3.0.eb"

boegelbot commented 5 months ago

@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3360 EB_ARGS="OpenMPI-5.0.3-GCC-13.3.0.eb" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3360 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

Test results coming soon (I hope)...

bedroge commented 5 months ago

Test report by @bedroge

Overview of tested easyconfigs (in order)

Build succeeded for 1 out of 1 (1 easyconfigs in total)
bob-Latitude-5300 - Linux Ubuntu 22.04, x86_64, Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz, Python 3.10.12
See https://gist.github.com/bedroge/6337c328ce45a7b00660acd808f7e50a for a full test report.

edit: log shows that it was using the correct environment variable, e.g.:

== 2024-06-13 16:16:52,116 easyblock.py:3634 INFO sanity check command PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe mpirun -n 6 /data/eb/build/OpenMPI/5.0.3/GCC-13.3.0/mpi_test_ring_usempi ran successfully! (output: Process 0 sending 10 to  1 tag 201 ( 6 processes in ring)

boegelbot commented 5 months ago

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/2bd90138e57fb9e7da88bffce7f80f82 for a full test report.

bedroge commented 5 months ago

Test report by @bedroge

Overview of tested easyconfigs (in order)

Build succeeded for 1 out of 1 (1 easyconfigs in total)
bob-Latitude-5300 - Linux Ubuntu 22.04, x86_64, Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz, Python 3.10.12
See https://gist.github.com/bedroge/c9e5cce9dcb327a055620a818a44d4c8 for a full test report.

edit: for this version it also picks up the right environment variable:

== 2024-06-13 17:05:05,445 easyblock.py:3634 INFO sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 6 /data/eb/build/OpenMPI/4.1.5/GCC-12.2.0/mpi_test_ring_usempi ran successfully!

bedroge commented 5 months ago

Test report by @bedroge

Overview of tested easyconfigs (in order)

Build succeeded for 2 out of 2 (2 easyconfigs in total)
interactive2 - Linux Rocky Linux 8.9, x86_64, AMD EPYC-Milan Processor (zen2), Python 3.6.8
See https://gist.github.com/bedroge/f349b8b84ec078c6e63591ccd6e20804 for a full test report.