easybuilders / easybuild

EasyBuild - building software with ease
http://easybuild.io

OpenMPI/4.0.5-GCC-10.2.0 foss2020b mpirun error #755

Open · connorourke opened 2 years ago

connorourke commented 2 years ago

I am getting the following error when building FFTW/3.3.8-gompi-2020b with the foss-2020b toolchain.

Executing "mpirun -np 1 /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi/mpi-bench --verbose=1   --verify 'obcd52x48' --verify 'ibcd52x48' --verify 'ofcd52x48' --verify 'ifcd52x48' --verify 'obrd[11x2x4x12' --verify 'ibrd[11x2x4x12' --verify 'obcd[11x2x4x12' --verify 'ibcd[11x2x4x12' --verify 'ofcd[11x2x4x12' --verify 'ifcd[11x2x4x12' --verify 'obr[3x13v22' --verify 'ibr[3x13v22' --verify 'obc[3x13v22' --verify 'ibc[3x13v22' --verify 'ofc[3x13v22' --verify 'ifc[3x13v22' --verify 'okd]12o11x7e11x2o00' --verify 'ikd]12o11x7e11x2o00' --verify 'obr4x9x2' --verify 'ibr4x9x2' --verify 'ofr4x9x2' --verify 'ifr4x9x2' --verify 'obc4x9x2' --verify 'ibc4x9x2' --verify 'ofc4x9x2' --verify 'ifc4x9x2' --verify 'ok[6e10x7o01' --verify 'ik[6e10x7o01' --verify 'obr6x9x8x10' --verify 'ibr6x9x8x10' --verify 'ofr6x9x8x10' --verify 'ifr6x9x8x10' --verify 'obc6x9x8x10' --verify 'ibc6x9x8x10' --verify 'ofc6x9x8x10' --verify 'ifc6x9x8x10' --verify 'okd6e10x5o01x5e11x7o01' --verify 'ikd6e10x5o01x5e11x7o01' --verify 'ofr]5x20v4' --verify 'ifr]5x20v4' --verify 'obc]5x20v4' --verify 'ibc]5x20v4' --verify 'ofc]5x20v4' --verify 'ifc]5x20v4'"
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
FAILED mpirun -np 1 /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi/mpi-bench:  --verify 'obcd52x48' --verify 'ibcd52x48' --verify 'ofcd52x48' --verify 'ifcd52x48' --verify 'obrd[11x2x4x12' --verify 'ibrd[11x2x4x12' --verify 'obcd[11x2x4x12' --verify 'ibcd[11x2x4x12' --verify 'ofcd[11x2x4x12' --verify 'ifcd[11x2x4x12' --verify 'obr[3x13v22' --verify 'ibr[3x13v22' --verify 'obc[3x13v22' --verify 'ibc[3x13v22' --verify 'ofc[3x13v22' --verify 'ifc[3x13v22' --verify 'okd]12o11x7e11x2o00' --verify 'ikd]12o11x7e11x2o00' --verify 'obr4x9x2' --verify 'ibr4x9x2' --verify 'ofr4x9x2' --verify 'ifr4x9x2' --verify 'obc4x9x2' --verify 'ibc4x9x2' --verify 'ofc4x9x2' --verify 'ifc4x9x2' --verify 'ok[6e10x7o01' --verify 'ik[6e10x7o01' --verify 'obr6x9x8x10' --verify 'ibr6x9x8x10' --verify 'ofr6x9x8x10' --verify 'ifr6x9x8x10' --verify 'obc6x9x8x10' --verify 'ibc6x9x8x10' --verify 'ofc6x9x8x10' --verify 'ifc6x9x8x10' --verify 'okd6e10x5o01x5e11x7o01' --verify 'ikd6e10x5o01x5e11x7o01' --verify 'ofr]5x20v4' --verify 'ifr]5x20v4' --verify 'obc]5x20v4' --verify 'ibc]5x20v4' --verify 'ofc]5x20v4' --verify 'ifc]5x20v4'
make[3]: *** [Makefile:890: check-local] Error 1
make[3]: Leaving directory '/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi'
make[2]: *** [Makefile:754: check-am] Error 2
make[2]: Leaving directory '/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi'
make[1]: *** [Makefile:756: check] Error 2
make[1]: Leaving directory '/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8/mpi'
make: *** [Makefile:708: check-recursive] Error 1
 (at easybuild/tools/run.py:618 in parse_cmd_output)
== 2021-11-03 08:17:04,137 build_log.py:265 INFO ... (took 3 mins 51 secs)
== 2021-11-03 08:17:04,137 filetools.py:1971 INFO Removing lock /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/.locks/_scratch_cor22_bin_BUILD_EB_janus_easybuild_instances_fsv2_2020b_software_FFTW_3.3.8-gompi-2020b.lock...
== 2021-11-03 08:17:04,141 filetools.py:380 INFO Path /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/.locks/_scratch_cor22_bin_BUILD_EB_janus_easybuild_instances_fsv2_2020b_software_FFTW_3.3.8-gompi-2020b.lock successfully removed.
== 2021-11-03 08:17:04,141 filetools.py:1975 INFO Lock removed: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/.locks/_scratch_cor22_bin_BUILD_EB_janus_easybuild_instances_fsv2_2020b_software_FFTW_3.3.8-gompi-2020b.lock
== 2021-11-03 08:17:04,141 easyblock.py:3915 WARNING build failed (first 300 chars): cmd " export OMPI_MCA_rmaps_base_oversubscribe=true &&   make check " exited with exit code 2 and output:
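
For what it's worth, the failing step can be re-run by hand from the build directory, using the paths and the exact command and environment setting shown in the log above (a sketch):

```bash
# Re-run the failing FFTW test step manually, with the same environment
# setting EasyBuild used (paths taken from the log above)
cd /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/build/FFTW/3.3.8/gompi-2020b/fftw-3.3.8
export OMPI_MCA_rmaps_base_oversubscribe=true
make check
```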

Looks like it is down to the MPI installation, so I tried running a simple MPI hello-world program with OpenMPI/4.0.5-GCC-10.2.0 from the foss-2020b toolchain. The program still runs to completion, but it first prints the errors below.

I expect this is because the compute instance I am running on (Azure Fsv2) is a single node, not connected to other nodes via InfiniBand. Maybe I need to add a hook to tell EasyBuild as much. Does anyone have any insight into how to solve this?
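
For reference, the run that produced the output below was along these lines (a sketch; the file name hello.f90 and the exact source are illustrative, not my actual program):

```bash
# Build and run a minimal MPI hello world against the EasyBuild-provided
# OpenMPI module (hello.f90 is a stand-in name)
module load OpenMPI/4.0.5-GCC-10.2.0
cat > hello.f90 << 'EOF'
program hello
  use mpi
  implicit none
  integer :: rank, nprocs, ierr
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print *, 'Hello World from process: ', rank, ' of ', nprocs
  call MPI_Finalize(ierr)
end program hello
EOF
mpifort hello.f90 -o hello
mpirun -np 17 ./hello
```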

[ip-AC125814:62000] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62001] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61992] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62004] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61993] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61994] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61995] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61997] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61998] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61999] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62006] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62002] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61990] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61991] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:61996] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62003] pml_ucx.c:291  Error: Failed to create UCP worker
[ip-AC125814:62005] pml_ucx.c:291  Error: Failed to create UCP worker
[1635931846.182478] [ip-AC125814:62000:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.182457] [ip-AC125814:62001:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.182433] [ip-AC125814:61990:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.182433] [ip-AC125814:61992:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.184738] [ip-AC125814:61995:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.184783] [ip-AC125814:61997:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.185896] [ip-AC125814:61999:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186238] [ip-AC125814:62002:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.185291] [ip-AC125814:62004:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186381] [ip-AC125814:62006:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.185922] [ip-AC125814:61993:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186175] [ip-AC125814:61994:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.187805] [ip-AC125814:61996:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186041] [ip-AC125814:61998:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.187811] [ip-AC125814:62003:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.186142] [ip-AC125814:61991:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
[1635931846.937278] [ip-AC125814:62005:0]      ib_device.c:670  UCX  ERROR ibv_query_gid(dev=mlx5_an0 port=1 index=0) failed: Invalid argument
 Hello World from process:            3 of           17
 Hello World from process:            5 of           17
 Hello World from process:            8 of           17
 Hello World from process:            7 of           17
 Hello World from process:            2 of           17
 Hello World from process:           10 of           17
 Hello World from process:            9 of           17
 Hello World from process:           12 of           17
 Hello World from process:           14 of           17
 Hello World from process:           11 of           17
 Hello World from process:            1 of           17
 Hello World from process:           15 of           17
 Hello World from process:           13 of           17
 Hello World from process:            0 of           17
 Hello World from process:           16 of           17
 Hello World from process:            4 of           17
 Hello World from process:            6 of           17

ompi_info gives:

  Package: Open MPI cor22@ip-AC125806 Distribution
                Open MPI: 4.0.5
  Open MPI repo revision: v4.0.5
   Open MPI release date: Aug 26, 2020
                Open RTE: 4.0.5
  Open RTE repo revision: v4.0.5
   Open RTE release date: Aug 26, 2020
                    OPAL: 4.0.5
      OPAL repo revision: v4.0.5
       OPAL release date: Aug 26, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.5
                  Prefix: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/OpenMPI/4.0.5-GCC-10.2.0
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: ip-AC125806
           Configured by: cor22
           Configured on: Tue Nov  2 11:59:25 GMT 2021
          Configure host: ip-AC125806
  Configure command line: '--prefix=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/OpenMPI/4.0.5-GCC-10.2.0'
                          '--build=x86_64-pc-linux-gnu'
                          '--host=x86_64-pc-linux-gnu'
                          '--enable-mpirun-prefix-by-default'
                          '--enable-shared' '--with-cuda=no'
                          '--with-hwloc=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/hwloc/2.2.0-GCCcore-10.2.0'
                          '--with-libevent=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/libevent/2.1.12-GCCcore-10.2.0'
                          '--with-ofi=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/libfabric/1.11.0-GCCcore-10.2.0'
                          '--with-pmix=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/PMIx/3.1.5-GCCcore-10.2.0'
                          '--with-ucx=/scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/UCX/1.9.0-GCCcore-10.2.0'
                          '--without-verbs'
                Built by: cor22
                Built on: Tue Nov  2 12:14:56 GMT 2021
              Built host: ip-AC125806
              C bindings: yes
            C++ bindings: no
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler and/or Open
                          MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/GCCcore/10.2.0/bin/gcc
  C compiler family name: GNU
      C compiler version: 10.2.0
            C++ compiler: g++
   C++ compiler absolute: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/GCCcore/10.2.0/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /scratch/cor22/bin/BUILD/EB/janus_easybuild/instances/fsv2/2020b/software/GCCcore/10.2.0/bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: no
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: yes
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
      MPI1 compatibility: no
          MPI extensions: affinity, cuda, pcollreq
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.5)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.5)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.5)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.5)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.5)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.5)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA event: external (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.5)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.5)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.5)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v4.0.5)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA pmix: ext3x (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.5)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.5)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.5)
           MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.5)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.5)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.5)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.5)
                MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.0.5)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.5)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.5)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.5)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.5)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.5)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.5)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.5)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                  MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v4.0.5)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.5)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.5)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.5)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.5)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v4.0.5)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v4.0.5)
ocaisa commented 2 years ago

We also saw this in https://github.com/EESSI/software-layer/issues/136 for Azure. The fix is to export OMPI_MCA_pml=ucx; in general this is fixed in later versions of OpenMPI.
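
For anyone landing here, a minimal sketch of applying that workaround, either directly in the shell or via an EasyBuild hook as connorourke suggested (the hook file name azure_hooks.py and the exact easyconfig name are assumptions on my part):

```bash
# Option 1: export the variable before building; the commands EasyBuild
# runs inherit the environment
export OMPI_MCA_pml=ucx
eb FFTW-3.3.8-gompi-2020b.eb --robot

# Option 2: inject it via an EasyBuild hook so it is set for the test step
# (hypothetical hook file, enabled with --hooks)
cat > azure_hooks.py << 'EOF'
import os

def pre_test_hook(self, *args, **kwargs):
    # Force Open MPI's PML selection, per EESSI/software-layer#136
    os.environ['OMPI_MCA_pml'] = 'ucx'
EOF
eb --hooks=azure_hooks.py FFTW-3.3.8-gompi-2020b.eb --robot
```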