Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.

Connection refused for `Isend` over OpenMPI for `F2s_v2` nodes #64

Closed severinson closed 3 years ago

severinson commented 3 years ago

Hello,

OpenMPI was recently upgraded from version 4.0.5 to 4.1.0 on CycleCloud. Since the upgrade, I've been having issues with non-blocking communication when running under Slurm on CycleCloud.
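For reference, the Open MPI version actually picked up on a compute node can be checked with something like the following (the exact output format varies between builds):

mpirun --version
ompi_info | grep "Open MPI:"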

First, I have to use -mca ^hcoll to avoid warnings about InfiniBand, which F2s_v2 nodes are not equipped with. I also had this issue with version 4.0.5.
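Spelled out in full MCA key/value form, the flag I mean is the one below; treat it as a sketch of the intended invocation rather than the exact command from my job script:

mpirun --mca coll ^hcoll ./mpi_isend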

Second, since the recent upgrade to OpenMPI v4.1.0, non-blocking communication has stopped working for me. The same code worked with OpenMPI v4.0.5.

This is the error I'm getting. I've confirmed that the problem occurs when I call MPI_Isend. I'm attaching a small example to reproduce this problem below.

Process 1 started 
Initiating communication on worker 1
[1622528555.546527] [ip-0A000007:9268 :0] sock.c:259 UCX ERROR connect(fd=30, dest_addr=127.0.0.1:58173) failed: Connection refused
[ip-0A000007:09268] pml_ucx.c:383  Error: ucp_ep_create(proc=0) failed: Destination is unreachable
[ip-0A000007:09268] pml_ucx.c:453  Error: Failed to resolve UCX endpoint for rank 0
[ip-0A000007:09268] *** An error occurred in MPI_Isend
[ip-0A000007:09268] *** reported by process [3673554945,1]
[ip-0A000007:09268] *** on communicator MPI_COMM_WORLD
[ip-0A000007:09268] *** MPI_ERR_OTHER: known error not in list
[ip-0A000007:09268] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ip-0A000007:09268] ***    and potentially your MPI job)

Program code (mpi_isend.c):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    MPI_Comm comm;
    MPI_Request request;
    MPI_Status status;
    int myid, master, tag, proc, my_int, p;

    MPI_Init(&argc, &argv);          /* start MPI */
    comm = MPI_COMM_WORLD;
    MPI_Comm_rank(comm, &myid);      /* get current process id */
    MPI_Comm_size(comm, &p);         /* get number of processes */

    master = 0;
    tag = 123;                       /* tag identifying this particular job */
    my_int = myid;                   /* payload: each worker sends its own rank */
    printf("Process %d started\n", myid);

    if (myid == master) {
        for (proc = 1; proc < p; proc++) {
            MPI_Recv(
                &my_int, 1, MPI_INT,   /* buffer, count, data type */
                MPI_ANY_SOURCE,        /* message source */
                MPI_ANY_TAG,           /* message tag */
                comm, &status);        /* status identifies source, tag */
            printf("Received from a worker\n");
        }
        printf("Master finished\n");
    } else {
        printf("Initiating communication on worker %d\n", myid);
        MPI_Isend(                     /* non-blocking send */
            &my_int, 1, MPI_INT,       /* buffer, count, data type */
            master,                    /* destination rank */
            tag,                       /* message tag */
            comm,
            &request);                 /* send my_int to master */
        MPI_Wait(&request, &status);   /* block until the Isend completes */
        printf("Worker %d finished\n", myid);
    }
    MPI_Finalize();                    /* let MPI finish up */
    return 0;
}

Jobfile (mpi_isend.job):

#!/bin/sh -l
#SBATCH --job-name=pool
#SBATCH --output=pool.out
#SBATCH --nodes=2
#SBATCH --time=600:00
#SBATCH --tasks-per-node=1
#SBATCH --partition=hpc
mpirun -mca ^hcoll mpi_isend

Steps to reproduce:

mpicc mpi_isend.c -o mpi_isend
sbatch mpi_isend.job
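
To check which devices and transports UCX actually detects on a compute node (assuming the UCX command-line tools are installed alongside Open MPI, which may not be the case on every image), something like this can be run:

ucx_info -v
ucx_info -d | grep -E "Transport|Device"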
severinson commented 3 years ago

I've found a workaround. Adding the argument -x UCX_NET_DEVICES=eth0 to mpirun solves the "Connection refused" issue, and adding --mca coll ^hcoll removes the InfiniBand warnings. The updated jobfile (mpi_isend.job) is:

#!/bin/sh -l
#SBATCH --job-name=pool
#SBATCH --output=pool.out
#SBATCH --nodes=2
#SBATCH --time=600:00
#SBATCH --tasks-per-node=1
#SBATCH --partition=hpc
mpirun --mca coll ^hcoll -x UCX_NET_DEVICES=eth0 ./mpi_isend
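
I haven't tested these on other VM sizes, but two alternatives that should have a similar effect on nodes without InfiniBand (treat them as untested sketches, not part of the verified fix above) are restricting UCX to its TCP transport, or bypassing UCX entirely via the ob1 PML with the TCP BTL:

mpirun --mca coll ^hcoll -x UCX_TLS=tcp -x UCX_NET_DEVICES=eth0 ./mpi_isend
mpirun --mca coll ^hcoll --mca pml ob1 --mca btl tcp,vader,self ./mpi_isend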
anhoward commented 3 years ago

Going to mark this as closed since this is really related to MPI and the platform, not a CycleCloud or Slurm issue.