Problem running multi-node MPI jobs #13

Open cwsmith opened 1 year ago

cwsmith commented 1 year ago


I'm hitting problems running MPI jobs that need more than one node, using the system install of OpenMPI 4.1.1 on Rocky 8.6. Specifically, the following script runs successfully on a single 2-core m3.small node with sbatch -n 2 -t 5 ./

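#!/bin/bash -ex
# The assignment of $hosts is missing from the post; the trace below shows
# hosts=hostfile.55 for job 55, so the form here is assumed.
hosts=hostfile.${SLURM_JOB_ID}
srun hostname > $hosts
mpirun -n ${SLURM_NPROCS} -hostfile $hosts ./helloWorld
echo "done"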

but fails when using four cores (sbatch -n 4 -t 5 ./), which spreads the ranks across more than one node, with the following output:

$ cat slurm-55.out 
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_vx=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/opt/ohpc/admin/lmod/lmod/init/bash)
Shell debugging restarted
+ unset __lmod_vx
+ hosts=hostfile.55
+ srun hostname
+ mpirun -n 4 -hostfile hostfile.55 ./helloWorld
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

  Local host: gkeyll-vc-test00-compute-2
  PID:        15413
[gkeyll-vc-test00-compute-1.novalocal:15657] 3 more processes have sent help message help-mpi-btl-tcp.txt / server accept cannot find guid
[gkeyll-vc-test00-compute-1.novalocal:15657] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The source for the helloWorld binary is:


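#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int worldSize, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // Each rank contributes 1, so the global sum should equal the number of ranks.
  int local = 1;
  int global;
  MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  fprintf(stderr, "%d\n", rank);
  MPI_Finalize();
  // Exit status is 0 only if the reduction saw every rank.
  return (global != worldSize);
}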

and was compiled with mpicxx -o helloWorld.

I also tried launching directly with srun <binary> <args>, but it appears that Open MPI was not built with Slurm/PMI support.

A quick Google search on the error message led me to this discussion:

cwsmith commented 1 year ago

Given the error "server accept cannot find guid" and the comment about IP addresses needing to be unique here: I took a look at the IP addresses of the compute nodes and saw that the virbr0 interface has the same address on each of them.

[exouser@gkeyll-vc-test00-compute-0 ~]$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:3a:3e:2d brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet brd scope global dynamic noprefixroute eth0
       valid_lft 86172sec preferred_lft 86172sec
    inet6 fe80::f816:3eff:fe3a:3e2d/64 scope link 
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:79:d2:b8 brd ff:ff:ff:ff:ff:ff
    inet brd scope global virbr0
       valid_lft forever preferred_lft forever
[exouser@gkeyll-vc-test00-compute-1 ~]$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:4b:b2:67 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet brd scope global dynamic noprefixroute eth0
       valid_lft 86150sec preferred_lft 86150sec
    inet6 fe80::f816:3eff:fe4b:b267/64 scope link 
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:79:d2:b8 brd ff:ff:ff:ff:ff:ff
    inet brd scope global virbr0
       valid_lft forever preferred_lft forever
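
The manual comparison above could be automated. The following is only a sketch: list_addrs and find_shared_addrs are hypothetical helper names, not part of Slurm or Open MPI, and it assumes the iproute2 one-line output format (ip -o) where the interface name is the second field and the address is the fourth.

```shell
# Print "<host> <interface> <address/prefix>" for each IPv4 address on this node.
list_addrs() {
  ip -o -4 addr show | awk -v h="$(hostname)" '{ print h, $2, $4 }'
}

# Read "<host> <interface> <address/prefix>" lines on stdin and report every
# non-loopback interface/address pair that appears on more than one host.
find_shared_addrs() {
  awk '$2 != "lo" {
         key = $2 " " $3
         # Count each host at most once per interface/address pair.
         if (!seen[key, $1]++) {
           count[key]++
           hosts[key] = hosts[key] " " $1
         }
       }
       END {
         for (key in count)
           if (count[key] > 1) print key ":" hosts[key]
       }'
}
```

Running list_addrs once on every node of an allocation (for example through srun with one task per node) and piping the combined output into find_shared_addrs would print one line per interface whose address is duplicated across hosts, which is the situation seen for virbr0 here.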

To tell OpenMPI to use the eth0 interface, I passed the --mca btl_tcp_if_include eth0 flag to mpirun, and the job ran successfully. At this point I'm not terribly concerned about performance, but if a faster network is available I'd like to use it (assuming that this run is not already using IB/libfabric/UCX).
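
For reference, here is one way the workaround could be folded into a job script so the flag does not have to be retyped. This is a sketch only: mpirun_on_iface and the MPI_IFACE variable are hypothetical names introduced for illustration, and eth0 is assumed to be the interface with unique addresses, as on these nodes.

```shell
# Hypothetical wrapper: run mpirun with Open MPI's TCP transport restricted to
# a single interface (eth0 unless MPI_IFACE is set), passing all other
# arguments through unchanged.
mpirun_on_iface() {
  local iface="${MPI_IFACE:-eth0}"
  mpirun --mca btl_tcp_if_include "$iface" "$@"
}
```

In the script from the first comment, the mpirun line would then become mpirun_on_iface -n ${SLURM_NPROCS} -hostfile $hosts ./helloWorld. Open MPI also reads MCA parameters from environment variables prefixed with OMPI_MCA_, so exporting OMPI_MCA_btl_tcp_if_include=eth0 before calling mpirun should have the same effect without a wrapper.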

cwsmith commented 1 year ago
