everpeace / kube-openmpi

Open MPI jobs on Kubernetes
Apache License 2.0

DNS resolve issue of workers #26

Open rajeshm9 opened 5 years ago

rajeshm9 commented 5 years ago

Unable to execute mpiexec because name resolution of the worker hostnames fails. Kindly advise what is wrong and what needs to be checked.

kubectl -n mpi get po
NAME            READY   STATUS    RESTARTS   AGE
mpic-master     2/2     Running   0          14m
mpic-worker-0   1/1     Running   0          14m
mpic-worker-1   1/1     Running   0          14m
mpic-worker-2   1/1     Running   0          14m

kubectl -n $KUBE_NAMESPACE exec -it $MPI_CLUSTER_NAME-master /bin/bash
root@mpic-master:/# cat /kube-openmpi/generated/hostfile
mpic-master.mpic
mpic-worker-0
mpic-worker-1
mpic-worker-2

mpirun --allow-run-as-root --display-map -n 1 -npernode 1 --hostfile /kube-openmpi/generated/hostfile -- hostname

ssh: Could not resolve hostname mpic-worker-0: Temporary failure in name resolution

ORTE was unable to reliably start one or more daemons. This usually is caused by:

rajeshm9 commented 5 years ago

The above issue is not related to Open MPI. It was caused by a DNS problem in the Kubernetes setup and was solved by upgrading to 1.13.

Thanks
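
For anyone hitting the same ssh error, it may help to confirm that the failure is in name resolution before suspecting Open MPI. The following is only a minimal sketch: `check_hostfile` is a hypothetical helper (not part of kube-openmpi), and it assumes `getent` is available in the container image. It reads a hostfile, takes the first field of each line as the hostname, and reports whether each one resolves.

```shell
#!/bin/sh
# Sketch: report which hostfile entries fail name resolution.
# check_hostfile is a hypothetical helper, not part of kube-openmpi.
# Assumes getent (glibc) is available in the container.
check_hostfile() {
    hostfile="$1"
    failed=0
    # Each hostfile line starts with a hostname, optionally followed by
    # settings such as "slots=N"; only the first field is needed.
    # The "|| [ -n ... ]" part handles a last line without a newline.
    while read -r host _rest || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    # Return 1 if at least one entry did not resolve, 0 otherwise.
    return "$failed"
}
```

Inside the master pod this could be run as `check_hostfile /kube-openmpi/generated/hostfile`. If the worker entries show FAIL, the problem is most likely in the cluster DNS rather than in mpirun, which matches the resolution reported above.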