This repository provides automation scripts for building an HPC environment in Azure. It also includes examples that build an end-to-end environment and run some of the key HPC benchmarks and applications.
At times check_ib_bw_gdr fails with the following error:
Couldn't listen to port 18515
Unable to open file descriptor for socket connection Unable to init the socket connection
Couldn't connect to slurmcluster-hpc-pg0-277:18515
Unable to open file descriptor for socket connection Unable to init the socket connection
This change adds a netstat command to provide additional insight into port status.
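As a rough illustration of the kind of port inspection described above, here is a minimal Python sketch. It is not the code in this change. The helper names `lines_for_port` and `report_port_status` are hypothetical, and the `netstat -an` flags are an assumption, since the description does not say which options the check uses. The sketch filters netstat output for rows whose local or foreign address uses port 18515, the port named in the error.

```python
import subprocess
from typing import List

IB_BW_PORT = 18515  # port named in the error output above


def lines_for_port(netstat_output: str, port: int) -> List[str]:
    """Return netstat rows whose local or foreign address ends in :<port>."""
    suffix = f":{port}"
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # In TCP/UDP rows, field 3 is the local address and field 4 the foreign address.
        if len(fields) >= 5 and (fields[3].endswith(suffix) or fields[4].endswith(suffix)):
            matches.append(line)
    return matches


def report_port_status(port: int = IB_BW_PORT) -> List[str]:
    """Run `netstat -an` (assumed flags) and return the rows that mention the port.

    Raises FileNotFoundError if netstat is not installed on the node.
    """
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=False
    )
    return lines_for_port(result.stdout, port)
```

Calling `report_port_status()` on the affected node just before or after the failing run would show whether another process is already listening on the port or whether a stale connection is still holding it.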