This repository provides easy-to-use automation scripts for building an HPC environment in Azure. It also includes examples that build an end-to-end environment and run some of the key HPC benchmarks and applications.
Removed Slurm pinning. We use torch.distributed.launch with Slurm (srun), starting 1 process per node. The Slurm pinning is set up for 8 Slurm tasks, so it is suboptimal for a single Slurm (srun) process.
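The launch pattern described above (one srun task per node, with torch.distributed.launch starting the per-GPU workers) might look roughly like the sketch below. It only builds the command line and does not run it. The helper name `build_srun_command`, the host name `node-0`, the script name `train.py`, the port, and the figure of 8 worker processes per node are all illustrative assumptions, not taken from this repository.

```python
import shlex


def build_srun_command(nodes, procs_per_node, master_addr, train_script,
                       master_port=29500):
    """Build an srun command that starts ONE launcher task per node.

    Hypothetical helper: every argument value is an assumption made for
    illustration, not a setting from this repository.
    """
    # torch.distributed.launch starts procs_per_node workers on each node.
    # $SLURM_NODEID is left unexpanded here so that the shell started by
    # srun on each node resolves it to that node's rank.
    launcher = (
        "python -m torch.distributed.launch"
        f" --nnodes={nodes}"
        f" --nproc_per_node={procs_per_node}"
        " --node_rank=$SLURM_NODEID"
        f" --master_addr={shlex.quote(master_addr)}"
        f" --master_port={master_port}"
        f" {shlex.quote(train_script)}"
    )
    # A single Slurm task per node; no per-task pinning options are passed.
    return ["srun", f"--nodes={nodes}", "--ntasks-per-node=1",
            "bash", "-c", launcher]


# Example: 2 nodes, assumed 8 worker processes per node.
cmd = build_srun_command(2, 8, "node-0", "train.py")
print(shlex.join(cmd))
```

Because Slurm sees only one task per node in this layout, a pinning scheme written for 8 tasks per node no longer matches the processes actually running, which is consistent with the reason given for removing it.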