This repository provides easy-to-use automation scripts for building an HPC environment in Azure. It also includes examples that build an end-to-end environment and run some of the key HPC benchmarks and applications.
Remove the NCCL_GRAPH_FILE environment variable from scripts
Setting NCCL_GRAPH_FILE causes NCCL to fail on NC96ads_A100_v4 when run on fewer than 4 GPUs. It also makes the NCCL tests run significantly slower on a single NC48ads_A100_v4: about 220 GB/s without NCCL_GRAPH_FILE set, versus about 58 GB/s with it set.
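Removing the variable from the scripts does not help if a user's shell already exports it. The snippet below is a minimal sketch of a guard a bash script could use so that an inherited NCCL_GRAPH_FILE never reaches NCCL. It assumes the scripts are bash. The nccl-tests command in the trailing comment is illustrative only: the binary path and GPU count are assumptions, not values taken from this repository.

```shell
#!/bin/bash
# Hypothetical guard (not part of the existing scripts): drop any NCCL_GRAPH_FILE
# inherited from the caller's environment so NCCL falls back to its default behaviour.
unset NCCL_GRAPH_FILE

# ${VAR+x} expands to "x" only when VAR is set (even if set to an empty string)
if [ -n "${NCCL_GRAPH_FILE+x}" ]; then
    echo "NCCL_GRAPH_FILE is still set" >&2
    exit 1
fi
echo "NCCL_GRAPH_FILE is not set"

# Illustrative NCCL test run afterwards (path and -g value are assumptions):
# ./build/all_reduce_perf -b 8 -e 8G -f 2 -g 4
```

Using `unset` instead of setting the variable to an empty string ensures that NCCL sees no value at all.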