Azure / azurehpc

This repository provides easy automation scripts for building an HPC environment in Azure. It also includes examples for building an end-to-end environment and running some of the key HPC benchmarks and applications.
MIT License

Aks ndv4 update #735

Closed: garvct closed this 9 months ago

garvct commented 10 months ago
JingchaoZhang commented 9 months ago

Two optional updates:

  1. This commit is for ndv4, but it's worth pointing out that the current setup works for nd96asr_v4 specifically. If one uses nd96amsr_a100_v4 instead, the conf/nd96amsr_a100_v4.conf file needs to be updated before NHC can run.
  2. It's probably a good idea to add a sleep 3600 to the hpc-diagnostics.yaml file. That keeps the pod running for an hour after log collection finishes, so people can kubectl cp the tarball out directly.
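
To illustrate the second suggestion, here is a rough sketch of what the change could look like. The actual layout of hpc-diagnostics.yaml is not shown in this thread, so the container name, the command/args structure, and the placeholder comment are all assumptions; only the sleep 3600 and the use of kubectl cp come from the comment above.

```yaml
# Hypothetical excerpt: the real hpc-diagnostics.yaml may be structured differently.
containers:
  - name: hpc-diagnostics          # assumed container name, not taken from the repo
    command: ["/bin/bash", "-c"]
    args:
      - |
        # ... existing log-collection commands stay here unchanged ...
        # Keep the container alive for one hour after collection so the
        # tarball can still be copied out of the running pod.
        sleep 3600

# While the pod is still running, the tarball could then be copied with
# something like (pod name and path are placeholders):
#   kubectl cp <pod-name>:<path-to-tarball> ./<local-file>
```

Note that kubectl cp needs the tar binary inside the container and only works while the container is running, which is why the sleep goes after the collection step rather than before it.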