Azure/azurehpc
This repository provides automation scripts for building an HPC environment in Azure. It also includes examples for building an end-to-end environment and running some of the key HPC benchmarks and applications.
MIT License
124
stars
66
forks
issues
Newest
CycleCloud Slurm Cloud bursting
#754
vinil-v
closed
2 weeks ago
0
Update to pyxis 0.20.0
#753
vanzod
closed
1 month ago
0
Added support for retired pages (check_gpu_ecc)
#752
garvct
closed
1 month ago
0
Updated pyxis/enroot versions
#751
vanzod
closed
4 months ago
0
hpc monitoring (Added support to report node metadata metrics)
#750
garvct
closed
4 months ago
0
Integration of hpc monitoring into AKS
#749
garvct
closed
4 months ago
0
Added GPU VBIOS and GPU throttling tests. (to AKS NPD+DRAINO)
#748
garvct
closed
4 months ago
0
Fix HBV4 topology
#747
marconetto
closed
5 months ago
0
Integrate GPU node health checks into AKS
#746
garvct
closed
5 months ago
0
[WIP] Minimal changes to support migration to use AzNHC for Health Checks
#745
yosoyjay
opened
6 months ago
0
doesn't connect cluster
#744
Soroushmehr1
opened
7 months ago
0
fix code for upstream change
#743
JingchaoZhang
closed
8 months ago
0
Add support for NDv5 in check_gpu_ecc
#742
vanzod
closed
9 months ago
0
Add support for Ubuntu 22.04 in nhc-run script
#741
vanzod
closed
9 months ago
0
Add support for NDv5 in NHC
#740
vanzod
closed
9 months ago
0
Improve enroot installation
#739
vanzod
closed
10 months ago
1
Install pyxis and enroot from local packages
#738
vanzod
closed
11 months ago
0
Update Pyxis/Enroot CC project
#737
vanzod
closed
11 months ago
1
aks_ndv5 (Run NHC and hpc-diagnostics on AKS+NDv5)
#736
garvct
closed
11 months ago
0
Aks ndv4 update
#735
garvct
closed
1 year ago
1
App pinning tool (Support NDv5)
#734
garvct
closed
1 year ago
0
Run hpc-diagnostics and NHC in AKS (Aks ndv4)
#733
garvct
closed
1 year ago
2
NHC - Corrected floats comparison in NCCL loopback test
#732
vanzod
closed
1 year ago
0
Bump word-wrap from 1.2.3 to 1.2.4 in /azurehpc-ui
#731
dependabot[bot]
opened
1 year ago
0
run_nccl_tests_ncv4 (Remove NCCL_GRAPH_FILE)
#730
garvct
closed
1 year ago
0
Add output messages in install.sh
#729
marconetto
opened
1 year ago
0
Update README.md
#728
marconetto
opened
1 year ago
0
App_pinning_tool (Updated HBv2 numa/l3cache topology info)
#727
garvct
closed
1 year ago
0
Bump semver from 6.3.0 to 6.3.1 in /azurehpc-ui
#726
dependabot[bot]
opened
1 year ago
0
Bump tough-cookie from 4.1.2 to 4.1.3 in /azurehpc-ui
#725
dependabot[bot]
opened
1 year ago
0
Example scripts for NCCL collective tests on NC_A100_V4
#724
garvct
closed
1 year ago
0
CC slurm nhc (Support NC_A100_v4)
#723
garvct
closed
1 year ago
0
App pinning tool (added support for HBv4 and HX)
#722
garvct
closed
1 year ago
0
Minor edit to readme.
#721
garvct
closed
1 year ago
0
Correct formatting issue with readme (LSF section)
#720
garvct
closed
1 year ago
0
Fixing case where pinning is a list of ranges
#719
edwardsp
closed
1 year ago
0
App pinning tool (Add LSF support)
#718
garvct
closed
1 year ago
1
[bug] `azhpc-build` script fails but resources are created, leading to unintended charges
#717
negin513
opened
1 year ago
0
App pinning tool (slurm/srun integration)
#716
garvct
closed
1 year ago
0
App pinning tool (Support using tool directly in mpi run script)
#715
garvct
closed
1 year ago
0
Add Xid code check for RPC from GSP timeout error
#714
vanzod
closed
1 year ago
0
HPC monitor (add df (filesystem usage and inode) metrics)
#713
garvct
closed
1 year ago
0
[bug]: Unable to locate a modulefile for 'spack/spack' in `build-wrf.sh` and `build_wps.sh`
#712
negin513
opened
1 year ago
1
[bug] "Error with `azhpc-scp` command in `apps/wrf/readme.md` : -r flag unrecognized"
#711
negin513
opened
1 year ago
0
HPC monitor (Added support for filesystem inode monitoring)
#710
garvct
closed
1 year ago
0
Bump webpack from 5.75.0 to 5.76.1 in /azurehpc-ui
#709
dependabot[bot]
closed
1 year ago
0
Fix Cc slurm nhc autoscaling bug
#708
garvct
closed
1 year ago
0
Added RAID health check test
#707
vanzod
closed
1 year ago
0
cc_slurm_nhc (Corrected pid extraction in wait_for_nhc.sh)
#706
garvct
closed
1 year ago
0
Deploy cycle slurm ndv4 (replace gpu_monitoring with hpc/ai cluster monitoring)
#705
garvct
closed
1 year ago
0