Azure/azurehpc
This repository provides automation scripts for building an HPC environment in Azure. It also includes examples that build an end-to-end environment and run some of the key HPC benchmarks and applications.
MIT License
122 stars, 65 forks
issues
HPC/AI cluster monitoring (Support scheduled events)
#703
garvct
closed
1 year ago
0
HPC/AI cluster monitoring
#702
garvct
closed
1 year ago
0
cc_slurm_nhc (Added check for exclusive node in run_nhc.sh epilog)
#701
garvct
closed
1 year ago
0
Cc slurm nhc
#700
garvct
closed
1 year ago
0
Modified NDv4/NDmv4 configuration for UbuntuHPC 20.04
#699
vanzod
closed
1 year ago
0
Bump json5 from 1.0.1 to 1.0.2 in /azurehpc-ui
#698
dependabot[bot]
closed
1 year ago
0
Bump express from 4.17.1 to 4.18.2 in /azurehpc-ui
#697
dependabot[bot]
closed
1 year ago
1
Bump qs and express in /azurehpc-ui
#696
dependabot[bot]
closed
1 year ago
1
Bump decode-uri-component from 0.2.0 to 0.2.2 in /azurehpc-ui
#695
dependabot[bot]
closed
1 year ago
1
Bump loader-utils and react-scripts in /azurehpc-ui
#694
dependabot[bot]
closed
1 year ago
0
Nsg multiple sourceip and rocky image plan
#693
hmeiland
closed
1 year ago
1
spack 0.18.1 and get version dynamically
#692
xpillons
closed
1 year ago
0
Added netstat check
#691
vanzod
closed
2 years ago
0
Add test repeats in NCCL allreduce and NCCL allreduce loopback
#690
vanzod
closed
2 years ago
0
WRF detailed setup procedure
#689
marcusgaspar
opened
2 years ago
0
added scripts to run fairseq with AML
#688
JingchaoZhang
closed
1 year ago
0
App pinning tool (Added printing Numa mask)
#687
garvct
closed
1 year ago
0
GPU Monitoring (Update readme, add disk_io image.)
#686
garvct
closed
2 years ago
0
Added timestamp for epilog NHC execution
#685
vanzod
closed
2 years ago
0
GPU Monitoring (Support disk device I/O metrics)
#684
garvct
closed
2 years ago
0
fairseq_moe_docker_slurm (Remove Slurm pinning)
#683
garvct
closed
2 years ago
1
GPU monitoring (Removed unnecessary code (and with bug))
#682
garvct
closed
2 years ago
0
Add ndv4 slurm cc vpn
#681
yosoyjay
closed
1 year ago
3
cc_slurm_nhc (Add autoscaling support)
#680
garvct
closed
2 years ago
0
GPU Monitoring (Updated cpu monitoring and readme)
#679
garvct
closed
2 years ago
0
cc slurm nhc (prolog.sh path error)
#678
garvct
closed
2 years ago
0
cc_slurm_nhc (Need to run check_app_gpu_clocks before check_cuda_bw 24.0 3)
#677
garvct
closed
2 years ago
0
Update readme.md
#676
JingchaoZhang
closed
2 years ago
0
deploy_cycle_slurm_ndv4 (Need to deploy log analytics first, if gpu_monitoring is enabled.)
#675
garvct
closed
2 years ago
0
Update readme.md
#674
JingchaoZhang
closed
2 years ago
0
Update config_pyxis_enroot_sacct_gpu_monitoring.json
#673
JingchaoZhang
closed
2 years ago
0
Deploy_cycle_slurm_ndv4 (Corrected mariaDB config variable)
#672
garvct
closed
2 years ago
0
GPU Monitoring (Added CPU metrics)
#671
garvct
closed
2 years ago
0
Gpu monitoring (Added log analytics examples (and kusto queries))
#670
garvct
closed
2 years ago
0
added enroot+pyxis support
#669
JingchaoZhang
closed
2 years ago
1
Deploy_cycle_slurm_nhv4 (Added support for GPU monitoring (via Azure log analytics))
#668
garvct
closed
2 years ago
0
bugfix in install_nhc
#667
vgamayunov
closed
2 years ago
0
gpu_monitoring (Bug fixes)
#666
garvct
closed
2 years ago
0
Implement idempotency in MariaDB creation script
#665
vanzod
closed
2 years ago
0
Add ability to use variables in config dictionary keys
#664
yosoyjay
closed
2 years ago
3
Merge key vault secret management scripts
#663
vanzod
closed
2 years ago
0
Store private SSH key in Key Vault for bastion-jumpbox
#662
vanzod
closed
2 years ago
0
Deploy_cycle_slurm_ndv4(Set sunrpc kernel parameter tcp_max_slot_table_entries=128)
#661
garvct
closed
2 years ago
0
Disable Ubuntu unattended upgrades
#660
vanzod
closed
2 years ago
0
fix some chmod security issues
#659
garvct
closed
2 years ago
0
Add peering gateway opts
#658
yosoyjay
closed
2 years ago
8
Update Bastion with no public IP Linux and Windows VMs
#657
vanzod
closed
2 years ago
0
cc_slurm_nhc (Added check for NCCL all-reduce out of bounds error)
#656
garvct
closed
2 years ago
0
Deploy_cycle_slurm_ndv4 (support Slurm accounting via MariaDB)
#655
garvct
closed
2 years ago
1
Update config_pyxis_enroot.json
#654
JingchaoZhang
closed
2 years ago
0