Azure/azurehpc
This repository provides automation scripts for building an HPC environment in Azure. It also includes examples that build an end-to-end environment and run some of the key HPC benchmarks and applications.
MIT License
124 stars, 66 forks
issues
Update config_pyxis_enroot.json
#654
JingchaoZhang
closed
2 years ago
0
cc slurm nhc (Added single node NCCL all-reduce test)
#653
garvct
closed
2 years ago
0
Deploycycle_slurm_ndv4(Added some useful Slurm customizations)
#652
garvct
closed
2 years ago
0
app_pinning+tool (Added support for new hbv2 numa topology)
#651
garvct
closed
2 years ago
0
check_gpu_ecc tool (Changed ECC error threshold and added a SRAM ECC error threshold)
#650
garvct
closed
2 years ago
0
CC_SLURM_NHC (Added SRAM correctable ECC threshold)
#649
garvct
closed
2 years ago
0
CC_SLURM_NHC (Added check for GPU Xid errors)
#648
garvct
closed
2 years ago
2
GPU Monitoring (Bug fix, correct dictionary key.)
#647
garvct
closed
2 years ago
0
gpu_monitoring (bug fix, syntax alignment and number of arguments to function)
#646
garvct
closed
2 years ago
0
GPU Monitoring (Support Ethernet and NFS I/O metrics)
#645
garvct
closed
2 years ago
0
Fix CycleCLI installation in local host
#644
vanzod
closed
2 years ago
0
GPU Monitoring (Added option to collect IB metrics)
#643
garvct
closed
2 years ago
0
This repo is missing important files
#642
microsoft-github-policy-service[bot]
closed
2 years ago
0
Adding Microsoft SECURITY.MD
#641
microsoft-github-policy-service[bot]
closed
2 years ago
0
Harden scripts sourcing common_function.sh
#640
vanzod
closed
2 years ago
0
CC_SLURM_NHC (DRAIN nodes if ECC error count > 20M)
#639
garvct
closed
2 years ago
0
Deploy cycle slurm ndv4 (Added config file with pyxis+enroot)
#638
garvct
closed
2 years ago
3
Refactored is_slurm_controller function
#637
vanzod
closed
2 years ago
0
cc slurm nhc (bug fix for prolog support)
#636
garvct
closed
2 years ago
0
CC_SLURM_NHC (Support killing NHC via SLURM Prolog)
#635
garvct
closed
2 years ago
0
Support running cyclecloud project scripts in the background
#634
garvct
closed
2 years ago
0
Bump terser from 4.8.0 to 4.8.1 in /azurehpc-ui
#633
dependabot[bot]
closed
2 years ago
0
cc_slurm_pyxis_enroot (SLURM container support via pyxis/enroot (cyclecloud project))
#632
garvct
closed
2 years ago
0
run_nccl_tests_ndv4 (Added slurm+pyxis+enroot NCCL test script)
#631
garvct
closed
2 years ago
0
fix link to subsection of readme
#630
ltalirz
closed
2 years ago
0
[feature] specify subscription through config.json?
#629
ltalirz
opened
2 years ago
0
Iterative check_cuda_bw
#628
vanzod
closed
2 years ago
0
CC_SLURM_NHC (Added SLURM Epilog support)
#627
garvct
closed
2 years ago
0
gpu_monitoring: Script returns error on Ubuntu 20.04 LTS [bug]
#626
dspasicevs
opened
2 years ago
0
CC_SLURM_NHC (Changed cuda_bw test threshold)
#625
garvct
closed
2 years ago
0
CC_SLURM_NHC (Support dropping CPU cached memory.)
#624
garvct
closed
2 years ago
0
add support for rocky plan
#623
hmeiland
closed
2 years ago
0
check_app_pinning tool (Added Mvapich2 support)
#622
garvct
closed
2 years ago
0
CC_SLURM_NHC (change ECC threshold and NHC run interval)
#621
garvct
closed
2 years ago
0
CC_SLURM_NHC (updated ECC error check)
#620
garvct
closed
2 years ago
0
CC_SLURM_NHC( Changed NHC_TIMEOUT=250)
#619
garvct
closed
2 years ago
0
Cyclecloud slurm NDv4 (Modify limits on login node)
#618
garvct
closed
2 years ago
0
Initial version of GPU ECC error checking tool
#617
garvct
closed
2 years ago
1
Update run_ring_osu_bw_hpcx_slurm.sh
#616
JonShelley
closed
2 years ago
0
Fix NHC GPU application clocks script to iterate across all GPUs
#615
vanzod
closed
2 years ago
0
Bump jsdom from 16.4.0 to 16.7.0 in /azurehpc-ui
#614
dependabot[bot]
closed
2 years ago
0
updated run script and added a slurm run script
#613
JonShelley
closed
2 years ago
0
CC_SLURM_NHC (Added support for HBv3)
#612
garvct
closed
2 years ago
0
Updated deploy_cycle_slurm_ndv4
#611
garvct
closed
2 years ago
0
Updated bastion, added properties update and ubuntu version
#610
garvct
closed
2 years ago
0
Azhop update
#609
xpillons
closed
2 years ago
0
Initial version of NDv4+cycle+anf+nhc deployment
#608
garvct
closed
2 years ago
0
CC_SLURM_NHC (Changed NHC defaults)
#607
garvct
closed
2 years ago
0
Bug fix api version peering
#606
garvct
closed
2 years ago
0
[feature] Add the link of this video in the documentation
#605
marcusgaspar
opened
2 years ago
0