-
Hello,
I am trying to run some MPI benchmarks with Sarus containers. In particular, I am using Open MPI 4.
The nodes are RDMA-capable and have InfiniBand. Everything works fine without the container and …
-
User namespaces are now disabled, but enroot requires them: https://github.com/NVIDIA/enroot/blob/master/doc/requirements.md#kernel-settings
This is a breaking change for customers using enroot.
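One quick way to confirm whether user namespaces are available on a node is to look at the standard Linux sysctl `user.max_user_namespaces`. The following is a minimal sketch, not an enroot tool: the function names are made up for illustration, and it assumes this sysctl is the setting that was changed. The linked requirements page remains the reference for the exact kernel settings enroot needs.

```python
from pathlib import Path
from typing import Optional

# Standard procfs location of the sysctl user.max_user_namespaces.
# Assumption: this is the setting that was changed on the image.
SYSCTL_PATH = "/proc/sys/user/max_user_namespaces"


def userns_enabled(sysctl_text: str) -> bool:
    """Return True if the sysctl value allows at least one user namespace."""
    return int(sysctl_text.strip()) > 0


def read_sysctl(path: str = SYSCTL_PATH) -> Optional[str]:
    """Return the raw sysctl value, or None if the file does not exist."""
    p = Path(path)
    return p.read_text() if p.exists() else None


def describe(sysctl_text: Optional[str]) -> str:
    """Turn the raw sysctl value into a short human-readable status."""
    if sysctl_text is None:
        return "unknown (sysctl not present on this kernel)"
    return "enabled" if userns_enabled(sysctl_text) else "disabled"
```

Calling `describe(read_sysctl())` on a node returns `"enabled"`, `"disabled"`, or the unknown message. A value of 0 in the sysctl means no user namespaces can be created.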
-
run-tests.sh has become very complex and carries many variables for the different distros and versions.
As the scripts write the component versions to a file, this file could …
-
The doc page has a newly added note, which is unclear:
> For Azure NVads A10 v5 VMs we recommend customers to always be on the latest driver version. The latest NVIDIA major driver branch(n) is only…
-
EDIT: I have an issue with the combination of UCX + MPICH but I am unsure if it's on the MPICH side or the UCX side.
Thanks for your help!
### Describe the bug
Depending on the TLS chosen `dc_mlx…
-
## Paste the link of the GitHub organisation below and submit
https://github.com/Azure
---
###### Please subscribe to this thread to get notified when a new repository is created
-
The versions installed by the scripts are scattered across many files. This means a lot of work to provide newer versions.
A central file for the versions would help to have less work to cha…
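As an illustration of the central version file idea, here is a minimal sketch that parses a single `KEY=VALUE` file into a dictionary that install scripts could share. The file format, the component names, and the version strings in the usage note are hypothetical and are not taken from the repository.

```python
from typing import Dict


def parse_versions(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    versions: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"malformed line: {raw_line!r}")
        # Split on the first '=' only, so values may contain '='.
        key, value = line.split("=", 1)
        versions[key.strip()] = value.strip()
    return versions
```

With a hypothetical file containing `COMPONENT_A=1.2.3` and `COMPONENT_B=4.5`, `parse_versions` returns `{"COMPONENT_A": "1.2.3", "COMPONENT_B": "4.5"}`, so bumping a version means editing one line in one place.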
-
We hit some random errors when using PyTorch's default NCCL, version 2.7.8, on the DGX A100 cluster.
So we tried to upgrade it. We tried versions 2.8.4, 2.9.8 and 2.9.9.
However, we find the ove…
-
The CycleCloud Slurm project builds Slurm with PMIx in /opt/pmix/v4 and expects it there; see the [build script](https://github.com/Azure/cyclecloud-slurm/blob/master/util/build-slurm.sh).
PMIX …
-
I have compiled the tool and run it on a multi-GPU system. I get the following...
GPU 0: TITAN RTX (UUID: GPU-f8cb36a5-7a80-8667-e9c1-66c4acddaeaf)
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-54bf7…