-
Related: https://github.com/kubeflow/training-operator/issues/2170
As part of this KEP, we will migrate to the MPI V2 implementation.
We should add support for the MPI Runtime.
/area runtime
…
-
I was trying to run Malamute on an HPC cluster with 24 cores. The program runs smoothly in serial with the command:
`malamute-opt -i dcs5_5_mm_constant_properties.i >& log.out`
For MPI, the executi…
-
**Environment:**
1. **Framework**: TensorFlow, PyTorch
2. **Framework version**:
- TensorFlow: 2.18.0
- PyTorch: 2.4.1
3. **Horovod version**: Attempting to install the latest version via pip
4. **MPI…
-
Recently, I upgraded from maker v2 to maker v3 for a new annotation project. I compiled maker v3 using the same MPICH module I had previously used for maker v2:
`module load mpich/ge/gcc/64/3.3.2`
However, n…
-
Hello,
I have an issue where Ascent hangs my simulation when running with MPI on multiple cluster nodes.
I compile with:
`env enable_mpi=ON ./build_ascent.sh`
Has this ever happened before?…
-
I installed `pyamrex` from Conda, and I think it doesn't use MPI, even though MPI is installed on my system. The documentation only mentions that the pyamrex conda package does not yet provide GPU support, but d…
-
### System Info
- CPU: x86_64, Intel(R) Xeon(R) Platinum 8470
- CPU/Host memory size: 1TB
- GPU: 4x H100 96GB
- Libraries
  - TensorRT-LLM: main, 0.15.0 (commit: b7868dd1bd1186840e3755b97ea3d3a73dd…
-
I'm using Ubuntu 22 with a fresh conda environment.
When installing POSEIDON, I ran into an issue with mpi4py where it was unable to find the necessary binaries. Despite mpich being displayed as i…
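A quick check that may help narrow down this kind of mpi4py problem is to confirm which MPI tools are visible on `PATH` from inside the active conda environment and, if `mpi4py` imports at all, what it recorded at build time. This is a minimal sketch, not part of POSEIDON: `find_mpi_tools` and `mpi4py_build_config` are hypothetical helper names, and the only library calls used are `shutil.which` and `mpi4py.get_config()`.

```python
import shutil
import sys


def find_mpi_tools(names=("mpicc", "mpiexec", "mpirun")):
    """Map each tool name to its absolute path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


def mpi4py_build_config():
    """Return mpi4py's recorded build configuration, or None if mpi4py cannot be imported."""
    try:
        import mpi4py  # importing the top-level package does not initialize MPI
    except ImportError:
        return None
    return mpi4py.get_config()


if __name__ == "__main__":
    # Show which interpreter is running, to confirm the conda env is active.
    print("Python:", sys.executable)
    for name, path in find_mpi_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
    config = mpi4py_build_config()
    print("mpi4py config:", config if config is not None else "mpi4py not importable")
```

Run it with the environment's own `python`. If `mpicc` is reported as not found, a source build of mpi4py in that environment will likely fail to locate an MPI installation as well.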
-
# Problem
As reported in [1], MPI 4.1 Example 6.36 contains the incorrect statement:
> It is not possible to perform a blocking collective operation on all communicators because there exists no de…
-
## Overview
A student who is an experienced programmer but entirely new to the MPL/MPI systems and programming styles should be provided with introductory information and helpful examples. Some of the k…