ROCm / rccl

ROCm Communication Collectives Library (RCCL)
https://rocmdocs.amd.com/projects/rccl/en/latest/

[Issue]: RCCL collective call Alltoall is performing way worse than normal MPI Alltoall on Frontier. #1206

Open · manver-iitk opened 1 month ago

manver-iitk commented 1 month ago

Problem Description

I ran my code on Frontier to test scaling on AMD GPUs. It scaled fine with MPI, but as soon as I replace the MPI_Alltoall call with nccl_Alltoall, it performs far worse than MPI. Why?
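For context on the comparison being made: stock NCCL has no built-in alltoall collective, so the usual pattern is to compose one from grouped ncclSend/ncclRecv calls (RCCL additionally ships an ncclAllToAll convenience extension). Below is a minimal sketch of the grouped point-to-point form, not the reporter's actual code; buffer names are hypothetical, and the communicator and stream are assumed to be already initialized:

```c
/*
 * Minimal sketch: alltoall composed from grouped RCCL point-to-point calls.
 * Assumes comm, stream, and the device buffers (hypothetical names
 * sendbuff/recvbuff) are already set up, and that each rank exchanges
 * `count` floats with every peer.
 */
#include <rccl/rccl.h>        /* header path may be <rccl.h> on older ROCm installs */
#include <hip/hip_runtime.h>

ncclResult_t alltoallSketch(const float* sendbuff, float* recvbuff,
                            size_t count, int nranks,
                            ncclComm_t comm, hipStream_t stream) {
  /* Grouping lets RCCL fuse all sends/receives into one scheduled operation. */
  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(sendbuff + peer * count, count, ncclFloat, peer, comm, stream);
    ncclRecv(recvbuff + peer * count, count, ncclFloat, peer, comm, stream);
  }
  return ncclGroupEnd();  /* completion is asynchronous on `stream` */
}
```

Unlike MPI_Alltoall, this is stream-ordered and non-blocking from the host's point of view, so timing comparisons should synchronize the stream before measuring.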

Operating System

SLES (Frontier)

CPU

AMD EPYC 7763 64-Core Processor

GPU

AMD Instinct MI250X

ROCm Version

ROCm 5.7.1

ROCm Component

rccl

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

edgargabriel commented 2 weeks ago

@manver-iitk a couple of questions: