Problem Description
Hi Everyone,
I have found some very strange behavior in rccl-rocm-6.1.2 that I cannot explain with my limited knowledge of the LL implementation. It shows up in the AllGather test with the Ring algorithm and the LL protocol.
In the LL implementation, each channel has 256 threads. Each thread sends/receives 8 B of data per trip, so each trip of any LL primitive transfers 256 threads x 8 B = 2 KB. I found that when the data size is not divisible by 128 B (i.e. 16 threads), the latency is very high.
In the following experiment, I increase the data size by 16 B at each step, so 2 more threads transfer data. Every 8 steps (8 x 2 threads = 16 threads, a quarter of the 64-thread wavefront), the latency is low; at all other steps it is much higher (a difference of about 150 us). Can anyone explain why this is happening?
Sincerely,
Operating System
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
CPU
Intel(R) Xeon(R) Platinum 8480C
GPU
AMD Instinct MI300X
ROCm Version
ROCm 6.1.0
Steps to Reproduce
Using 3 fully-connected GPUs:
RCCL_MSCCL_ENABLE=0 NCCL_PROTO=LL NCCL_ALGO=RING NCCL_MIN_NRINGS=16 NCCL_MAX_NRINGS=16 LD_LIBRARY_PATH=rccl-rocm-6.1.2/build/release/:$LD_LIBRARY_PATH ./build/all_gather_perf -g 3 -b 50331648 -e 50334720 -i 48 -s 1