ROCm / rccl

ROCm Communication Collectives Library (RCCL)
https://rocmdocs.amd.com/projects/rccl/en/latest/

Compute time in the reduction operation #1267

Closed tks2004 closed 1 month ago

tks2004 commented 1 month ago

Problem Description

Hi, we need to time the compute portion of the reduction operation. Is there an environment variable or some other way to get the compute time within the AllReduce collective?

Operating System

Ubuntu

CPU

EPYC77603

GPU

AMD Instinct MI250X

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

gilbertlee-amd commented 1 month ago

No - no such environment variable exists.
Aside from NPKit, which I suggested previously, you could get the overall time for the AllReduce by wrapping it between two hipEvents, calling hipEventSynchronize, then getting the elapsed time using hipEventElapsedTime.

tks2004 commented 1 month ago

Ok. Thanks