ROCm / omnitrace

Omnitrace: Application Profiling, Tracing, and Analysis
https://rocm.docs.amd.com/projects/omnitrace/en/latest/
MIT License

Feature request: Move GPU trace closer to HIP+CPU activity #275

Closed gsitaram closed 1 year ago

gsitaram commented 1 year ago

When we profile on a node with 2 CPUs (with 64 cores each) and 8 GPUs and enable ROCPROFILER, ROCTRACER and ROCM_SMI support, we typically see many lines of CPU performance metrics, one for each core. We then have to scroll all the way to the bottom to find the GPU activity and select that manually to see the HIP activity and the GPU activity close to each other. Is it possible to move the GPU activity closer to the CPU/HIP activity and near the top of the trace?

There are a couple of other questions:

  1. If we have a Slurm allocation for 1 GPU, it can be any one of, say, 8 GPUs on the node. Will Omnitrace still collect metrics on all 8 GPUs? If we set OMNITRACE_SAMPLING_GPUS = 0, would it collect on only 1 GPU, and is that the same GPU that Slurm allocated?
  2. Is OMNITRACE_SAMPLING_CPUS = 0 the right way to limit collecting CPU metrics on only 1 CPU core to shorten the trace obtained?

jrmadsen commented 1 year ago

AFAICT, the Perfetto GUI organizes those tracks alphabetically but I can look into if there are ways around that.

  1. If I recall correctly, Slurm only makes 1 GPU visible, so you shouldn't have to set that option. If that's not the case, you could also try setting it to %env{HIP_VISIBLE_DEVICES}%, assuming Slurm sets that variable.
  2. Yes, it also accepts "none" if you simply don't want any CPU frequency info.
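
For anyone scripting this, here is a minimal Python sketch of the two settings discussed above, applied as environment variables before launching a profiled run. The helper name `build_omnitrace_env` is hypothetical. The sketch assumes the scheduler exports HIP_VISIBLE_DEVICES, which may not be true on every cluster. Whether the indices in HIP_VISIBLE_DEVICES line up with what OMNITRACE_SAMPLING_GPUS expects is not confirmed in this thread, so treat it as something to verify.

```python
import os


def build_omnitrace_env(base_env=None):
    """Return a copy of the environment with narrowed sampling settings.

    Hypothetical helper: only the two OMNITRACE_* variable names and the
    HIP_VISIBLE_DEVICES idea come from the thread above.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Limit CPU frequency sampling to core 0. Per the thread, "none" is also
    # accepted if no CPU frequency info is wanted at all.
    env["OMNITRACE_SAMPLING_CPUS"] = "0"

    # Assumption: the scheduler sets HIP_VISIBLE_DEVICES for the allocation.
    # If it does not, leave OMNITRACE_SAMPLING_GPUS alone rather than guess.
    visible = env.get("HIP_VISIBLE_DEVICES", "").strip()
    if visible:
        env["OMNITRACE_SAMPLING_GPUS"] = visible

    return env


if __name__ == "__main__":
    env = build_omnitrace_env()
    for key in ("OMNITRACE_SAMPLING_CPUS", "OMNITRACE_SAMPLING_GPUS"):
        print(key, "=", env.get(key, "<unset>"))
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the profiled application, which keeps the settings local to that one run instead of changing the shell environment.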