ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

Overlapping kernel in profiler trace #127

Closed jinhongyii closed 9 months ago

jinhongyii commented 9 months ago

I'm profiling a two-process program using rocprofv2, where each process controls one GPU. All kernels except the RCCL kernels are launched in the same stream, so I expect them to run sequentially. However, the profiler produces a strange result, and I have several questions about it.

[Screenshot: profiler timeline]

In this screenshot, rms_norm_kernel is launched before fused_fused_decode1_take_kernel, but the timeline shows one stacked on top of the other, which is pretty confusing to me. Does that mean they are not running sequentially?

Another confusing point is the "Marker" block in the timeline. I don't have a kernel called "Marker". What does it mean?

I'm using ROCm 5.7 on Ubuntu 22.04.2.
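
One way to answer the sequential-or-not question independently of how the viewer stacks its slices is to compare the kernels' begin and end timestamps directly. The following is a minimal sketch, not part of rocprofv2: the `KernelSpan` record, its `begin_ns`/`end_ns` fields, and the `find_overlaps` helper are hypothetical names, and the timestamps in the usage example are made up for illustration. It assumes the begin and end times of each kernel have already been extracted from the trace by some other means.

```python
from typing import List, NamedTuple, Tuple


class KernelSpan(NamedTuple):
    """Hypothetical record for one kernel execution taken from a trace."""
    name: str
    begin_ns: int  # assumed: kernel start timestamp in nanoseconds
    end_ns: int    # assumed: kernel end timestamp in nanoseconds


def find_overlaps(spans: List[KernelSpan]) -> List[Tuple[str, str]]:
    """Return name pairs whose half-open [begin, end) intervals intersect."""
    ordered = sorted(spans, key=lambda s: s.begin_ns)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.begin_ns >= a.end_ns:
                # Spans are sorted by begin time, so no later span
                # can start before `a` ends either.
                break
            overlaps.append((a.name, b.name))
    return overlaps


# Made-up timestamps: back-to-back kernels versus genuinely overlapping ones.
sequential = [
    KernelSpan("rms_norm_kernel", 1000, 1500),
    KernelSpan("fused_fused_decode1_take_kernel", 1500, 2100),
]
overlapping = [
    KernelSpan("rms_norm_kernel", 1000, 1500),
    KernelSpan("fused_fused_decode1_take_kernel", 1400, 2100),
]

print(find_overlaps(sequential))
print(find_overlaps(overlapping))
```

If the list comes back empty for kernels from the same stream, the recorded timestamps are consistent with sequential execution, and the stacking is a display or recording issue rather than evidence of concurrency.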

jrmadsen commented 9 months ago

If you are using rocprofv2 because of the Perfetto visualization, I’d recommend using Omnitrace instead. It outputs protobufs just like rocprofv2, but it still uses the standalone roctracer library. In v2, roctracer and rocprofiler were merged into the same library, and the former group lead did this very hastily and without adequate testing. The implementation is so problematic that we are rewriting it from scratch, targeting the release after 5.7 (i.e., a release before the end of the year). Until then, I’d recommend avoiding rocprofiler v2 entirely.