ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/

More time is spent in user mode when rocprofiler is used with MPI. #79

Open arfio opened 2 years ago

arfio commented 2 years ago

When running an MPI program with rocprof, the user time is 39% less than without it. Looking at a Linux kernel trace captured with the LTTng tracer, we can see that the main process of each rank spends half its time waiting when rocprof is enabled, whereas it stays in the running state without it. Synchronizing the Linux kernel trace with the rocprof trace shows that this happens during the memory transfer calls.
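For anyone trying to reproduce the user-time difference without a full LTTng setup, a minimal sketch along the following lines may help. It only uses the Python standard library (`resource.getrusage` and `subprocess.run`) to read the user and system CPU time of a command. The helper name `cpu_times` is hypothetical and not part of rocprofiler, and the numbers only cover child processes that were waited for on the local node, so they are a rough cross-check rather than a replacement for the kernel trace.

```python
import resource
import subprocess


def cpu_times(cmd):
    """Run cmd and return (user_seconds, system_seconds).

    The values are the CPU time consumed by cmd and by any of its
    descendants that were waited for, as reported by RUSAGE_CHILDREN.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user, system


def relative_change(baseline, profiled):
    """Return (profiled - baseline) / baseline, e.g. -0.39 for 39% less."""
    return (profiled - baseline) / baseline
```

Calling `cpu_times` once on the plain MPI launch command and once on the same command wrapped in rocprof, then passing the two user times to `relative_change`, gives a figure comparable to the 39% above. The exact launch commands depend on the application and MPI installation and are not shown here.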

In the images, blue indicates that the thread is in kernel mode, green that it is in user mode, and a yellow line that it is waiting.

[Images: trace with rocprof; trace without rocprof]

kikimych commented 2 years ago

Can't reproduce with a relatively large kernel. Rocprofiler submits additional packets to the hsa_queue, forcing sched_yield(). There are no additional switches to kernel space.