ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/
MIT License

Question: what is the recommended way to profile multi-gpu code? #129

Closed tiandi111 closed 1 week ago

tiandi111 commented 1 year ago

I've used rocprofv2 and encountered the same problem stated in this issue.

I'm wondering what the recommended way is to profile multi-GPU code with ROCm 5.6. Also, an API is preferred, since I want to control the profiling range.

Omnitrace seems like overkill for me, since I only want to trace each GPU kernel.

For example, the output of rocprofv2 in "ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION" mode is exactly what I want:

```
dispatch[1], gpu_id(0), queue_id(1), queue_index(1881), pid(123346), tid(123510), grd(256), wgr(256), lds(2048), scr(512), arch_vgpr(112), accum_vgpr(64), sgpr(112), wave_size(64), sig(140333513201792), obj(1), kernel-name("ncclKernel_SendRecv_RING_SIMPLE_Sum_int8_t(ncclDevComm*, unsigned long, ncclWork*) [clone .kd]"), start_time(2492860750916911) , end_time(2492860751105392) 
dispatch[2], gpu_id(0), queue_id(4), queue_index(13687), pid(123346), tid(123510), grd(6656), wgr(256), lds(0), scr(0), arch_vgpr(4), accum_vgpr(4), sgpr(16), wave_size(64), sig(140333513201536), obj(2), kernel-name("take_kernel.kd"), start_time(2492860751121072) , end_time(2492860751126192) 
```
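Because each dispatch record carries a `gpu_id` plus `start_time`/`end_time`, per-GPU kernel timings can be separated in post-processing. The sketch below is a minimal illustration that assumes every record follows exactly the `field(value)` layout of the two sample lines; that layout is not a documented, stable format. `parse_dispatch` and `per_gpu_totals` are hypothetical helper names, not part of rocprofiler.

```python
import re
from collections import defaultdict

# Patterns assume the field layout of the sample dispatch lines (an assumption,
# not a documented format). kernel-name is quoted and may itself contain
# parentheses, so it is matched non-greedily up to the first '")'.
GPU_RE = re.compile(r"gpu_id\((\d+)\)")
NAME_RE = re.compile(r'kernel-name\("(.*?)"\)')
START_RE = re.compile(r"start_time\((\d+)\)")
END_RE = re.compile(r"end_time\((\d+)\)")


def parse_dispatch(line):
    """Return (gpu_id, kernel_name, duration) for one dispatch line, or None."""
    gpu = GPU_RE.search(line)
    name = NAME_RE.search(line)
    start = START_RE.search(line)
    end = END_RE.search(line)
    if not (gpu and name and start and end):
        return None  # not a dispatch record (or a different layout)
    duration = int(end.group(1)) - int(start.group(1))
    return int(gpu.group(1)), name.group(1), duration


def per_gpu_totals(lines):
    """Sum kernel durations per gpu_id, in the same units as the timestamps."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_dispatch(line)
        if record is not None:
            gpu, _name, duration = record
            totals[gpu] += duration
    return dict(totals)
```

For the two sample lines above, both kernels ran on `gpu_id(0)`, with durations of 188481 and 5120 timestamp units respectively. The units are whatever rocprofv2 uses for `start_time`/`end_time`; the sketch does not convert them.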

Thanks!

ppanchad-amd commented 1 month ago

@tiandi111 Apologies for the lack of response. Do you still need assistance with this ticket? If not, please close the ticket. Thanks!

ppanchad-amd commented 1 week ago

@tiandi111 Closing ticket for now. Please leave a comment if you still need assistance and I will re-open the ticket. Thanks!