ROCm / rocprofiler

ROC profiler library. Profiling with perf-counters and derived metrics.
https://rocm.docs.amd.com/projects/rocprofiler/en/latest/
Other
115 stars 43 forks source link

Question about rocprofv2 and bug #115

Open jordialcaraz opened 1 year ago

jordialcaraz commented 1 year ago

I'm using rocprofv2 and I have some questions because I have information that is missing.

In the case of rocprofiler_record_tracer_t, if I have one record from ACTIVITY_DOMAIN_ROCTX, how can i know the type of roctx call?

With the old roctracer, I could check the cid and see if it was either ROCTX_API_ID_roctxRangePushA, ROCTX_API_ID_roctxRangePop, ROCTX_API_ID_roctxRangeStartA, ROCTX_API_ID_roctxRangeStop or ROCTX_API_ID_roctxMarkA. With this, I could know the execution time in a range, but with rocprofv2 I get an starting timestamp and a roctx message (in the case of roctxRangeStart and roctxRangeStop I could match their ID, but not in the cae of RangePush/Pop). Example: Record [847], Domain(ROCTX_DOMAIN), Begin(2992693635292205), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(before hipLaunchKernel) Record [848], Domain(ROCTX_DOMAIN), Begin(2992693635310973), End(0), Correlation ID( 0), ROCTX ID(95), ROCTX Message(hipLaunchKernel range) Record [849], Domain(ROCTX_DOMAIN), Begin(2992693635335155), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(hipLaunchKernel) Record [850], Domain(ROCTX_DOMAIN), Begin(2992693635378544), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(after hipLaunchKernel) Record [851], Domain(ROCTX_DOMAIN), Begin(2992693635406627), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(hipMemcpy) Record [853], Domain(ROCTX_DOMAIN), Begin(2992693635953825), End(0), Correlation ID( 0), ROCTX ID(0) Record [854], Domain(ROCTX_DOMAIN), Begin(2992693635968879), End(0), Correlation ID( 0), ROCTX ID(0) Record [855], Domain(ROCTX_DOMAIN), Begin(2992693635975067), End(0), Correlation ID( 0), ROCTX ID(95)

Also, is it possible to obtain the operation id https://github.com/ROCm-Developer-Tools/rocprofiler/blob/f914c8a819690d581ca9a2a9b463e676fdb47d81/src/core/session/tracer/src/roctracer.h#LL370C1-L385C3 in the case of ACTIVITY_DOMAIN_HIP_OPS and ACTIVITY_DOMAIN_HSA_OPS?

To end it, there is a bug with rocprofiler_record_tracer_t as some fields always have 0. Example: dispatch[1], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165377792), kernel-name("copy"), start_time(2994032645094954), end_time(2994032645502474) dispatch[2], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378048), kernel-name("mul"), start_time(2994032645535594), end_time(2994032645936554) dispatch[3], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378304), kernel-name("add"), start_time(2994032645962474), end_time(2994032647042317) dispatch[4], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378560), kernel-name("triad"), start_time(2994032647066477), end_time(2994032647744878) dispatch[5], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165377792), kernel-name("copy"), start_time(2994032647769678), end_time(2994032648169999) dispatch[6], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378048), kernel-name("mul"), start_time(2994032648195439), end_time(2994032648595920) dispatch[7], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378304), kernel-name("add"), start_time(2994032648621360), end_time(2994032649698482) dispatch[8], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378560), kernel-name("triad"), start_time(2994032649721362), end_time(2994032650343763) dispatch[9], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165377792), kernel-name("copy"), start_time(2994032650367123), end_time(2994032650767284)

With rocprof, the output is: Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj 0,"void copy(double const, double) [clone .kd]",2,0,0,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853d900 1,"void mul(double, double const) [clone .kd]",2,0,2,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853da00 2,"void add(double const, double const, double const, double const, double) [clone .kd]",2,0,4,1622800,1622800,26214400,1024,0,0,12,12,32,64,0x0,0x7f397853db00 3,"void triad(double, double const, double const) [clone .kd]",2,0,6,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853dc00 4,"void copy(double const, double) [clone .kd]",2,0,8,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853d900 5,"void mul(double, double const) [clone .kd]",2,0,10,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853da00 6,"void add(double const, double const, double const, double const, double) [clone .kd]",2,0,12,1622800,1622800,26214400,1024,0,0,12,12,32,64,0x0,0x7f397853db00 7,"void triad(double, double const, double const) [clone .kd]",2,0,14,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853dc00 8,"void copy(double const, double) [clone .kd]",2,0,16,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853d900 9,"void mul(double, double const) [clone .kd]",2,0,18,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853da00

vlaindic commented 11 months ago

Hi @jordialcaraz,

Thank you very much for the issue and I apologize for the delay.

Unfortunately, I do not have the exact answer for the first two questions. However, I can tell that I informed my colleagues about your requests and we will probably address them in some future ROCm release. I would like to add that the rocprofiler V2 is under the development, so you might expect some changes in the API that will not be backward compatible with the version you referenced in this issue.

On the other hand, I can say that the bug you reported is fixed in the ROCm 5.6.0. I run both rocprof V1 and rocprof V2 against simple matrix-multiplication app, and the outputs match:

# The output of the rocprofv2 5.6.0
dispatch[1], gpu_id(3), queue_id(1), queue_index(1), pid(1246354), tid(1246354), grd(256), wgr(256), lds(0), scr(0), arch_vgpr(12), accum_vgpr(4), sgpr(16), wave_size(64), sig(139770543371008), obj(1), kernel-name("__amd_rocclr_initHeap.kd"), start_time(3079692919307131), end_time(3079692919314491)
dispatch[2], gpu_id(3), queue_id(1), queue_index(2), pid(1246354), tid(1246354), grd(16777216), wgr(256), lds(0), scr(100), arch_vgpr(32), accum_vgpr(56), sgpr(48), wave_size(64), sig(139770543370752), obj(2), kernel-name("matmul"), start_time(3079692919654394), end_time(3079694274635785)
#The output of the rocprof 5.6.0
Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj
0,"__amd_rocclr_initHeap.kd",5,0,1,1250810,1250810,256,256,0,0,12,4,16,64,0x0,0x7f34f56adc40
1,"matmul(float*, float*, float*, int) [clone .kd]",5,0,2,1250810,1250810,16777216,256,0,100,32,56,48,64,0x0,0x7f34f5678940

Best regards, Vladimir