Closed jordialcaraz closed 3 months ago
Hi @jordialcaraz,
Thank you very much for the issue and I apologize for the delay.
Unfortunately, I do not have the exact answer for the first two questions. However, I can tell that I informed my colleagues about your requests and we will probably address them in some future ROCm release. I would like to add that the rocprofiler V2 is under the development, so you might expect some changes in the API that will not be backward compatible with the version you referenced in this issue.
On the other hand, I can say that the bug you reported is fixed in the ROCm 5.6.0. I run both rocprof V1 and rocprof V2 against simple matrix-multiplication app, and the outputs match:
# The output of the rocprofv2 5.6.0
dispatch[1], gpu_id(3), queue_id(1), queue_index(1), pid(1246354), tid(1246354), grd(256), wgr(256), lds(0), scr(0), arch_vgpr(12), accum_vgpr(4), sgpr(16), wave_size(64), sig(139770543371008), obj(1), kernel-name("__amd_rocclr_initHeap.kd"), start_time(3079692919307131), end_time(3079692919314491)
dispatch[2], gpu_id(3), queue_id(1), queue_index(2), pid(1246354), tid(1246354), grd(16777216), wgr(256), lds(0), scr(100), arch_vgpr(32), accum_vgpr(56), sgpr(48), wave_size(64), sig(139770543370752), obj(2), kernel-name("matmul"), start_time(3079692919654394), end_time(3079694274635785)
#The output of the rocprof 5.6.0
Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj
0,"__amd_rocclr_initHeap.kd",5,0,1,1250810,1250810,256,256,0,0,12,4,16,64,0x0,0x7f34f56adc40
1,"matmul(float*, float*, float*, int) [clone .kd]",5,0,2,1250810,1250810,16777216,256,0,100,32,56,48,64,0x0,0x7f34f5678940
Best regards, Vladimir
@jordialcaraz Can you please check if your issue still exists with the latest ROCm 6.2? If resolved, please close the ticket. Thanks!
I'm using rocprofv2 and I have some questions because I have information that is missing.
In the case of
rocprofiler_record_tracer_t
, if I have one record fromACTIVITY_DOMAIN_ROCTX
, how can i know the type of roctx call?With the old roctracer, I could check the
cid
and see if it was eitherROCTX_API_ID_roctxRangePushA
,ROCTX_API_ID_roctxRangePop
,ROCTX_API_ID_roctxRangeStartA
,ROCTX_API_ID_roctxRangeStop
orROCTX_API_ID_roctxMarkA
. With this, I could know the execution time in a range, but with rocprofv2 I get an starting timestamp and a roctx message (in the case of roctxRangeStart and roctxRangeStop I could match their ID, but not in the cae of RangePush/Pop). Example: Record [847], Domain(ROCTX_DOMAIN), Begin(2992693635292205), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(before hipLaunchKernel) Record [848], Domain(ROCTX_DOMAIN), Begin(2992693635310973), End(0), Correlation ID( 0), ROCTX ID(95), ROCTX Message(hipLaunchKernel range) Record [849], Domain(ROCTX_DOMAIN), Begin(2992693635335155), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(hipLaunchKernel) Record [850], Domain(ROCTX_DOMAIN), Begin(2992693635378544), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(after hipLaunchKernel) Record [851], Domain(ROCTX_DOMAIN), Begin(2992693635406627), End(0), Correlation ID( 0), ROCTX ID(0), ROCTX Message(hipMemcpy) Record [853], Domain(ROCTX_DOMAIN), Begin(2992693635953825), End(0), Correlation ID( 0), ROCTX ID(0) Record [854], Domain(ROCTX_DOMAIN), Begin(2992693635968879), End(0), Correlation ID( 0), ROCTX ID(0) Record [855], Domain(ROCTX_DOMAIN), Begin(2992693635975067), End(0), Correlation ID( 0), ROCTX ID(95)Also, is it possible to obtain the operation id https://github.com/ROCm-Developer-Tools/rocprofiler/blob/f914c8a819690d581ca9a2a9b463e676fdb47d81/src/core/session/tracer/src/roctracer.h#LL370C1-L385C3 in the case of ACTIVITY_DOMAIN_HIP_OPS and ACTIVITY_DOMAIN_HSA_OPS?
To end it, there is a bug with
rocprofiler_record_tracer_t
as some fields always have 0. Example: dispatch[1], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165377792), kernel-name("copy"), start_time(2994032645094954), end_time(2994032645502474) dispatch[2], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378048), kernel-name("mul"), start_time(2994032645535594), end_time(2994032645936554) dispatch[3], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378304), kernel-name("add"), start_time(2994032645962474), end_time(2994032647042317) dispatch[4], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378560), kernel-name("triad"), start_time(2994032647066477), end_time(2994032647744878) dispatch[5], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165377792), kernel-name("copy"), start_time(2994032647769678), end_time(2994032648169999) dispatch[6], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378048), kernel-name("mul"), start_time(2994032648195439), end_time(2994032648595920) dispatch[7], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378304), kernel-name("add"), start_time(2994032648621360), end_time(2994032649698482) dispatch[8], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165378560), kernel-name("triad"), start_time(2994032649721362), end_time(2994032650343763) dispatch[9], gpu_id(0), queue_id(1), queue_index(0), pid(1621817), tid(0), grd(0), wgr(0), lds(0), scr(0), arch_vgpr(0), accum_vgpr(0), sgpr(0), wave_size(0), sig(0), obj(140253165377792), kernel-name("copy"), start_time(2994032650367123), end_time(2994032650767284)With rocprof, the output is: Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj 0,"void copy(double const, double) [clone .kd]",2,0,0,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853d900
1,"void mul(double, double const) [clone .kd]",2,0,2,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853da00
2,"void add(double const, double const, double const, double const, double) [clone .kd]",2,0,4,1622800,1622800,26214400,1024,0,0,12,12,32,64,0x0,0x7f397853db00
3,"void triad(double , double const, double const) [clone .kd]",2,0,6,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853dc00
4,"void copy(double const, double) [clone .kd]",2,0,8,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853d900
5,"void mul(double, double const) [clone .kd]",2,0,10,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853da00
6,"void add(double const, double const, double const, double const, double) [clone .kd]",2,0,12,1622800,1622800,26214400,1024,0,0,12,12,32,64,0x0,0x7f397853db00
7,"void triad(double , double const, double const) [clone .kd]",2,0,14,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853dc00
8,"void copy(double const, double) [clone .kd]",2,0,16,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853d900
9,"void mul(double, double const) [clone .kd]",2,0,18,1622800,1622800,26214400,1024,0,0,8,8,16,64,0x0,0x7f397853da00