ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

Missing functions #1457

Closed waheedi closed 1 month ago

waheedi commented 1 month ago

Hello there,

I have been working on a custom build for the gfx1010 (Navi 10) target, which is more or less unsupported across several ROCm repos. I know it is not your problem that I'm trying to run things on my own, but I would like to understand a few things. After successfully building the stack, I'm now running into some missing definitions, mostly optimized functions.

For example, this is one of them:

:1:hip_module.cpp           :84  : 6339659646d us:  Cannot find the function: Cijk_Alik_Bljk_HB_MT64x64x8_SN_AMAS0_BL0_BS1_GLVWA1_GLVWB1_GRVW1_GSU1_GSUASB_K1_LRVW1_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUS256_SVW4_TT4_4_USFGRO0_VAW2_VS1_VW1_VWB1_WG16_16_1_WGM8 for module: 0x9d8a0820

I could not work out where these might be located other than in rocBLAS. Any hints would be appreciated.

waheedi commented 1 month ago

So far I think we need to add the new gfx1010 caps to Asm, then create the Tensile configs for navi10 (gfx1010), and then also create the blas3/navi10 yaml files. Or are these generated from the Tensile configs?

No, the navi10 files need to be created :) I think it's doing something now.

waheedi commented 1 month ago

OK, now I'm missing a different AMAS version. I have no idea what that is, but it was missing an AMAS0 kernel before and now it is missing an AMAS3 one.
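When two reported kernel names differ, it can help to see exactly which name tokens changed. The following is a minimal Python sketch, not Tensile's own parser: it assumes tokens are separated by underscores and that a purely numeric piece (as in TT4_4 or WG16_16_1) belongs to the token before it. The two names are abbreviated from the ones in this thread, and the helper names `kernel_tokens` and `diff_kernel_names` are hypothetical. It does not interpret what any token, including AMAS, means.

```python
def kernel_tokens(name):
    """Split a kernel name on '_' and re-attach purely numeric pieces.

    Assumption (heuristic, not Tensile's parser): a piece made only of
    digits continues the previous token, e.g. 'WG16', '16', '1' -> 'WG16_16_1'.
    """
    tokens = []
    for part in name.split("_"):
        if part.isdigit() and tokens:
            tokens[-1] += "_" + part
        else:
            tokens.append(part)
    return tokens


def diff_kernel_names(a, b):
    """Return (tokens only in a, tokens only in b), preserving order."""
    ta, tb = kernel_tokens(a), kernel_tokens(b)
    only_a = [t for t in ta if t not in tb]
    only_b = [t for t in tb if t not in ta]
    return only_a, only_b


# Abbreviated versions of two kernel names from the logs in this thread.
first = "Cijk_Alik_Bljk_HB_MT64x64x8_SN_AMAS0_BL0_BS1_TT4_4_WG16_16_1_WGM8"
second = "Cijk_Alik_Bljk_HB_MT16x16x16_SN_AMAS3_BL1_BS1_TT2_2_WG8_8_1_WGM1"

only_first, only_second = diff_kernel_names(first, second)
print("only in first: ", only_first)
print("only in second:", only_second)
```

Run against the full names from the logs, the same comparison shows at a glance that the two messages refer to different kernels rather than to one kernel with a single changed parameter.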

waheedi commented 1 month ago

OK, the strange thing is that these missing functions actually exist here:

grep -rn "Cijk_Ailk_Bljk_SB_MT32x32x8_SN_AMAS0_BL1_BS1_EPS1_GLVWA1_GLVWB1_GRVW1_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW1_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU0_SUM0_SUS0_SVW" /opt/rocm/lib

grep: /opt/rocm/lib/rocblas/library/TensileLibrary_Type_SS_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx1010.co: binary file matches
grep: /opt/rocm/lib/rocblas/library/TensileLibrary_Type_SS_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx1010.dat: binary file matches

NaveenElumalaiAMD commented 1 month ago

Hi @waheedi, thank you for reporting the issue. Could you tell me which ROCm and rocBLAS versions you are using?

waheedi commented 1 month ago

Thanks @NaveenElumalaiAMD. This is not an installed ROCm release but a locally built version (so I actually named it version 6.2). The rocBLAS version is commit 663aeaeb060222a8d9cb644c8e24531c2ac236b1, and I also had to add the gfx1010 stuff :D

But what is really interesting to me now: if I set AMD_LOG_LEVEL=3 I don't get these errors anymore, but rather a successful call. My question is why, when I use AMD_LOG_LEVEL=2, I get the errors below:

:1:hip_code_object.cpp      :1006: 13220495444d us:  Cannot find the function: Cijk_Alik_Bljk_HB_MT16x16x16_SN_AMAS3_BL1_BS1_EPS1_GLVWA2_GLVWB2_GRVW2_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW2_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT2_2_USFGROn1_VAW2_VSn1_VW2_VWB2_WS32_WG8_8_1_WGM1 
:1:hip_module.cpp           :84  : 13220495466d us:  Cannot find the function: Cijk_Alik_Bljk_HB_MT16x16x16_SN_AMAS3_BL1_BS1_EPS1_GLVWA2_GLVWB2_GRVW2_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW2_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT2_2_USFGROn1_VAW2_VSn1_VW2_VWB2_WS32_WG8_8_1_WGM1 for module: 0xf049f920

But when using AMD_LOG_LEVEL=3

I get these successful calls:

:3:rocvirtual.cpp           :807 : 18487715118d us:  Arg0:  Tensor2dSizeA = val:322688
:3:rocvirtual.cpp           :807 : 18487715123d us:  Arg1:  Tensor2dSizeB = val:128
:3:rocvirtual.cpp           :807 : 18487715129d us:  Arg2:  AddressD = val:138164379332608
:3:rocvirtual.cpp           :807 : 18487715133d us:  Arg3:  AddressC = val:138164379332608
:3:rocvirtual.cpp           :807 : 18487715138d us:  Arg4:  AddressA = val:138165845229568
:3:rocvirtual.cpp           :807 : 18487715142d us:  Arg5:  AddressB = val:138164379320320
:3:rocvirtual.cpp           :807 : 18487715146d us:  Arg6:  Alpha = val:1006648320
:3:rocvirtual.cpp           :807 : 18487715151d us:  Arg7:  Beta = val:0
:3:rocvirtual.cpp           :807 : 18487715155d us:  Arg8:  StridesD = val:274877907008
:3:rocvirtual.cpp           :807 : 18487715160d us:  Arg9:  StridesC = val:274877907008
:3:rocvirtual.cpp           :807 : 18487715165d us:  Arg10:  StridesA = val:549755819008
:3:rocvirtual.cpp           :807 : 18487715170d us:  Arg11:  StridesB = val:549755819008
:3:rocvirtual.cpp           :807 : 18487715174d us:  Arg12:  SizesFree = val:
:3:rocvirtual.cpp           :807 : 18487715179d us:  Arg13:  SizesSum = val:128
:3:rocvirtual.cpp           :807 : 18487715183d us:  Arg14:  OrigStaggerUIter = val:1
:3:rocvirtual.cpp           :807 : 18487715188d us:  Arg15:  NumWorkGroups0 = val:4
:3:rocvirtual.cpp           :807 : 18487715192d us:  Arg16:  NumWorkGroups1 = val:1
:3:rocvirtual.cpp           :3033: 18487715196d us:  ShaderName : Cijk_Alik_Bljk_HB_MT16x16x16_SN_AMAS3_BL1_BS1_EPS1_GLVWA2_GLVWB2_GRVW2_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW2_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT2_2_USFGROn1_VAW2_VSn1_VW2_VWB2_WS32_WG8_8_1_WGM1
:3:hip_module.cpp           :488 : 18487715208d us:  hipExtModuleLaunchKernel: Returned hipSuccess : 
:3:hip_platform.cpp         :225 : 18487715217d us:   __hipPushCallConfiguration ( {10,1,1}, {256,1,1}, 0, stream:0x601ce97fd380 ) 
:3:hip_platform.cpp         :229 : 18487715223d us:  __hipPushCallConfiguration: Returned hipSuccess : 
:3:hip_platform.cpp         :234 : 18487715230d us:   __hipPopCallConfiguration ( {3871342592,32168,150}, {5120,0,128}, 0x7ffc6eb2a4d0, 0x7ffc6eb2a4c8 ) 
:3:hip_platform.cpp         :243 : 18487715236d us:  __hipPopCallConfiguration: Returned hipSuccess : 
:3:hip_module.cpp           :685 : 18487715243d us:   hipLaunchKernel ( 0x7dabce6d1bd0, {10,1,1}, {256,1,1}, 0x7ffc6eb2a510, 0, stream:0x601ce97fd380 ) 
:3:rocvirtual.cpp           :731 : 18487715252d us:  Arg0:   = ptr:0x7da8e6c03000 obj:[0x7da8e6c03000-0x7da8e6c08400]
:3:rocvirtual.cpp           :731 : 18487715256d us:  Arg1:   = ptr:0x7da901002000 obj:[0x7da8fba00000-0x7da90ba02000]
:3:rocvirtual.cpp           :807 : 18487715261d us:  Arg2:   = val:2560
:3:rocvirtual.cpp           :3033: 18487715265d us:  ShaderName : _ZL13convert_unaryI6__halffEvPKvPT0_l
:3:hip_module.cpp           :686 : 18487715272d us:  hipLaunchKernel: Returned hipSuccess : 
:3:hip_error.cpp            :36  : 18487715279d us:   hipGetLastError (  ) 
:3:hip_device_runtime.cpp   :634 : 18487715285d us:   hipGetDevice ( 0x7ffc6eb2a6c4 ) 
:3:hip_device_runtime.cpp   :642 : 18487715289d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_platform.cpp         :225 : 18487715295d us:   __hipPushCallConfiguration ( {40,1,1}, {64,1,1}, 384, stream:0x601ce97fd380 ) 
:3:hip_platform.cpp         :229 : 18487715302d us:  __hipPushCallConfiguration: Returned hipSuccess : 
:3:hip_platform.cpp         :234 : 18487715308d us:   __hipPopCallConfiguration ( {0,0,3918055488}, {20480,0,20480}, 0x7ffc6eb2a708, 0x7ffc6eb2a700 ) 
:3:hip_platform.cpp         :243 : 18487715315d us:  __hipPopCallConfiguration: Returned hipSuccess : 
:3:hip_module.cpp           :685 : 18487715321d us:   hipLaunchKernel ( 0x7dabce6d2420, {40,1,1}, {64,1,1}, 0x7ffc6eb2a750, 384, stream:0x601ce97fd380 ) 

The shader function is the same for both: Cijk_Alik_Bljk_HB_MT16x16x16_SN_AMAS3_BL1_BS1_EPS1_GLVWA2_GLVWB2_GRVW2_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW2_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT2_2_USFGROn1_VAW2_VSn1_VW2_VWB2_WS32_WG8_8_1_WGM1

Does AMD_LOG_LEVEL control other environment variables behind the scenes, so that it might set a DEBUG variable to true and make my call succeed only in debug mode?

NaveenElumalaiAMD commented 1 month ago

Could you provide example code that triggers the error you are seeing, so that I can reproduce it on my end?

waheedi commented 1 month ago

Yes, it's actually several examples that raise these errors. A simple Python example would be this:

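from diffusers import StableDiffusionPipeline

model_id = "..."  # placeholder: the model ID was not given in the original report

pipe = StableDiffusionPipeline.from_pretrained(model_id)
pipe.to("cuda")  # ROCm builds of PyTorch expose the GPU under the "cuda" device name
images = pipe("a hip cat with a hat")  # This line triggers the shader functions to be called

The above code results in these errors if it's run with AMD_LOG_LEVEL=2: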


:1:hip_code_object.cpp      :1006: 25341613187d us:  Cannot find the function: Cijk_Ailk_Bljk_SB_MT128x64x8_SN_AMAS3_BL1_BS1_EPS1_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM4 
:1:hip_module.cpp           :84  : 25341613192d us:  Cannot find the function: Cijk_Ailk_Bljk_SB_MT128x64x8_SN_AMAS3_BL1_BS1_EPS1_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM4 for module: 0xa2c4510

When called with AMD_LOG_LEVEL=3, I don't see these errors anymore, but I do see successful calls:

:3:rocvirtual.cpp           :807 : 24888422484d us:  Arg0:  Tensor2dSizeA = val:1966080
:3:rocvirtual.cpp           :807 : 24888422489d us:  Arg1:  Tensor2dSizeB = val:1228800
:3:rocvirtual.cpp           :807 : 24888422494d us:  Arg2:  AddressD = val:135806874288128
:3:rocvirtual.cpp           :807 : 24888422498d us:  Arg3:  AddressC = val:135806874288128
:3:rocvirtual.cpp           :807 : 24888422503d us:  Arg4:  AddressA = val:135806805082112
:3:rocvirtual.cpp           :807 : 24888422507d us:  Arg5:  AddressB = val:135807622971392
:3:rocvirtual.cpp           :807 : 24888422513d us:  Arg6:  Alpha = val:1065353216
:3:rocvirtual.cpp           :807 : 24888422517d us:  Arg7:  Beta = val:0
:3:rocvirtual.cpp           :807 : 24888422522d us:  Arg8:  StridesD = val:2814749767107584
:3:rocvirtual.cpp           :807 : 24888422526d us:  Arg9:  StridesC = val:2814749767107584
:3:rocvirtual.cpp           :807 : 24888422531d us:  Arg10:  StridesA = val:8444249301320704
:3:rocvirtual.cpp           :807 : 24888422535d us:  Arg11:  StridesB = val:1920
:3:rocvirtual.cpp           :807 : 24888422540d us:  Arg12:  SizesFree = val:
:3:rocvirtual.cpp           :807 : 24888422544d us:  Arg13:  SizesSum = val:1920
:3:rocvirtual.cpp           :807 : 24888422549d us:  Arg14:  OrigStaggerUIter = val:31
:3:rocvirtual.cpp           :807 : 24888422553d us:  Arg15:  NumWorkGroups0 = val:8
:3:rocvirtual.cpp           :807 : 24888422559d us:  Arg16:  NumWorkGroups1 = val:10
:3:rocvirtual.cpp           :807 : 24888422563d us:  Arg17:  NumFullBlocks = val:2
:3:rocvirtual.cpp           :807 : 24888422569d us:  Arg18:  WgmRemainder1 = val:2
:3:rocvirtual.cpp           :807 : 24888422573d us:  Arg19:  MagicNumberWgmRemainder1 = val:1073741825
:3:rocvirtual.cpp           :3033: 24888422578d us:  ShaderName : Cijk_Ailk_Bljk_SB_MT128x64x8_SN_AMAS3_BL1_BS1_EPS1_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM4
:3:hip_module.cpp           :488 : 24888422589d us:  hipExtModuleLaunchKernel: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :634 : 24888422607d us:   hipGetDevice ( 0x7ffd31f12e14 ) 
:3:hip_device_runtime.cpp   :642 : 24888422612d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :634 : 24888422620d us:   hipGetDevice ( 0x7ffd31f12b54 ) 
:3:hip_device_runtime.cpp   :642 : 24888422625d us:  hipGetDevice: Returned hipSuccess : 
:3:hip_platform.cpp         :225 : 24888422633d us:   __hipPushCallConfiguration ( {5120,1,1}, {128,1,1}, 0, stream:<null> ) 
:3:hip_platform.cpp         :229 : 24888422640d us:  __hipPushCallConfiguration: Returned hipSuccess : 
:3:hip_platform.cpp         :234 : 24888422646d us:   __hipPopCallConfiguration ( {0,0,0}, {0,0,1}, 0x7ffd31f12bb8, 0x7ffd31f12bb0 ) 
:3:hip_platform.cpp         :243 : 24888422653d us:  __hipPopCallConfiguration: Returned hipSuccess : 
:3:hip_module.cpp           :685 : 24888422660d us:   hipLaunchKernel ( 0x7b87f051c118, {5120,1,1}, {128,1,1}, 0x7ffd31f12bd0, 0, stream:<null> ) 

waheedi commented 1 month ago

OK, apparently there is a difference in the module being queried: hip_code_object handles several different modules, so I thought the call had succeeded. I can now confirm that the error is the same regardless of the AMD_LOG_LEVEL value, so it makes a bit more sense :) Apologies for the confusion; I missed that log line among the many others.

Right now I'm doing a bit more debugging.

So here we are adding the function to the functions_ list:

:1:hip_code_object.cpp      :1112: 2155995653d us:  Adding function: Cijk_Alik_Bjlk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_16_1_WGM8

Then:

hipModuleLoadData: Returned hipSuccess : 
:3:hip_module.cpp           :74  : 2155996320d us:   hipModuleGetFunction ( 0x7ffd866b91d0, 0x8b25960, Cijk_Alik_Bjlk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_16_1_WGM8 ) 
:1:hip_code_object.cpp      :1004: 2155996327d us:  Number of functions available: 73

:1:hip_code_object.cpp      :1008: 2155996330d us:  Cannot find the function: Cijk_Alik_Bjlk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_16_1_WGM8 
:1:hip_module.cpp           :84  : 2155996337d us:  Cannot find the function: Cijk_Alik_Bjlk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_16_1_WGM8 for module: 0x8b25960
:3:hip_module.cpp           :85  : 2155996342d us:  hipModuleGetFunction: Returned hipErrorNotFound : 
:3:hip_error.cpp            :36  : 2155996346d us:   hipGetLastError (  ) 
:3:hip_module.cpp           :74  : 2155996350d us:   hipModuleGetFunction ( 0x7ffd866b91d0, 0xabeb4d0, Cijk_Alik_Bjlk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_8_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16

cgmb commented 1 month ago

I'm not sure of the details, but Tensile doesn't always know which module contains the function that it is looking for. The strategy in that case is to try loading from each module until it finds the right one. The attempts to load the function from the wrong module will fail and show up in the debug log.
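The lookup strategy described above can be sketched in a few lines of Python. This is a simulation under stated assumptions, not Tensile or HIP code: `FakeModule` and `find_function` are hypothetical names, the kernel name is a placeholder, and the module handles are simply borrowed from the logs in this thread. It shows why every miss on a wrong module leaves a level-1 message behind even though the overall lookup succeeds.

```python
from dataclasses import dataclass, field


@dataclass
class FakeModule:
    """Hypothetical stand-in for a loaded code object module."""
    handle: str
    functions: set = field(default_factory=set)


def find_function(modules, name, log):
    """Probe each module in turn until one contains `name`.

    Each miss appends a level-1 message and the hit appends a level-3
    message, mirroring the pattern seen in the logs in this thread.
    """
    for module in modules:
        if name in module.functions:
            log.append((3, f"hipModuleGetFunction: Returned hipSuccess (module {module.handle})"))
            return module
        log.append((1, f"Cannot find the function: {name} for module: {module.handle}"))
    return None


# Three modules; only the last one contains the (placeholder) kernel.
modules = [
    FakeModule("0x8b25960", {"kernel_a"}),
    FakeModule("0xabeb4d0", {"kernel_b"}),
    FakeModule("0xed51c40", {"kernel_c", "kernel_wanted"}),
]

log = []
found = find_function(modules, "kernel_wanted", log)

for level, message in log:
    print(f":{level}: {message}")
print("found in module:", found.handle)
```

In this sketch the two "Cannot find the function" messages are expected noise from probing the wrong modules; only a lookup that misses in every module would indicate a kernel that is really absent.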

waheedi commented 1 month ago

@cgmb thank you, and yes, you are right: one of the modules does have these functions. In this case, for example, the third module has 555 functions, while the first has 73 and the second has 534, according to this trace.

So I'm logging the number of functions before we fail or succeed in finding that function in the referenced module.

func_name in this case is Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1

The log:

:1:hip_code_object.cpp      :1004: 2156327703d us:  Number of functions available: 73
:1:hip_code_object.cpp      :1008: 2156327706d us:  Cannot find the function: Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1 
:1:hip_module.cpp           :84  : 2156327711d us:  Cannot find the function: Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1 for module: 0x8b25960
:3:hip_module.cpp           :85  : 2156327717d us:  hipModuleGetFunction: Returned hipErrorNotFound : 
:3:hip_error.cpp            :36  : 2156327719d us:   hipGetLastError (  ) 
:3:hip_module.cpp           :74  : 2156327725d us:   hipModuleGetFunction ( 0x7ffd866b62f0, 0xabeb4d0, Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1 ) 
:1:hip_code_object.cpp      :1004: 2156327730d us:  Number of functions available: 534

:1:hip_code_object.cpp      :1008: 2156327734d us:  Cannot find the function: Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1 
:1:hip_module.cpp           :84  : 2156327738d us:  Cannot find the function: Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1 for module: 0xabeb4d0
:3:hip_module.cpp           :85  : 2156327745d us:  hipModuleGetFunction: Returned hipErrorNotFound : 
:3:hip_error.cpp            :36  : 2156327748d us:   hipGetLastError (  ) 
:3:hip_module.cpp           :74  : 2156327751d us:   hipModuleGetFunction ( 0x7ffd866b62f0, 0xed51c40, Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1 ) 
:1:hip_code_object.cpp      :1004: 2156327757d us:  Number of functions available: 555

:3:hip_module.cpp           :88  : 2156327762d us:  hipModuleGetFunction: Returned hipSuccess : 
:3:hip_module.cpp           :476 : 2156327769d us:   hipExtModuleLaunchKernel ( 0x0x11dada10, 262144, 1, 1, 128, 1, 1, 0, stream:<null>, char array:<null>, 0x7ffd866b6310, event:0, event:0, 0 ) 
:4:command.cpp              :352 : 2156327774d us:  Command (KernelExecution) enqueued: 0xac74570
:3:rocvirtual.cpp           :807 : 2156327778d us:  Arg0:  Tensor2dSizeA = val:67108864
:3:rocvirtual.cpp           :807 : 2156327780d us:  Arg1:  Tensor2dSizeB = val:32768
:3:rocvirtual.cpp           :807 : 2156327782d us:  Arg2:  AddressD = val:137826658156544
:3:rocvirtual.cpp           :807 : 2156327785d us:  Arg3:  AddressC = val:137826658156544
:3:rocvirtual.cpp           :807 : 2156327786d us:  Arg4:  AddressA = val:137826794471424
:3:rocvirtual.cpp           :807 : 2156327789d us:  Arg5:  AddressB = val:137828270538752
:3:rocvirtual.cpp           :807 : 2156327790d us:  Arg6:  Alpha = val:1065353216
:3:rocvirtual.cpp           :807 : 2156327793d us:  Arg7:  Beta = val:0
:3:rocvirtual.cpp           :807 : 2156327795d us:  Arg8:  StridesD = val:144115188076118016
:3:rocvirtual.cpp           :807 : 2156327799d us:  Arg9:  StridesC = val:144115188076118016
:3:rocvirtual.cpp           :807 : 2156327801d us:  Arg10:  StridesA = val:288230376151973888
:3:rocvirtual.cpp           :807 : 2156327804d us:  Arg11:  StridesB = val:256
:3:rocvirtual.cpp           :807 : 2156327806d us:  Arg12:  SizesFree = val:
:3:rocvirtual.cpp           :807 : 2156327809d us:  Arg13:  SizesSum = val:256
:3:rocvirtual.cpp           :807 : 2156327812d us:  Arg14:  OrigStaggerUIter = val:7
:3:rocvirtual.cpp           :807 : 2156327815d us:  Arg15:  NumWorkGroups0 = val:2048
:3:rocvirtual.cpp           :807 : 2156327817d us:  Arg16:  NumWorkGroups1 = val:1
:3:rocvirtual.cpp           :3033: 2156327821d us:  ShaderName : Cijk_Ailk_Bljk_SB_MT128x128x8_SN_AMAS3_BL1_BS1_EPS0_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA1010_IU1_K1_KLA_LDL1_LRVW4_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU32_SUM3_SUS128_SVW4_TT8_16_USFGROn1_VAW1_VSn1_VW4_VWB4_WS32_WG16_8_1_WGM1

waheedi commented 1 month ago

So I think the successful lookup is not logged at AMD_LOG_LEVEL=2, which is why I thought the function was never found. It actually was found, and nothing needs to be fixed, if I understand correctly :D Thank you again.
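To illustrate why the success never appears at the lower setting, here is a small Python sketch that filters log lines by their leading level marker. It assumes that a line is printed only when its level (the number between the first two colons) is less than or equal to AMD_LOG_LEVEL, which matches what is seen in this thread. The function `visible_at` is a hypothetical helper, and the kernel name in the sample lines is replaced with a `<kernel>` placeholder.

```python
import re

# Matches the leading ":N:" level marker of a HIP runtime log line.
LEVEL_RE = re.compile(r"^:(\d):")


def visible_at(lines, log_level):
    """Keep only the lines whose level marker is <= log_level.

    Assumption: the runtime prints a message when its level does not
    exceed AMD_LOG_LEVEL; lines without a marker are dropped here.
    """
    kept = []
    for line in lines:
        match = LEVEL_RE.match(line)
        if match and int(match.group(1)) <= log_level:
            kept.append(line)
    return kept


# Sample lines modelled on the trace above, with the kernel name shortened.
lines = [
    ":1:hip_module.cpp           :84  : 2156327738d us:  Cannot find the function: <kernel> for module: 0xabeb4d0",
    ":3:hip_module.cpp           :85  : 2156327745d us:  hipModuleGetFunction: Returned hipErrorNotFound : ",
    ":3:hip_module.cpp           :88  : 2156327762d us:  hipModuleGetFunction: Returned hipSuccess : ",
]

print("AMD_LOG_LEVEL=2 shows", len(visible_at(lines, 2)), "line(s)")
print("AMD_LOG_LEVEL=3 shows", len(visible_at(lines, 3)), "line(s)")
```

With the level set to 2, only the level-1 "Cannot find the function" message survives the filter, so the log reads as a failure even though the level-3 hipSuccess line follows it at the higher setting.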

waheedi commented 1 month ago

@NaveenElumalaiAMD @cgmb thank you for having a look. I'm closing this, as I think what is happening is not a failure due to a missing function but rather ambiguous logging.