ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

fix(tensile_host): fix solutions for gfx103x not able to load #1455

Open wfjsw opened 1 month ago

wfjsw commented 1 month ago

This patch should fix the problem where the added map broke gfx1031-gfx1035: Tensile solutions for these archs cannot be loaded, so they drop to the fallback.

Related log:

ProblemMap Searching for Contraction_l_Alik_Bljk_Cijk_Dijk found Problem library (1 rows)
Object key: 768, 77, 768
Key: 768, 77, 768
Starting point: 17179869184, 1, 2937652110784
Rightward search...
Leftward search...

129, 129, 65: 905234 < 1.79769e+308 <-- Best distance, but no matching solution
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 65: 905234 == 905234
129, 129, 64: 906641 > 905234
129, 129, 64: 906641 > 905234

......

Considered 100% of entries.
Solution index selected: 69
Running kernel: Cijk_Alik_Bljk_HHS_BH_MT64x32x8_SN_AF0EM1_AMAS2_ASEM1_BL1_BS1_EPS0_FL0_GLVWA2_GLVWB1_GRVW2_GSU1_GSUASB_ISA000_IU1_K1_KLS_LPB0_LDL1_LRVW2_MMFSC_NLCA1_NLCB1_PGR0_PLR1_RK0_SIA1_SU32_SUM0_SUS256_SVW4_TT4_2_USFGROn1_VAW1_VSn1_VW2_VWB2_WS64_WG16_16_1_WGM8
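The distances printed in this log are consistent with a squared Euclidean distance between the problem key (768, 77, 768) and each table entry. That reading is inferred from the numbers alone, not taken from the Tensile source, and the function name below is hypothetical. A minimal sketch that reproduces the two values shown:

```python
# Sketch only: assumes the log's distance is the squared Euclidean distance
# between the problem key and a table entry (inferred from the logged numbers).
def squared_distance(key, entry):
    """Sum of squared per-dimension differences."""
    return sum((k - e) ** 2 for k, e in zip(key, entry))


key = (768, 77, 768)  # "Key: 768, 77, 768" from the log
for entry in [(129, 129, 65), (129, 129, 64)]:
    print(entry, squared_distance(key, entry))
# (129, 129, 65) 905234
# (129, 129, 64) 906641
```

Under this reading, the (129, 129, 65) entry is the closest one in the table, yet the log reports no matching solution for it, which is why the search keeps going and ends on a different kernel.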

Could you please backport this to HIP SDK 6.1.2 for Windows if possible?

amcamd commented 1 month ago

@wfjsw My understanding is that your change will cause lazy loading for gfx1031, gfx1032, gfx1034, gfx1035 assembly kernels listed in .yaml files in the directory https://github.com/ROCm/rocBLAS/tree/develop/library/src/blas3/Tensile/Logic/asm_full . If you search for the strings gfx1031, gfx1032, gfx1034, gfx1035 in this directory you will not find any matches, so these strings are not in getLazyLoadingArch. When assembly kernels are added for an architecture, the architecture is added to getLazyLoadingArch.

Can you let us know the intention of your PR:

  1. Are you trying to lazy load assembly kernels for gfx1031, gfx1032, gfx1034, gfx1035?
  2. Are you trying to build rocBLAS for gfx1031, gfx1032, gfx1034, gfx1035?

wfjsw commented 1 month ago

I currently have assembly kernels for these cards, but the stock rocblas.dll refuses to load them when they are placed in the search path, which worked in 5.7.1. The cause is this list, which was added in 6.0.

This also seems to affect non-lazy loading. In my testing, the non-lazy libraries are not applied either.
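The behavior reported in this thread can be pictured as an allow-list check that runs before the search path is consulted. The sketch below is illustrative only: `pick_library`, the allow-list contents, and the file-name pattern are assumptions made for the example, not the actual rocBLAS code behind getLazyLoadingArch.

```python
# Hypothetical model of the reported behavior; none of these names come from
# the rocBLAS sources.
def pick_library(arch, files_on_disk, allow_list):
    """Return the library file chosen for `arch`, or None for the fallback path."""
    if arch not in allow_list:
        # The architecture is rejected before the search path is examined,
        # so a matching file on disk is never considered.
        return None
    candidate = f"TensileLibrary_lazy_{arch}.dat"  # assumed naming pattern
    return candidate if candidate in files_on_disk else None


files = {"TensileLibrary_lazy_gfx1031.dat"}  # user-supplied kernels
print(pick_library("gfx1031", files, {"gfx1030"}))             # None
print(pick_library("gfx1031", files, {"gfx1030", "gfx1031"}))  # file is found
```

In this model, adding the architecture to the allow-list is enough for a user-supplied library to be picked up again, which matches the intent described for the patch.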