Open wfjsw opened 1 month ago
@wfjsw My understanding is that your change will cause lazy loading for gfx1031, gfx1032, gfx1034, gfx1035 assembly kernels listed in .yaml files in the directory https://github.com/ROCm/rocBLAS/tree/develop/library/src/blas3/Tensile/Logic/asm_full . If you search for the strings gfx1031, gfx1032, gfx1034, gfx1035 in this directory you will not find any matches, so these strings are not in getLazyLoadingArch. When assembly kernels are added for an architecture, the architecture is added to getLazyLoadingArch.
Can you let us know the intention of your PR:
I currently have assembly kernels for these cards, but the stock rocblas.dll refuses to load them when they are placed in the search path, which worked in 5.7.1. The cause is this list, which was added in 6.0.
This also seems to affect non-lazy loading: in my testing, the non-lazy libraries are not applied either.
This patch may fix the problem where the added map broke gfx1031-gfx1035: Tensile solutions for these archs cannot be loaded, so they are forced to drop to the fallback.
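To illustrate the failure mode described above, here is a minimal sketch in Python. It is not the rocBLAS code: `ARCH_LIBRARY_MAP`, `select_library`, and the file names are all hypothetical, and the real lookup in `getLazyLoadingArch` may differ. It only shows how an allow-list that lacks an entry for an architecture can make a loader ignore a library that is present in the search path.

```python
from typing import Dict, Optional, Set

# Hypothetical allow-list: architecture -> library file name.
# These names are illustrative, not the files rocBLAS actually ships.
ARCH_LIBRARY_MAP: Dict[str, str] = {
    "gfx1030": "kernels_gfx1030.dat",
    "gfx1100": "kernels_gfx1100.dat",
}

FALLBACK = "kernels_fallback.dat"


def select_library(
    arch: str,
    files_in_search_path: Set[str],
    arch_map: Optional[Dict[str, str]] = None,
) -> str:
    """Return the library to load for `arch`, or the fallback."""
    mapping = ARCH_LIBRARY_MAP if arch_map is None else arch_map
    name = mapping.get(arch)
    # An architecture missing from the map never reaches the file check,
    # so a library sitting in the search path is silently ignored.
    if name is None or name not in files_in_search_path:
        return FALLBACK
    return name


# The user has dropped a gfx1031 library into the search path.
search_path = {"kernels_gfx1030.dat", "kernels_gfx1031.dat", FALLBACK}

# Without a map entry, the gfx1031 file is never considered.
before = select_library("gfx1031", search_path)

# With the entry added (the kind of change the patch makes), it is picked up.
patched_map = dict(ARCH_LIBRARY_MAP, gfx1031="kernels_gfx1031.dat")
after = select_library("gfx1031", search_path, patched_map)

print(before)
print(after)
```

In this sketch the file on disk is identical in both calls; only the presence of the map entry changes which library is selected.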
Related log:
Could you please backport this to HIP SDK 6.1.2 for Windows if possible?