Closed GZGavinZhao closed 7 months ago
Yes, you are my hope ❤️❤️❤️❤️❤️ @GZGavinZhao
Never mind, I was building the wrong configuration. TensileLibrary_gfx1010.co is not produced while TensileLibrary_gfx1030.co is. Is this expected? I tried running the tests with my gfx1032 emulated as gfx1010 and they passed, so I think this is fine?
Could anyone upload gfx1010.dat before this PR is merged into ROCm, please? I can't wait to run ollama on my 5700 XT. Many thanks. @GZGavinZhao
@wangxing7714436 I can't guarantee it will work, but I can give it a try. What ROCm version do you have?
I installed ROCm 5.7 on Windows, and there is no Tensile file for gfx1010. Many thanks for your reply. @GZGavinZhao
@wangxing7714436 Uh, this is a little tricky. I'm not familiar with how rocBLAS works on Windows. If you can't find files like TensileLibrary_Type_4xi8I_HPA_Contraction_l_Alik_Bjlk_Cijk_Dijk_fallback_gfx1010.hsaco, then I'm almost certain this won't work for you. If you can find these files, extract the attached .dat file, put it in the same directory where you found them, and see if it works. If this doesn't work, then I think you will unfortunately have to wait until the next ROCm Windows SDK release.
Fixes #1757. Reintroducing #1862.
Enables architectures that don't have optimized logic files to also produce libraries when --separate-architectures or --lazy-library-loading is turned on. Previously, both of these flags had to be disabled for rocBLAS to run on architectures like gfx1010.

Previously, there was a bug in Tensile solution indexing that caused #1862 to be reverted. This issue now appears to have been fixed in #1888.
Test plan:
With AMDGPU_TARGETS being one of the following:
AMDGPU_TARGETS=gfx1010
AMDGPU_TARGETS=gfx1030;gfx1010
In all cases, $ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat is produced and all other *.dat files remain unchanged.
In the second case, ./build/clients/staging/rocblas-test --gtest_filter='*gemm_ex_get_solutions*', which previously failed, now passes. I cannot run the full test suite due to limited memory on my GPU (I often get hipOutOfMemory when running stress tests). If this PR doesn't cause extra failures on AMD's CI, or if someone can run the full test suite to ensure no additional failures are introduced, then I believe this PR should be good to go. Hopefully this PR can make it in before ROCm 6.1.
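For anyone who wants to repeat the file check from the test plan, one possible approach is to hash the *.dat files before and after the build and compare the two snapshots. This is only a sketch: the helper names are hypothetical and it is not part of this PR or of the rocBLAS tooling.

```python
import hashlib
from pathlib import Path


def snapshot_dat_hashes(library_dir):
    """Map each *.dat file name in `library_dir` to its SHA-256 hex digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(library_dir).glob("*.dat"))
    }


def compare_snapshots(before, after, new_file="TensileLibrary_lazy_gfx1010.dat"):
    """Return (produced, changed).

    produced: True if `new_file` is present in the `after` snapshot.
    changed:  names of files, other than `new_file`, that existed before the
              change and are now missing or have a different digest.
    """
    produced = new_file in after
    changed = sorted(
        name
        for name, digest in before.items()
        if name != new_file and after.get(name) != digest
    )
    return produced, changed


# Intended use: take one snapshot of $ROCM_PATH/lib/rocblas/library from a build
# without this change and one from a build with it, then compare the two.
```

A result of (True, []) corresponds to the outcome described above: the gfx1010 file is produced and no other .dat file differs.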