Open · mmyxym opened 2 weeks ago
Thanks for your question. Almost all GEMM kernels, including FP16, are assembly kernels, and the library logic for each architecture is stored in the repo here: library/src/blas3/Tensile/Logic/asm_full. If you want to see which solution/kernel the library logic picks for a particular GEMM size, find that size in the logic file and look up the corresponding solutionIndex; each exact-size entry pairs a size with the winning solution as [m, n, batch, k] [solutionIndex, efficiency]. Alternatively (and if the size does not exist in the library), you can run your GEMM with the flag TENSILE_DB=0x20000 to print the solutionIndex. The actual assembly kernels are under build_tmp in the build directory.
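If it helps, here is a minimal sketch for listing those exact-size entries from one of the logic .yaml files. It is not from the repo: PyYAML and the helper name are my own, and since the top-level layout of the logic files varies between Tensile versions, it only pattern-matches entries of the [m, n, batch, k] / [solutionIndex, efficiency] shape described above.

```python
import sys
import yaml  # PyYAML

def exact_entries(path):
    """Collect entries shaped like [[m, n, batch, k], [solutionIndex, efficiency]]
    from a Tensile logic .yaml file. The top-level layout of these files differs
    between Tensile versions, so this only pattern-matches and assumes nothing
    about where the exact-size table sits in the file."""
    def is_entry(x):
        return (isinstance(x, list) and len(x) == 2
                and isinstance(x[0], list) and len(x[0]) == 4
                and isinstance(x[1], list) and len(x[1]) == 2)

    with open(path) as f:
        doc = yaml.safe_load(f)

    found = []
    for section in doc if isinstance(doc, list) else []:
        if isinstance(section, list) and section and all(is_entry(x) for x in section):
            found.extend(section)
    return found

if __name__ == "__main__":
    # e.g. python exact_entries.py library/src/blas3/Tensile/Logic/asm_full/<arch>_<problem>.yaml
    for (m, n, b, k), (idx, eff) in exact_entries(sys.argv[1]):
        print(f"[{m}, {n}, {b}, {k}] -> solutionIndex {idx}, efficiency {eff}")
```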
@babakpst, thanks very much for your reply! I ran torch.matmul with TENSILE_DB=0x20000 and it printed "Library logic solution index of winning solution: 39". I guess the docker image ships only binaries, with no assembly kernels. So I built rocBLAS from source and can see the assembly kernels under build/release/library/src/build_tmp/TENSILE/assembly/. Could you please help me associate solution index 39 with the exact assembly kernel? Thanks!
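For reference, roughly what I ran (a minimal sketch, not my exact script; the 1024x1024 fp16 shape is just the case from my original question, and I'm assuming that setting TENSILE_DB from inside Python before the first matmul is enough for the logging to take effect; exporting it in the shell works too):

```python
import os

# Assumption: setting TENSILE_DB before rocBLAS/Tensile is loaded is enough;
# if not, export TENSILE_DB=0x20000 in the shell before launching Python.
os.environ["TENSILE_DB"] = "0x20000"

import torch

a = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")  # "cuda" maps to ROCm/HIP here
b = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")
c = torch.matmul(a, b)
torch.cuda.synchronize()
# rocBLAS/Tensile then prints the winning library logic solution index (39 in my case)
```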
By the way, I'm also looking for the permute policy that rocBLAS applies to avoid bank conflicts; does anyone have experience with that? :)
With FP16 MFMA on MI250, I'm looking for highly optimized GEMM kernel source code, such as a typical FP16 GEMM implementation with M, N, K = 1024, but I didn't find it in the rocBLAS/hipBLASLt/rocWMMA/Tensile repos. Any information is appreciated, thanks!