Closed nishshah0 closed 1 year ago
@shaw586,
To disassemble the code object, you could run the following command:
llvm-objdump -disassemble -arch=amdgcn -mcpu=gfx90a PATH_TO_CODE_OBJECT
rocBLAS gemm-ex kernels are provided by Tensile. You could use the TENSILE_DB environment variable to get additional information about a kernel. For more information, see: Tensile Environment Variables
Thanks. When you say "PATH_TO_CODE_OBJECT", how do I get the object file of the kernel being selected and launched?
All the code object files are located in this directory: ./rocBLAS/build/release/Tensile/library
Setting TENSILE_DB should provide information about the code object loaded, the kernel selected, and other details, depending on the value of the environment variable.
Could you please provide some additional information about what type of discrepancy is observed and what information you are trying to get by inspecting the assembly instructions? Maybe that information could be printed using one of the Tensile environment variables.
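A minimal sketch of scoping TENSILE_DB to a single run, so the extra logging does not leak into other commands in the same shell. The helper name with_tensile_db is hypothetical (it is not part of rocBLAS or Tensile), and the value 0x8000 in the usage comment is only a placeholder; which bits print the selected kernel or the loaded code objects should be taken from the Tensile environment variable documentation.

```shell
# with_tensile_db VALUE CMD [ARGS...]
# Runs CMD with TENSILE_DB set to VALUE for that one process only.
# Hypothetical helper for illustration; not part of rocBLAS or Tensile.
with_tensile_db() {
    value="$1"
    shift
    # env sets the variable only in the environment of the child command.
    env TENSILE_DB="$value" "$@"
}

# Usage (0x8000 is a placeholder value, an assumption to be checked
# against the Tensile documentation):
#   with_tensile_db 0x8000 ./rocBLAS/build/release/clients/staging/rocblas-bench \
#       -m 8 -k 8 -n 8 -f gemm_ex -r bf16_r --compute_type f32_r -i 1 -j 1 --device 0
```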
Let me try that.
This is the issue where disassembly was requested, https://github.com/AMDResearch/omniperf/issues/66
Can you please look at the content of this file and help find which object file corresponds to the kernel that is loaded at the end?
Thanks for providing the additional context. I agree with these comments
roc-obj cannot be used to disassemble in this case, because rocBLAS uses the Tensile library to load code objects at runtime.
Brief background: rocblas-bench loads all the code object files at startup (not included in the timing loop) to avoid a runtime penalty. This is why you are seeing all the code object files that are unrelated to your problem.
For your use case (to get the assembly), I would suggest the following:
cd ./rocBLAS/build/release/clients;make -j32
You should see something similar to this:
loaded code object ../../Tensile/library/Kernels.so-000-gfx908-xnack-.hsaco
loaded code object ../../Tensile/library/TensileLibrary_Type_BB_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx908.co
Hope this helps!
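A small sketch of how the "loaded code object" lines could be turned into a list of candidate files for one GPU architecture. It assumes the log format matches the two sample lines above exactly; the function name loaded_code_objects is hypothetical.

```shell
# loaded_code_objects ARCH
# Reads rocblas-bench output on stdin and prints the path from every
# "loaded code object <path>" line whose path mentions ARCH (e.g. gfx908).
# Assumes the log line format shown in the sample output above.
loaded_code_objects() {
    arch="$1"
    # Strip everything up to and including the "loaded code object" prefix,
    # print only the lines where that prefix was found, then keep the
    # paths that contain the architecture string.
    sed -n 's/^.*loaded code object[[:space:]]*//p' | grep -F -- "$arch"
}

# Usage, with the benchmark output saved to a file first (bench.log is a
# hypothetical file name):
#   loaded_code_objects gfx908 < bench.log
```

Each printed path can then be passed as PATH_TO_CODE_OBJECT to llvm-objdump.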
Your pointer to the code to be modified points to this. Is this correct? This is a function argument.
Sorry, this line: https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/clients/benchmarks/client.cpp#L944
I get the following error with llvm-objdump:
llvm-objdump -disassemble -arch=amdgcn -mcpu=gfx90a PATH_TO_CODE_OBJECT
Add an extra dash (-) to all the parameters:
llvm-objdump --disassemble --arch=amdgcn --mcpu=gfx90a PATH_TO_CODE_OBJECT
Still same error. Would you mind trying on your end?
llvm-objdump --disassemble --arch=amdgcn --mcpu=gfx90a
Command works on my end.
Does just: llvm-objdump -D TensileLibrary_Type_BB_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx908.co not produce anything? Otherwise, which llvm-objdump? Is it the one in /opt/rocm/llvm/bin, from the toolchain that built the Tensile library you are trying to disassemble?
@TorreZuk thanks. I was using /usr/local/bin/llvm-objdump. Using the one from /opt/rocm/llvm/bin worked with the command provided by @rkamd.
It may be worthwhile to document how to dump the kernel assembly in either the rocBLAS or roc-obj documentation.
You need to watch out whenever you are potentially mixing toolchains. Remove any external LLVM or clang from your path when building rocBLAS or ROCm, as the build expects the ones from /opt/rocm to be used, or put the /opt/rocm ones first in your path variables. We can add a general warning somewhere early in our docs.
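As a sketch of the "put them first in your path" option, the helper below builds a PATH value with a given directory in front and drops any later duplicate of it. The function name prepend_to_path is hypothetical, and /opt/rocm/llvm/bin is the location mentioned in this thread; it may differ on other installs.

```shell
# prepend_to_path DIR [PATH_VALUE]
# Prints PATH_VALUE (default: the current $PATH) with DIR moved to the front
# and any other occurrence of DIR removed. Hypothetical helper for illustration.
prepend_to_path() {
    dir="$1"
    current="${2-$PATH}"
    # Split on ':', drop entries equal to DIR, and join the rest back with ':'.
    rest=$(printf '%s' "$current" | tr ':' '\n' | grep -Fxv -- "$dir" | paste -s -d ':' -)
    if [ -n "$rest" ]; then
        printf '%s:%s\n' "$dir" "$rest"
    else
        printf '%s\n' "$dir"
    fi
}

# Usage:
#   PATH=$(prepend_to_path /opt/rocm/llvm/bin); export PATH
#   command -v llvm-objdump   # shows which llvm-objdump the shell now resolves
```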
@shaw586, if the issue is resolved, please close the ticket.
I am trying to benchmark MI250X with rocblas-bench and I see some discrepancy. I want to look at the assembly using roc-obj. How do I enable dumping the assembly for the kernel being launched by rocblas-bench? Here is my command:
./rocBLAS/build/release/clients/staging/rocblas-bench -m 8 -k 8 -n 8 -f gemm_ex -r bf16_r --compute_type f32_r -i 1 -j 1 --device 0
I tried setting the HIP environment variable with export GPU_DUMP_CODE_OBJECT=1 before launching rocblas-bench, and I see a bunch of object files. When I open them with the command
roc-obj -d _code_object0000.o
I get the following error: