ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/

How to use roc-obj with rocblas-bench #1295

Closed nishshah0 closed 1 year ago

nishshah0 commented 1 year ago

I am trying to benchmark an MI250X with rocblas-bench and I see some discrepancies. I want to look at the assembly using roc-obj. How do I enable dumping the assembly for the kernel launched by rocblas-bench? Here is my command: ./rocBLAS/build/release/clients/staging/rocblas-bench -m 8 -k 8 -n 8 -f gemm_ex -r bf16_r --compute_type f32_r -i 1 -j 1 --device 0

I tried setting the HIP env variable with export GPU_DUMP_CODE_OBJECT=1 before launching rocblas-bench, and I see a bunch of object files. When I open them with the command roc-obj -d _code_object0000.o

I get the following error:

Error: No kernel section found
error: no executables specified

rkamd commented 1 year ago

@shaw586, to disassemble the code object, you could run the following command: llvm-objdump -disassemble -arch=amdgcn -mcpu=gfx90a PATH_TO_CODE_OBJECT

rocBLAS gemm_ex kernels are provided by Tensile. You could use the TENSILE_DB env variable to get additional information about a kernel. For more information, see: Tensile Environment Variables

nishshah0 commented 1 year ago

Thanks. When you say "PATH_TO_CODE_OBJECT", how do I get the object file of the kernel that is selected and launched?

rkamd commented 1 year ago

All the code object files are located in this directory: ./rocBLAS/build/release/Tensile/library. Setting TENSILE_DB should provide information about the code objects loaded, the kernel selected, and other details, depending on the value of the env variable.

Could you please provide some additional information about what type of discrepancy is observed and what you are trying to learn by inspecting the assembly instructions? Maybe that information could be printed using one of the Tensile env variables.
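A minimal sketch of how the candidate code objects in that directory could be listed. It assumes the build tree layout quoted above and that code objects carry the .co and .hsaco extensions; `list_code_objects` and `LIB_DIR` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list Tensile code objects (.co and .hsaco files)
# found under the directory given as the first argument.
list_code_objects() {
    find "$1" -type f \( -name '*.co' -o -name '*.hsaco' \) | sort
}

# Assumed default location, taken from the reply above; adjust to your build tree.
LIB_DIR="${LIB_DIR:-./rocBLAS/build/release/Tensile/library}"

# Only list when the directory exists, so the sketch is harmless elsewhere.
if [ -d "$LIB_DIR" ]; then
    list_code_objects "$LIB_DIR"
fi
```

The listing usually contains many files, because the library holds kernels for every supported problem type and GPU target; it does not by itself tell you which one was selected for a given problem.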

nishshah0 commented 1 year ago

Let me try that.

This is the issue where the disassembly was requested: https://github.com/AMDResearch/omniperf/issues/66

nishshah0 commented 1 year ago

Can you please look at the content of this file and help find which object file corresponds to the kernel that is loaded at the end?

tensile_db_output.txt

rkamd commented 1 year ago

Thanks for providing the additional context. I agree with these comments

roc-obj cannot be used to disassemble in this case, because rocBLAS uses the Tensile library to load code objects at runtime.

Brief background: rocblas-bench loads all the code object files at startup (not included in the timing loop) to avoid a runtime penalty. For this reason, you are seeing all the code object files, including those unrelated to your problem.

For your use case (to get the assembly), I would suggest the following:

  1. Avoid loading all code objects at startup by commenting out this line: https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/clients/benchmarks/client.cpp#L939
  2. Re-compile the code by running: cd ./rocBLAS/build/release/clients; make -j32
  3. Re-run the rocblas-bench command with the TENSILE_DB env variable set. Now you should only see code objects relevant to the input problem. Use the llvm-objdump command above to disassemble the code object file.

You should see output similar to the following:

loaded code object ../../Tensile/library/Kernels.so-000-gfx908-xnack-.hsaco
loaded code object ../../Tensile/library/TensileLibrary_Type_BB_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx908.co

Hope this helps!
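The last step can be sketched as a small script that pulls the code object paths out of a captured log and disassembles each one. This is a sketch under assumptions, not a documented workflow: it assumes the log contains lines of the form "loaded code object <path>" as in the sample output, that the log was saved to a file named tensile_db.log, and that the ROCm llvm-objdump lives at /opt/rocm/llvm/bin/llvm-objdump. The function names and the `LOG` and `ROCM_OBJDUMP` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the path from every
# "loaded code object <path>" line, one per line, with duplicates removed.
loaded_code_objects() {
    sed -n 's/^.*loaded code object[[:space:]]*//p' | sort -u
}

# Hypothetical helper: disassemble one code object ($1) for a GPU target ($2).
# The target must match the GPU the object was built for, e.g. gfx90a or gfx908.
disassemble_code_object() {
    objdump_bin="${ROCM_OBJDUMP:-/opt/rocm/llvm/bin/llvm-objdump}"
    "$objdump_bin" --disassemble --arch=amdgcn --mcpu="$2" "$1"
}

# Assumed log file name holding the captured rocblas-bench output.
LOG="${LOG:-tensile_db.log}"

# Guarded so that nothing runs when no log has been captured yet.
if [ -f "$LOG" ]; then
    loaded_code_objects < "$LOG" | while IFS= read -r co; do
        # Write one assembly listing per code object into the current directory.
        disassemble_code_object "$co" gfx90a > "$(basename "$co").s"
    done
fi
```

The paths in the sample output are relative (../../Tensile/library/...), so a script like this would need to run from the same working directory as rocblas-bench, or the paths would need to be rewritten first.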

nishshah0 commented 1 year ago

Your pointer to the code to be modified points to a line that is a function argument. Is this correct?

rkamd commented 1 year ago

Sorry, this line: https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/clients/benchmarks/client.cpp#L944

nishshah0 commented 1 year ago

I get the following error with llvm-objdump: [screenshot: objdump]

rkamd commented 1 year ago

Add an extra dash (-) to each parameter of llvm-objdump -disassemble -arch=amdgcn -mcpu=gfx90a PATH_TO_CODE_OBJECT, so that it becomes: llvm-objdump --disassemble --arch=amdgcn --mcpu=gfx90a PATH_TO_CODE_OBJECT

nishshah0 commented 1 year ago

Still the same error. Would you mind trying on your end?

rkamd commented 1 year ago

llvm-objdump --disassemble --arch=amdgcn --mcpu=gfx90a

The command works on my end.

TorreZuk commented 1 year ago

Does just running llvm-objdump -D TensileLibrary_Type_BB_HPA_Contraction_l_Ailk_Bljk_Cijk_Dijk_gfx908.co not produce anything? Otherwise, which llvm-objdump are you using? Is it the one in /opt/rocm/llvm/bin, from the toolchain that built the Tensile library you are trying to disassemble?

nishshah0 commented 1 year ago

@TorreZuk thanks. I was using /usr/local/bin/llvm-objdump. Using the one from /opt/rocm/llvm/bin worked with the command provided by @rkamd.

It may be worthwhile to document, in either the rocBLAS or the roc-obj documentation, how to dump the kernel assembly.

TorreZuk commented 1 year ago

You need to watch out whenever you might be mixing toolchains. You should remove any external LLVM or clang from your path when building rocBLAS or ROCm, as the build expects the ones from /opt/rocm to be used. Alternatively, put the /opt/rocm ones first in your path variables. We can add a general warning somewhere early in our docs.
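As a rough illustration of putting the ROCm tools first, the sketch below moves a directory to the front of PATH and then reports which llvm-objdump resolves. The directory /opt/rocm/llvm/bin is the one named in this thread; `path_with_first` and `ROCM_LLVM_BIN` are hypothetical names, and your install prefix may differ.

```shell
#!/bin/sh
# Hypothetical helper: print the PATH-style list $2 with directory $1 moved
# to the front (any existing occurrence of $1 is dropped first).
path_with_first() {
    first="$1"
    rest=$(printf '%s' "$2" | tr ':' '\n' | grep -v -x -F -- "$first" | paste -s -d ':' -)
    if [ -n "$rest" ]; then
        printf '%s:%s\n' "$first" "$rest"
    else
        printf '%s\n' "$first"
    fi
}

# Directory named in the thread; ROCM_LLVM_BIN is an assumed override.
ROCM_LLVM_BIN="${ROCM_LLVM_BIN:-/opt/rocm/llvm/bin}"
PATH=$(path_with_first "$ROCM_LLVM_BIN" "$PATH")
export PATH

# Show which llvm-objdump now resolves first, if any.
command -v llvm-objdump || echo "llvm-objdump not found on PATH"
```

Running `command -v llvm-objdump` before and after is a quick way to confirm whether a copy in /usr/local/bin was shadowing the ROCm one.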

rkamd commented 1 year ago

@shaw586, if the issue is resolved, please close the ticket.