[Feature Request]: Support for Treating Device Functions As hsa_executable_symbol_t's

matinraayai commented 6 months ago

As of right now, the HSA standard only supports identifying the following symbol types:

- Kernels
- Variables

The standard has listed indirect functions as a symbol type, but the AMD ROCr runtime does not implement it.

Device functions, on the other hand, are absent from this list, even though they are emitted by the LLVM AMDGPU compiler and can be identified by inspecting the Loaded Code Object's storage ELF directly.
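As a rough illustration of that ELF inspection, here is a minimal sketch in Python that walks the symbol tables of a little-endian ELF64 image and reports function symbols that have no matching kernel descriptor. It assumes the convention described in AMDGPUUsage where a kernel `foo` is accompanied by a `foo.kd` descriptor symbol, so any defined `STT_FUNC` symbol without one is taken to be a device function. The names `read_symbols` and `device_functions` are made up for this example and are not part of ROCr.

```python
import struct

SHT_SYMTAB, SHT_DYNSYM = 2, 11   # section types that hold symbols
STT_FUNC = 2                     # symbol type: function
SHN_UNDEF = 0                    # section index of an undefined symbol


def read_symbols(elf: bytes):
    """Yield (name, type, shndx, value, size) for every symbol in the image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    sections = [
        struct.unpack_from("<IIQQQQIIQQ", elf, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    for sh in sections:
        sh_type, sh_offset, sh_size, sh_link, sh_entsize = (
            sh[1], sh[4], sh[5], sh[6], sh[9])
        if sh_type not in (SHT_SYMTAB, SHT_DYNSYM):
            continue
        str_off = sections[sh_link][4]  # sh_link names the string table
        for off in range(sh_offset, sh_offset + sh_size, sh_entsize or 24):
            st_name, st_info, _, st_shndx, st_value, st_size = (
                struct.unpack_from("<IBBHQQ", elf, off))
            start = str_off + st_name
            name = elf[start:elf.index(b"\0", start)].decode()
            yield name, st_info & 0xF, st_shndx, st_value, st_size


def device_functions(elf: bytes):
    """Return sorted (name, value, size) of functions lacking a `.kd` symbol.

    Assumption: a function with a `<name>.kd` descriptor is a kernel.
    """
    syms = list(read_symbols(elf))
    names = {s[0] for s in syms}
    return sorted({
        (name, value, size)
        for name, typ, shndx, value, size in syms
        if typ == STT_FUNC and shndx != SHN_UNDEF and name + ".kd" not in names
    })
```

Calling `device_functions(open(path, "rb").read())` on a code object extracted from an application would list candidate device functions along with their ELF values and sizes. These are unrelocated ELF values, not loaded addresses, which is the gap this request is about.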

Supporting device functions as hsa_executable_symbol_t's can have the following benefits:

1. The CUDA runtime treats device functions as symbols. Adding support in ROCr means HIP can behave the same way as CUDA.
2. Supporting device functions as hsa_executable_symbol_t's means the loader can resolve their relocations. A user can keep a library of device functions in one code object and the code that uses the library in another. Instead of having to link both code objects together before loading, the user can simply add both code objects to a single executable before freezing it.
3. In dynamic instrumentation, device functions are treated as symbols:
   a. A tool writer inserts callbacks in the kernel to device functions they have written in the tool. The instrumentation runtime should be able to identify where the device function is loaded, so that it can insert the requested callback into the target application.
   b. When analyzing the target kernel, the list of device functions it may call needs to be identified and returned to the tool writer, in case they want to instrument them as well. Exposing these as hsa_executable_symbol_t's seems like the logical option.
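To make the loader-resolution benefit above concrete, here is a toy model in Python of what resolving device function references at freeze time could look like. It is only a sketch: `CodeObject`, `freeze`, the load bases, and the symbol names are all hypothetical stand-ins and do not reflect how ROCr actually lays out or relocates code objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class CodeObject:
    """Hypothetical stand-in for a code object added to an executable."""
    name: str
    defined: Dict[str, int]                 # symbol -> offset in this object
    undefined: Set[str] = field(default_factory=set)
    load_base: int = 0                      # assigned when frozen


def freeze(code_objects) -> Dict[Tuple[str, str], int]:
    """Assign load bases, then resolve each undefined symbol across objects.

    Returns {(code object name, symbol): resolved address}.
    """
    table: Dict[str, int] = {}
    base = 0x1000                           # arbitrary illustrative base
    for co in code_objects:
        co.load_base = base
        for sym, offset in co.defined.items():
            if sym in table:
                raise ValueError(f"duplicate definition of {sym}")
            table[sym] = base + offset
        base += 0x1000                      # pretend each object fits in 4 KiB
    resolved = {}
    for co in code_objects:
        for sym in co.undefined:
            if sym not in table:
                raise LookupError(f"{co.name}: unresolved symbol {sym}")
            resolved[(co.name, sym)] = table[sym]
    return resolved


# A kernel in one code object calls a device function defined in another.
app = CodeObject("app", {"my_kernel": 0x0}, {"device_add"})
lib = CodeObject("lib", {"device_add": 0x40})
relocations = freeze([app, lib])
```

The point of the model is only that resolution needs a symbol table that includes device functions. Without them in the table, the reference from `app` to `device_add` cannot be resolved at freeze time and the two code objects have to be linked beforehand.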

cc @kzhuravl

t-tye commented 6 months ago

What calling convention do you expect these functions to conform to?

matinraayai commented 6 months ago

@t-tye for now, the default emitted by LLVM, which is the C calling convention. Support for other calling conventions, and for querying them, can be considered in the long term, but for now I don't think that is required.

t-tye commented 6 months ago

I do not think AMD GPU defines a fixed C calling convention (see AMDGPUUsage). One complexity is that register allocation is done dynamically at kernel launch, so when a function is called, its convention depends on the register budget allocated.

matinraayai commented 6 months ago

@t-tye I understand the reg constraint concerns. For now adding support for just recognizing them and locating them should be enough for my use case. Do you think that's feasible?

t-tye commented 6 months ago

There are a lot of issues in a calling convention, and to me the AMD GPU calling convention is likely not yet at a level where you can achieve the full generality that you describe. That does not mean some lesser thing could not be achieved. We are wrestling with a lot of this in our work on the debugger and compiler, but my sense is that that work is not fully settled yet.

To give a more precise answer we probably need to have a meeting to discuss further.

matinraayai commented 6 months ago

@t-tye let's meet to discuss this further.

ROCm / ROCR-Runtime

[Feature Request]: Support for Treating Device Functions As hsa_executable_symbol_t's #203