ROCm / ROCR-Runtime

ROCm Platform Runtime: ROCr, an HPC-market-enhanced, HSA-based runtime
https://rocm.docs.amd.com/projects/ROCR-Runtime/en/latest/

Feature Request: Implement Indirect Function hsa_executable_symbol_t and Its Info Query #176

Closed matinraayai closed 2 months ago

matinraayai commented 6 months ago

In the ROCr documentation, the features related to creating an indirect function symbol and querying its information are marked as not implemented. Are there any plans to implement them? If not, what issues are blocking this?

t-tye commented 5 months ago

In the HSA HSAIL spec there was the addition of indirect functions. These were never implemented so this support was never used. One of the things likely needed for this is a standard ABI for heterogeneous function pointers. For numerous reasons AMD GPU does not have a standard function ABI yet.

Interested in what you are looking to do with this?

TimourPaltashev commented 5 months ago

Hi Matin, we need a more detailed justification of the potential usage of the requested features.

matinraayai commented 5 months ago

@t-tye @TimourPaltashev Thank you for the response. We want to use this in our binary instrumentation framework (called Luthier) for the following use cases:

  1. Internally, we use indirect functions heavily (e.g., indirect functions are the primary payload for instrumentation, we need to keep track of which indirect functions can possibly be called from inside the kernel, etc.). Since there's no support for indirect functions in HSA, we have a wrapper around HSA primitives that works around it by doing the following:
     a. It iterates over all symbols and locates all the STT_FUNC symbols.
     b. If any of those symbols has a KD (kernel descriptor) symbol associated with it, then it is an indirect function symbol.

To locate one (or all) of the symbols in an executable, we have to pay an additional lookup penalty to ensure we cover the indirect functions. Having HSA implement this feature would remove this penalty.
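The pairing step in that wrapper can be sketched roughly as follows. This is only an illustration, not Luthier's actual code: `sym_t`, `has_kd_symbol`, and `collect_funcs_with_kd` are hypothetical names, the symbol records are assumed to have already been read from the ELF symbol table, and a kernel descriptor is assumed to be named `<function>.kd` as described in the LLVM AMDGPUUsage document. How the matched symbols are then classified is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Values match STT_OBJECT and STT_FUNC from the ELF specification. */
enum { SYM_TYPE_OBJECT = 1, SYM_TYPE_FUNC = 2 };

/* Hypothetical, simplified symbol record (not an HSA or Luthier type). */
typedef struct {
    const char *name;
    unsigned char type;
} sym_t;

/* Returns true if a symbol named "<func_name>.kd" exists in syms.
   The ".kd" suffix is an assumption taken from AMDGPUUsage. */
static bool has_kd_symbol(const sym_t *syms, size_t n, const char *func_name) {
    char kd_name[256];
    int len = snprintf(kd_name, sizeof kd_name, "%s.kd", func_name);
    if (len < 0 || (size_t)len >= sizeof kd_name)
        return false; /* name too long for this sketch */
    for (size_t i = 0; i < n; ++i)
        if (strcmp(syms[i].name, kd_name) == 0)
            return true;
    return false;
}

/* Stores the indices of function symbols that have an associated KD symbol
   in out (up to out_cap entries) and returns how many were found in total. */
static size_t collect_funcs_with_kd(const sym_t *syms, size_t n,
                                    size_t *out, size_t out_cap) {
    assert(syms != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (syms[i].type != SYM_TYPE_FUNC)
            continue;
        if (!has_kd_symbol(syms, n, syms[i].name))
            continue;
        if (count < out_cap)
            out[count] = i;
        ++count;
    }
    return count;
}
```

The nested scan is quadratic in the number of symbols, which is the kind of extra lookup cost mentioned above; a real tool would probably index the names in a hash map first.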

  2. Externally, we want to let the tool users know the list of indirect functions that can potentially be called from the kernel that is about to get launched. Since ROCr doesn't have this implemented, we have to work around the issue by "emulating the hsa_executable_symbol_t ourselves", which is roughly the following:
     a. Keep an internal record of the indirect functions located in each hsa_executable_t, which will be populated as executables get queried during the life of the program.
     b. If we're returning an indirect function to the user, then the address of the symbol (possibly on the host or device, we haven't decided yet) will be its 64-bit handle value.
     c. If the user passes us an hsa_executable_t, we will have to check whether it matches any of our internal records for indirect functions first, before passing it to HSA for usage (to avoid any invalid argument errors).

This workaround is doable on our end, but puts additional burden on Luthier, for something that seems more fit to be implemented in HSA.

I understand that, from the host side, there's little need for the runtime to know about the indirect function. But our use case relies heavily on them being identified as quickly as possible, and exposed to the end user.

t-tye commented 5 months ago

Currently symbols are put into the dynsym table for kernel descriptors and the entry point of the kernel. The latter is not required in the symbol table as the language runtime only requires the kernel descriptors. A future compiler may stop putting the kernel entry point in the symbol table.

The HSA Runtime only returns the kernel descriptor symbols. There are no indirect functions being generated by the compiler.

The ABI for the kernel descriptor is specified at https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor . If you require the entry point of the kernel you can use the signed KERNEL_CODE_ENTRY_BYTE_OFFSET relative to the base address of the kernel descriptor.
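As a rough sketch of that calculation: the code below assumes the code object V3+ kernel descriptor layout from the AMDGPUUsage document, where KERNEL_CODE_ENTRY_BYTE_OFFSET occupies bits 191:128 (bytes 16 to 23) of the 64-byte descriptor and is stored little-endian. The function name `kernel_entry_address` and the constants are made up for illustration, and the caller is assumed to have already copied the descriptor bytes to host-readable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout constants, taken from the AMDGPUUsage kernel descriptor table. */
enum {
    KD_SIZE_BYTES = 64,         /* total size of a kernel descriptor */
    KD_ENTRY_OFFSET_FIELD = 16  /* byte position of KERNEL_CODE_ENTRY_BYTE_OFFSET */
};

/* Computes the kernel entry address from the descriptor's own address and a
   host-side copy of its 64 bytes. The field is a signed 64-bit byte offset
   relative to the descriptor base, so it may point backwards. */
static uint64_t kernel_entry_address(uint64_t kd_address,
                                     const unsigned char kd_bytes[KD_SIZE_BYTES]) {
    assert(kd_bytes != NULL);
    /* Decode the field byte by byte as little-endian, independent of host order. */
    uint64_t raw = 0;
    for (int i = 7; i >= 0; --i)
        raw = (raw << 8) | kd_bytes[KD_ENTRY_OFFSET_FIELD + i];
    /* Unsigned wraparound makes this addition correct for negative offsets too. */
    return kd_address + raw;
}
```

For example, a descriptor located at 0x10000 whose field holds -256 would yield an entry address of 0xFF00.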

Be aware that some targets support preloaded kernel arguments which results in there being 2 entry points to the kernel. See https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments .

matinraayai commented 2 months ago

I'm closing this in favor of #203.