Open matinraayai opened 6 months ago
What calling convention do you expect these functions to conform to?
@t-tye for now the default emitted by LLVM, which is the C calling convention. Support for other calling conventions, and for querying them, can be considered in the long term, but for now I don't think it is required.
I do not think AMD GPU defines a fixed C calling convention (see AMDGPUUsage). One complexity is that register allocation is done dynamically at kernel launch, so when a function is called its convention depends on the register budget allocated.
@t-tye I understand the reg constraint concerns. For now adding support for just recognizing them and locating them should be enough for my use case. Do you think that's feasible?
There are a lot of issues in a calling convention, and to me the AMD GPU calling convention is likely not yet at a level where you can achieve the full generality that you describe. That does not mean some lesser thing could not be achieved. We are wrestling with a lot of this in our work on the debugger and compiler, but my sense is that that work is not fully settled yet.
To give a more precise answer we probably need to have a meeting to discuss further.
@t-tye let's meet to discuss this further.
As of right now, the HSA standard only supports identifying the following symbol types:
The standard has listed indirect functions as a symbol type, but the AMD ROCr runtime does not implement it.
Device functions, on the other hand, are absent from this list, even though they are emitted by the LLVM AMDGPU compiler and can be identified by inspecting the Loaded Code Object's storage ELF directly.
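To illustrate what "inspecting the storage ELF directly" can look like, here is a minimal sketch that walks the `.symtab` of a little-endian ELF64 image and lists function symbols that do not look like kernels. The names `read_symbols` and `find_device_functions` are hypothetical, and the "no matching `<name>.kd` kernel descriptor symbol" test is an assumption based on how recent code object versions name kernel descriptors; it is a heuristic, not something the HSA standard defines.

```python
import struct

SHT_SYMTAB = 2  # section type of the static symbol table
STT_FUNC = 2    # symbol type for functions


def read_symbols(elf: bytes):
    """Yield (name, type, value, size) for every .symtab entry of an ELF64 LE image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
    sections = [
        struct.unpack_from("<IIQQQQIIQQ", elf, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    for sh in sections:
        sh_type, sh_offset, sh_size, sh_link, sh_entsize = sh[1], sh[4], sh[5], sh[6], sh[9]
        if sh_type != SHT_SYMTAB:
            continue
        strtab_off = sections[sh_link][4]  # sh_link points at the string table
        for off in range(sh_offset, sh_offset + sh_size, sh_entsize or 24):
            # Elf64_Sym: name, info, other, shndx, value, size
            st_name, st_info, _, _, st_value, st_size = struct.unpack_from("<IBBHQQ", elf, off)
            start = strtab_off + st_name
            name = elf[start:elf.index(b"\0", start)].decode()
            yield name, st_info & 0xF, st_value, st_size


def find_device_functions(elf: bytes):
    """Return names of STT_FUNC symbols with no '<name>.kd' companion (assumed heuristic)."""
    symbols = list(read_symbols(elf))
    names = {name for name, *_ in symbols}
    return [
        name
        for name, sym_type, _, _ in symbols
        if sym_type == STT_FUNC and name + ".kd" not in names
    ]
```

Given the bytes of a code object, `find_device_functions(elf_bytes)` returns candidate device function names; a real tool would also need to handle older code object versions and stripped symbol tables.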
Supporting device functions as `hsa_executable_symbol_t`s can have the following benefits:
- Supporting them as `hsa_executable_symbol_t`s means the loader can resolve their relocations. A user can have a library of device functions in one code object and a second code object that uses that library. Instead of having to link both code objects together before loading, the user can simply add both code objects to a single executable before freezing it.
- [...] `hsa_executable_symbol_t` seems like the logical option.

cc @kzhuravl