ROCm / ROCm-CompilerSupport

The compiler support repository provides various Lightning Compiler related services.

Add AMDGPU Assembler Support Using LLVM MC #62

Closed matinraayai closed 5 months ago

matinraayai commented 7 months ago

Hello, I was wondering whether AMD plans to provide an API in Comgr that assembles a list of assembly strings into machine code using LLVM MC. Right now, it seems the only way to do that is to spawn a compiler job that takes in the assembly string, creates a .s data object from it, and compiles it to a relocatable with clang.

Implementing this should not be hard, given that it shares a lot of its logic with the disassembler file.

kzhuravl commented 7 months ago

Hi @matinraayai, would AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE work for you?

Here is an example of how to use it: https://github.com/RadeonOpenCompute/llvm-project/blob/0acdfc60bbeee6271667f1430c6a6aea054a51fd/amd/comgr/test/assemble_test.c#L87

cc @lamb-j

matinraayai commented 7 months ago

@kzhuravl yes, we're already using AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE. However, my issue is that, from my understanding of the source code, it spawns a compiler job and writes to tmp files, which seems overkill for assembling a handful of instructions. Also, we don't need the relocatable; we just want the machine code, so we have to parse the relocatable and look for the text section to get it. I was wondering if LLVM MC could be a lighter alternative for this use case, with AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE reserved for more complicated assembly files that warrant a compiler job.
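For reference, the "parse the relocatable and look for the text section" step amounts to walking the ELF section headers and matching the name `.text`. Below is a minimal C sketch of that step, assuming a 64-bit little-endian ELF image already in memory and a little-endian host. The struct and function names (`Elf64Header`, `Elf64SectionHeader`, `find_text_section`) are hypothetical and are not part of Comgr or LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF64 layouts declared locally so the sketch does not depend on <elf.h>.
 * Assumption: the host is little-endian, matching the ELF image. */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
} Elf64Header;

typedef struct {
    uint32_t sh_name, sh_type;
    uint64_t sh_flags, sh_addr, sh_offset, sh_size;
    uint32_t sh_link, sh_info;
    uint64_t sh_addralign, sh_entsize;
} Elf64SectionHeader;

/* Hypothetical helper: on success returns 0 and points *text at the raw
 * bytes of the .text section inside buf; returns -1 on any failure. */
int find_text_section(const uint8_t *buf, size_t len,
                      const uint8_t **text, size_t *text_size) {
    assert(buf != NULL && text != NULL && text_size != NULL);

    Elf64Header eh;
    if (len < sizeof eh) return -1;
    memcpy(&eh, buf, sizeof eh);

    /* Magic bytes, ELFCLASS64 (2), ELFDATA2LSB (1). */
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0) return -1;
    if (eh.e_ident[4] != 2 || eh.e_ident[5] != 1) return -1;
    if (eh.e_shentsize != sizeof(Elf64SectionHeader)) return -1;
    if (eh.e_shstrndx >= eh.e_shnum) return -1;

    /* The whole section header table must lie inside the buffer. */
    if (eh.e_shoff > len) return -1;
    if ((uint64_t)eh.e_shnum * sizeof(Elf64SectionHeader) > len - eh.e_shoff)
        return -1;
    const uint8_t *table = buf + (size_t)eh.e_shoff;

    /* Section names live in the section header string table. */
    Elf64SectionHeader strtab;
    memcpy(&strtab, table + (size_t)eh.e_shstrndx * sizeof strtab, sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;
    const char *names = (const char *)buf + (size_t)strtab.sh_offset;

    for (uint16_t i = 0; i < eh.e_shnum; ++i) {
        Elf64SectionHeader sh;
        memcpy(&sh, table + (size_t)i * sizeof sh, sizeof sh);

        /* Need room for ".text" plus its terminating NUL (6 bytes). */
        if (sh.sh_name >= strtab.sh_size || strtab.sh_size - sh.sh_name < 6)
            continue;
        if (memcmp(names + sh.sh_name, ".text", 6) != 0) continue;

        if (sh.sh_offset > len || sh.sh_size > len - sh.sh_offset) return -1;
        *text = buf + (size_t)sh.sh_offset;
        *text_size = (size_t)sh.sh_size;
        return 0;
    }
    return -1; /* no .text section found */
}
```

With Comgr, `buf` would presumably be the bytes copied out of the relocatable data object produced by AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE (for example with `amd_comgr_get_data`). Note that this returns the raw section bytes only; any relocations recorded against `.text` are not applied.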

lamb-j commented 7 months ago

From what I can tell, llvm-mc is intended as more of a developer tool. Per the llvm-mc.cpp comment, "This utility is a simple driver that allows command line hacking on machine code."

In general, the Comgr API aims to provide access to the more stable/"production level" facilities of LLVM, and is less suited for the more developmental/sandbox features.

To summarize some offline comments from @kzhuravl, llvm-mc does a small fraction of what clang does, doesn't handle anything below DWARF 3, and has incomplete support for debug info.

One thing you could try would be to find the parts of llvm-mc that are useful for your project and re-implement (copy/paste) them inside your project. That could give you the functionality you're looking for without having to fork a separate llvm-mc process and write files to/from the filesystem.

matinraayai commented 7 months ago

@lamb-j @kzhuravl sorry, what I meant by LLVM MC was the MC header folder in the LLVM project, which contains all the MC classes registered in the target registry. It is already used to disassemble instructions in Comgr here: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/8276083301409001ec7643e68f5ad58b057c21fd/lib/comgr/src/comgr-disassembly.cpp#L44

As you both mentioned, running the llvm-mc utility as a forked process is not suitable for inclusion in Comgr, and that is not what I'm requesting.

What I'm hoping to be included in the project is an assembly routine, which shares a lot of similarities with the disassembler routine in Comgr, but instead of calling createMCDisassembler and creating other disassembly-related classes, it creates an MCCodeEmitter using this factory function for the AMD GPU target in TargetRegistry.h, plus other required classes (if any):

https://github.com/RadeonOpenCompute/llvm-project/blob/f618760b756d09cdfac7a0e1106f42f04f49f845/llvm/include/llvm/MC/TargetRegistry.h#L550

Then this assembly routine uses the MCCodeEmitter's encodeInstruction method to assemble an MCInst to machine code. Other internal Comgr data structures for disassembly should be adapted accordingly and exposed to the end user. It should not require forking any processes.

This functionality doesn't need to emit any other information like the debug info. It should only be a "reverse disassembler". More advanced functionality should be reserved for AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE, or should be implemented by the users themselves.

Again, the MCCodeEmitter requires the same classes used for disassembly, including the MCContext of an AMD GPU target. The logic for creating said classes (including parsing the target triple) is already used for the disassembler in Comgr. From what I see, it seems trivial to implement this functionality in Comgr.

As for whether this use case is production-level or not, my group and I are working on a tool for AMD that instruments AMD GPU code objects at runtime. This involves multiple calls to assemble trampolines, save/restore register routines, and other utilities at runtime, which are then included in an instrumented code object before it is loaded onto the device. Although it is still a research project, we are hoping it will become a standard production-level tool used by AMD customers. We already use Comgr to disassemble instructions and expose them to a tool writer, and we want to cut down on the assembly cost incurred by using AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE.

Also @lamb-j, you are on point that I can implement this myself on my end (and it is indeed what I'm doing for our tool for now), but there is a chance I cannot depend on the LLVM shipped in ROCm by AMD, since it is built without RTTI and the code might require it, and LLVM-C might not cut it for my use case. Hence I might have to require my users to compile LLVM from scratch, which is a tall order for an average user.

matinraayai commented 7 months ago

@lamb-j @kzhuravl any updates on this?

lamb-j commented 5 months ago

After some offline discussion with @matinraayai, we've decided the Assembler API is no longer a priority.

They have decided to go in a different direction with their tool that doesn't require it, so closing this issue for now.