Closed aaronmondal closed 1 year ago
At the very least, the current approach could be made more efficient by using the C23 #embed feature: https://thephd.dev/finally-embed-in-c23
Very interesting read 😊 #embed indeed looks like it could reduce complexity here. Unfortunately, clang doesn't support it yet: https://clang.llvm.org/c_status.html#c2x. I'd imagine we'd have to wait 6-12 months until it becomes usable with a regular clang installation.
If it helps, I've rewritten the CMake build for the HIP portion of the ROCm stack in Bazel in rules_ll. I find it a bit easier to see what the build actually does in those files.
@aaronmondal Good suggestion! We're currently in the process of updating device-library linking to use -mlink-builtin-bitcode. (See AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC).
Once Comgr and all the software layers calling Comgr switch to the updated Action, we may be able to remove bc2h and simplify the build as you mention here.
I'll try to keep this issue updated with any progress!
I looked into this some more. One of the long-term goals of Comgr is to execute completely in-memory. By embedding the device-library bitcodes in header files, we are able to include them in the Comgr library. As a result, we are able to link against these device libraries during compilations without reading from the file system.
However, in the future we may be able to clean up the build using C23's #embed feature as suggested by @keryell, though it will be a while before the compilers we support can handle that. I'll make a note of that!
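As a rough illustration of the #embed idea, here is a minimal sketch, not anything taken from the Comgr build. The names `embedded_blob`, `embedded_blob_size` and `HAVE_EMBED` are made up for this example. In a real build the operand would presumably be the path of a device-library .bc file; the sketch embeds its own source file via `__FILE__` only so that it compiles without any extra input, and it falls back to a small placeholder array on compilers that do not provide #embed.

```c
#include <stddef.h>

/* Assumption: a real build would embed a device-library .bc file here.
   __FILE__ is used only so this sketch builds without extra files. */
#if defined(__has_embed)
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define HAVE_EMBED 1
#  endif
#endif

/* The raw bytes end up in the object file without a generated header. */
static const unsigned char embedded_blob[] = {
#ifdef HAVE_EMBED
#  embed __FILE__
#else
    /* Fallback for compilers without #embed: the 4-byte LLVM bitcode
       magic number, used purely as a placeholder. */
    0x42, 0x43, 0xC0, 0xDE
#endif
};

static const size_t embedded_blob_size = sizeof embedded_blob;
```

Either way the data is available in memory at runtime, which is the property the header-based approach is meant to provide; the difference is only in how the bytes get into the binary.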
> One of the long-term goals of Comgr is to execute completely in-memory.
Makes sense. I can see that there is no other way to handle this, then, until #embed is available. Thanks a lot for looking into this!
The current build uses the bc2h tool to write the contents of the device library bitcode files to headers and then includes those. Is linking the bitcode files with flags along the lines of -mlink-bitcode-file or similar not possible for some reason? Maybe because some files do the "same" thing (<feature>_on, <feature>_off)? If there was a way to not generate like 100MB of "textualized" device libs and instead use the raw .bc files, it might be possible to simplify and potentially speed up the build.
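For readers unfamiliar with this kind of step, the following is a minimal sketch of what a bc2h-style conversion does in general. It is not Comgr's actual bc2h implementation; the function name `bitcode_to_header` and the exact output layout are assumptions made for illustration. It reads a binary stream and writes a C array definition plus a size constant.

```c
#include <stdio.h>

/* Hypothetical helper (not Comgr's bc2h): copies every byte of `in` into a
   C array definition named `symbol`, written to `out`.
   Returns 0 on success and -1 on an I/O error.
   Note: an empty input yields an empty initializer, which is only valid C
   from C23 onwards. */
static int bitcode_to_header(FILE *in, FILE *out, const char *symbol) {
    unsigned long count = 0;
    int c;

    if (fprintf(out, "static const unsigned char %s[] = {", symbol) < 0)
        return -1;

    while ((c = fgetc(in)) != EOF) {
        /* Start a new row every 12 bytes to keep the lines readable. */
        const char *sep = (count % 12 == 0) ? "\n    " : " ";
        if (fprintf(out, "%s0x%02x,", sep, (unsigned)c) < 0)
            return -1;
        ++count;
    }
    if (ferror(in))
        return -1;

    if (fprintf(out, "\n};\nstatic const unsigned long %s_size = %luUL;\n",
                symbol, count) < 0)
        return -1;
    return 0;
}
```

In this sketch every input byte becomes about six characters of source text, which is where the size blow-up of the "textualized" form comes from. A caller would open the .bc file with `fopen(path, "rb")` and the header with `fopen(path, "w")` before invoking the function.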