Closed littlewu2508 closed 2 months ago
Hi, I encountered this error too and gathered some extra information.
You can launch the test with AMD_COMGR_SAVE_TEMPS=1, which keeps the temporary directory:
HSAKMT_DEBUG_LEVEL=7 AMD_LOG_LEVEL=1 AMD_COMGR_EMIT_VERBOSE_LOGS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_SAVE_TEMPS=1 /var/tmp/portage/dev-libs/rocm-comgr-6.1.0/work/llvm-project-rocm-6.1.0/amd/comgr_build/test/compile_hip_test_in_process
It prints the command, which after removing quotes looks like this:
clang --offload-arch=gfx906 -I /tmp/comgr-c9b0f7/include -x hip -std=c++11 -target x86_64-unknown-linux-gnu --cuda-device-only -isystem /usr/lib/clang/18 -c -emit-llvm --rocm-path=/tmp/comgr-c9b0f7/rocm -save-temps=/tmp/comgr-c9b0f7/output /tmp/comgr-c9b0f7/input/source2.hip -o /tmp/comgr-c9b0f7/output/source2.hip.bc
Which fails with:
/usr/lib/llvm/18/bin/../../../../lib/clang/18/include/__clang_cuda_complex_builtins.h:227:29: error: use of undeclared identifier 'max'; did you mean 'fmax'?
Now a small trick: replace clang with HIPCC_VERBOSE=1 hipcc and it works. Why? By trimming down the hipcc command, you can see that the important flag is --hip-version=6.1.1.
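As a rough sketch of that experiment, the helper below prints the failing clang command from above with --hip-version appended. The function name hip_compile_cmd is made up for illustration, the temp directory is the one from this particular run, and 6.1.1 is simply the value hipcc passed here; the helper only prints the command, it does not run it.

```shell
# Print the failing clang command with --hip-version added.
# $1: temp directory kept by AMD_COMGR_SAVE_TEMPS=1
# $2: HIP version to pass to clang (hipcc used 6.1.1 on this system)
# Note: paths containing spaces are not handled in this sketch.
hip_compile_cmd() {
    tmp=$1
    hip_version=$2
    echo "clang --offload-arch=gfx906 -I $tmp/include -x hip -std=c++11" \
        "-target x86_64-unknown-linux-gnu --cuda-device-only" \
        "-isystem /usr/lib/clang/18 -c -emit-llvm --rocm-path=$tmp/rocm" \
        "-save-temps=$tmp/output" \
        "--hip-version=$hip_version" \
        "$tmp/input/source2.hip -o $tmp/output/source2.hip.bc"
}

# Show the command for the temp directory from the log above.
hip_compile_cmd /tmp/comgr-c9b0f7 6.1.1
```

To actually run it, pipe the output to sh while the saved temp directory still exists.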
Environment:
$ hipcc --version
HIP version: 6.1.40092-
clang version 18.1.5
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/18/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang++.cfg
$ clang --version
clang version 18.1.5
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/18/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
$ strace -e trace=open,openat,access,stat,statx,lstat -f clang -v 2>&1 | grep hipVersion
openat(AT_FDCWD, "/usr/lib/llvm/18/bin/.hipVersion", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/llvm/18/bin/.hipVersion", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/llvm/18/bin/../../../../lib/clang/18/bin/.hipVersion", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
So a possible reason (or part of it) is that clang cannot find .hipVersion. There is also a known issue, https://github.com/llvm/llvm-project/issues/78344, regarding path autodetection:
$ clang -v -print-targets -print-rocm-search-dirs 2>&1 | grep -e "HIP"
Found HIP installation: /usr/local, version 6.1.40092
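To check by hand whether a .hipVersion file exists where clang looked, something like the following could be used. This is only a sketch: find_hip_version is a hypothetical helper, and the directories passed to it are copied from the strace output above rather than derived from clang's real search logic.

```shell
# Report, for each directory given, whether it contains a .hipVersion file.
# Returns 0 if at least one directory has the file, 1 otherwise.
find_hip_version() {
    found=1
    for dir in "$@"; do
        if [ -f "$dir/.hipVersion" ]; then
            echo "found: $dir/.hipVersion"
            found=0
        else
            echo "missing: $dir/.hipVersion"
        fi
    done
    return "$found"
}

# Directories taken from the strace output above.
find_hip_version /usr/lib/llvm/18/bin \
    /usr/lib/llvm/18/bin/../../../../lib/clang/18/bin \
    || echo "no .hipVersion in the probed directories"
```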
Adding --hip-version=6.1.1 is equivalent to adding -include __clang_hip_runtime_wrapper.h, which contains #include <__clang_hip_math.h> (and many other auto-included headers), which in turn contains the max template.
compile_source_to_executable still failed with no clear reason
That's a separate issue. If you add logArgv(LogS, "???", Argv); before https://github.com/ROCm/llvm-project/blob/rocm-6.1.1/amd/comgr/src/comgr-compiler.cpp#L757, you will see that it fails when trying to run a job in executeInProcessDriver with... the arguments of clang-offload-bundler... I don't know why! But it can be solved by disabling in-process compilation, as you can see in https://github.com/littlewu2508/gentoo/pull/3/files#diff-2c3851549d30124c76649584d8dddfa3ef07e522aafaf97ab4c49947509ea134
With in-process compilation disabled, hipcc is called instead, which fixes both issues. The two issues are separate, though.
@littlewu2508 Has your issue been resolved? If so, please close the ticket. Thanks!
Problem Description
When running the test suite for comgr, 4 tests failed:
Take compile_hip_to_relocatable as an example:
A similar issue is found in https://github.com/iree-org/iree/issues/16899
It seems that the missing max function in amd_math_functions.h causes this. https://github.com/ROCm/clr/commit/d7d0f1131882ea1f42b7c42235b66d88cd9305a1 removes many lines of math functions in amd_math_functions.h. According to the commit message those functions are in the hiprtc headers, but I cannot find them.
I added those functions back to amd_math_functions.h and 3 tests passed.
compile_source_to_executable still failed with no clear reason; the compilation is successful:
Operating System
Gentoo Prefix on upstream Linux kernel 6.6.13
CPU
AMD Ryzen 7 7700
GPU
AMD Radeon RX 7900 XT
ROCm Version
ROCm 6.1.0
ROCm Component
clr
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response