ROCm / llvm-project

This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific topics (amd/*). For all other issues/PRs, please submit upstream at https://github.com/llvm/llvm-project.

Binary hipcc doesn't detect clang properly #78

Closed tpkessler closed 2 months ago

tpkessler commented 1 year ago

Hi! After building the 5.6.0 branch for Arch Linux I noticed that hipcc.bin doesn't work properly because the compiler path isn't picked up correctly. The output of

env LANG=C.UTF-8 ./build/hipcc.bin --version

is

sh: line 1: /tmp/canRunqEY2uN: Is a directory
sh: line 1: /tmp/canRunGoAqAz: Is a directory
sh: line 1: /tmp/canRunJ6tdsf: Is a directory
sh: line 1: /tmp/canRun6EPxCE: Is a directory
Device not supported - Defaulting to AMD
sh: line 1: /bin/rocm_agent_enumerator: No such file or directory
sh: line 1: /tmp/canRunxo0z3d: Is a directory
sh: line 1: /tmp/canRunG1KbTL: Is a directory
Hip Clang Compiler not found
HIP version: 4.4.0-0
sh: line 1: llvm/bin/clang++: No such file or directory

failed to execute:llvm/bin/clang++ --driver-mode=g++ -L"/home/torsten/Dokumente/HIPCC/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt  --version -Wl,-rpath=/home/torsten/Dokumente/HIPCC/lib:/lib -lamdhip64  -Lllvm/bin/../lib/clang//lib/linux -lclang_rt.builtins-x86_64

The issue is that complierPath [sic!], as constructed in src/hipBin_amd.h::HipBinAmd::constructCompilerPath, relies on getRoccmPath() in src/hipBin_base.h, which only returns the content of the environment variable ROCM_PATH. If this is not set, hipClangPath is an empty path. The perl script instead defaults to /opt/rocm/ as ROCM_PATH when no env var is set.
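To illustrate the difference described above, here is a minimal sketch of the fallback behaviour that the perl script is reported to have. It is not the actual hipcc code: resolveRocmPath and resolveClangPath are hypothetical helper names, and the llvm/bin/clang++ suffix is assumed from the paths shown in the output in this thread.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of hipcc): return ROCM_PATH if it is set
// and non-empty, otherwise fall back to a default install prefix.
std::string resolveRocmPath(const std::string& fallback = "/opt/rocm") {
    const char* env = std::getenv("ROCM_PATH");
    if (env != nullptr && *env != '\0') {
        return std::string(env);
    }
    return fallback;
}

// Hypothetical helper: derive the clang++ location from the resolved prefix.
// The "/llvm/bin/clang++" layout is an assumption based on the logs above.
std::string resolveClangPath() {
    return resolveRocmPath() + "/llvm/bin/clang++";
}
```

With a fallback like this, an unset ROCM_PATH would yield /opt/rocm/llvm/bin/clang++ rather than the relative llvm/bin/clang++ seen in the failing command.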

But setting ROCM_PATH doesn't really fix my issue.

env LANG=C.UTF-8 ROCM_PATH=/opt/rocm ./build/hipcc.bin --version

with output

sh: line 1: /tmp/canRunIRLYT9: Is a directory
sh: line 1: /tmp/canRunDRvTyG: Is a directory
sh: line 1: /tmp/canRunI3Saom: Is a directory
sh: line 1: /tmp/canRun3Nr0Ej: Is a directory
Device not supported - Defaulting to AMD
sh: line 1: /tmp/canRunfa4okA: Is a directory
sh: line 1: /tmp/canRunlaL7PI: Is a directory
Hip Clang Compiler not found
HIP version: 4.4.0-0
clang version 16.0.0
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin

It does report the correct path, but "Hip Clang Compiler not found" is still raised by src/hipBin_amd.h::HipBinAmd::getCompilerVersion. I haven't figured out yet why this doesn't work.

Furthermore, I'm wondering where the shell errors "sh: line 1 ..." are coming from.

The perl script works as expected:

env LANG=C.UTF-8 ROCM_PATH=/opt/rocm ./build/hipcc.pl --version

HIP version: 5.5.0-0
clang version 16.0.0
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin
PIPIPIG233666 commented 1 year ago

I guess I have a similar issue on Gentoo while compiling caffe2:

Can't exec "/usr/../llvm/bin/clang": No such file or directory at /usr/bin//hipcc.pl line 178.
Use of uninitialized value $HIP_CLANG_RT_LIB in scalar chomp at /usr/bin//hipcc.pl line 179.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
sh: line 1: /usr/../llvm/bin/clang: No such file or directory

HIP_CLANG_PATH is set to /usr/lib/llvm/16/bin, whereas hipcc still picks up the default ROCM_PATH/../llvm/bin/clang.

tpkessler commented 1 year ago

Hi @PIPIPIG233666! Your issue is the same one I reported in https://github.com/ROCmSoftwarePlatform/rocALUTION/issues/174. It's a bug in the CMake config file for HIP.

PIPIPIG233666 commented 1 year ago

(https://github.com/ROCmSoftwarePlatform/rocALUTION/pull/17) It's a bug in the cmake config file for HIP.

Thanks! It should be #174 instead of ROCm/HIPCC#17 but thank you for the heads up.

PIPIPIG233666 commented 1 year ago

Hi @PIPIPIG233666! Your issue is the same as I reported for ROCmSoftwarePlatform/rocALUTION#174 It's a bug in the cmake config file for HIP.

Actually, I fixed mine by setting HIP_CLANG_PATH as an env var; somehow the perl module var still didn't get picked up, though.

tpkessler commented 7 months ago

ROCm 6.0.2 still suffers from this issue. @Mystro256 can you shed some light on this?

ppanchad-amd commented 2 months ago

@tpkessler An internal ticket has been created to investigate this issue. Thanks!

jamesxu2 commented 2 months ago

Hi @PIPIPIG233666 @tpkessler, these issues were resolved in the new release of ROCm 6.2 - see this commit: https://github.com/ROCm/llvm-project/commit/ecb18b9a75ba1908750e510cb291ce9955311bd5

Please try checking out and building against the rocm-6.2.x branch of ROCm/llvm-project. While I was able to reproduce your issue from a source build of earlier versions, I see the correct output with the most recent ROCm release.

/llvm-project/amd/hipcc/build$ ./hipcc.bin --version
HIP version: 6.2.0-0
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.2.0 24292 26466ce804ac523b398608f17388eb6d605a3f09)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/lib/llvm/bin
Configuration file: /opt/rocm-6.2.0/lib/llvm/bin/clang++.cfg

If this issue recurs, please reopen the ticket.

Aside: Those shell errors are due to the use of mkdtemp instead of mk(s)temp, resulting in an attempted "write" to a temporary directory (which was meant to be a temporary file; you cannot write to a directory).
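As a rough illustration of the aside above, the following sketch contrasts the two POSIX calls. It is not the hipcc implementation: makeTempFile, makeTempDir and isDirectory are hypothetical helpers, and the /tmp/canRunXXXXXX template is only assumed from the names in the error messages.

```cpp
#include <cstdlib>     // mkstemp, mkdtemp
#include <string>
#include <vector>
#include <sys/stat.h>  // stat, S_ISDIR
#include <unistd.h>    // close

// Hypothetical helper: mkstemp creates and opens a regular file, so a later
// shell redirect such as "cmd > path" can write to it.
std::string makeTempFile() {
    const std::string tmpl = "/tmp/canRunXXXXXX";  // assumed template
    std::vector<char> buf(tmpl.begin(), tmpl.end());
    buf.push_back('\0');
    int fd = mkstemp(buf.data());
    if (fd == -1) {
        return "";
    }
    close(fd);  // only the path is needed here
    return std::string(buf.data());
}

// Hypothetical helper: mkdtemp creates a directory instead. Redirecting
// output to this path makes the shell report "Is a directory".
std::string makeTempDir() {
    const std::string tmpl = "/tmp/canRunXXXXXX";  // assumed template
    std::vector<char> buf(tmpl.begin(), tmpl.end());
    buf.push_back('\0');
    if (mkdtemp(buf.data()) == nullptr) {
        return "";
    }
    return std::string(buf.data());
}

// Report whether a path refers to a directory.
bool isDirectory(const std::string& path) {
    struct stat st;
    return stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
}
```

Both helpers leave the created path on disk, so a caller would be expected to remove it (unlink for the file, rmdir for the directory) when done.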