ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License
3.67k stars, 526 forks

How to use `/opt/rocm/llvm/bin/clang` with non default GCC compilers on HPC clusters? #2120

Open tcojean opened 4 years ago

tcojean commented 4 years ago

A subtitle for this issue would be: "The default HIP clang compiler cannot find GCC compilers on HPC clusters in non-default paths".

It's hard to find the right place for this issue, since it relates both to HIP packaging and to clang's interaction with GCC compilers. For now I want to start a discussion here; if you think there is a better place for it, it can be moved.

The main problem concerns the default /opt/rocm/llvm/bin/clang compiler that the ROCm 3.5 system packages install for compiling HIP code. Because this compiler is standard clang, it relies on a GCC installation for a number of things. This causes trouble on HPC clusters, where compilers are rarely in a standard path (they usually live on a shared filesystem); module or similar tools are used to switch between compiler versions and set up all the correct include, library, and other paths. The issue is even worse on a Red Hat system, where the default system compiler is GCC 4.8.x, which lacks full support for C++11 and later standards.

Details

In effect, on a typical Red Hat based cluster, these are the candidate GCC installations found:

[tcojean@methane ~]$ /opt/rocm/llvm/bin/clang -v
clang version 11.0.0 (/data/jenkins_workspace/centos_pipeline_job_3.5/rocm-rel-3.5/rocm-3.5-30-20200528/7.5/external/llvm-project/clang 6c08b900599eee52e12bce1e76b20dc413ce30e7)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

Despite having a GCC 9.2.0 module loaded (item 6):

[tcojean@methane ~]$ module li
Currently Loaded Modulefiles:
  1) git/2.20.1/gcc-7.2.0-7ww6                  3) htop/2.2.0/gcc-7.2.0-gqld                  5) the-silver-searcher/2.1.0/gcc-7.2.0-ot5x
  2) cmake/3.16.2/gcc-7.2.0-aoyx                4) hwloc/2.0.2/gcc-7.2.0-wllc                 6) gcc/9.2.0/gcc-7.2.0-3mdu

This is easily explained by the way that clang finds GCC:

  1. When compiling clang, one can specify a path to a GCC installation with the -DGCC_INSTALL_PREFIX CMake flag (see [1]). This of course only works when compiling clang manually for a specific target system. In addition, AFAIK only one GCC installation can be specified this way (HPC clusters often have many compiler versions installed concurrently, and they keep updating them).
  2. As far as I can see, clang otherwise looks for a GCC installation in the default paths (/usr/lib/gcc/...). On Red Hat systems, it also explicitly looks for devtoolsets, see [2].
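
To check which installation a given clang ended up with, the selection line can be pulled out of the -v output. This is only a minimal sketch: the function name `selected_gcc_installation` is made up for illustration, and it assumes the output format shown in the transcript above.

```shell
# Print the GCC installation that clang reports as selected.
# Reads `clang -v` output on stdin, so it also works on saved logs.
selected_gcc_installation() {
    sed -n 's/^Selected GCC installation: //p'
}

# Usage (clang prints its -v report on stderr, hence the 2>&1):
#   /opt/rocm/llvm/bin/clang -v 2>&1 | selected_gcc_installation
```

An empty result would mean clang found no GCC installation at all.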

Main question

The main question is then the following: what is the standard/recommended way of installing and managing /opt/rocm/llvm/bin/clang on an HPC cluster?

  1. Should HPC cluster administrators compile a custom clang for every GCC compiler they install, setting GCC_INSTALL_PREFIX correctly each time? Similarly, is the HIP stack packaged in https://spack.io or another HPC packaging system that can do this automatically?
  2. Is there a way to make clang work with modules or environment variables to find extra GCC compilers?
  3. Is it expected that symlinks to compiler versions are always set in /usr/xxx?
  4. Should HPC system administrators limit themselves to Red Hat's devtoolset, since clang currently supports it? (On the systems where I tried HIP/ROCm 3.5, this is what we have done, but we are missing a lot of compiler versions this way.)

scchan commented 4 years ago

clang has a compiler switch called --gcc-toolchain that seems to support this kind of usage. Do you mind giving that a try?

--gcc-toolchain=<arg>, -gcc-toolchain <arg>    Use the gcc toolchain at the given directory

tcojean commented 4 years ago

Thanks for the reply!

It does work, but you need to pass in a lot of other flags (to find libraries, headers, related binaries such as crtbegin.o). For a simple example to work, this is what I have to do to compile properly:

/opt/rocm/llvm/bin/clang++ c++11_issue.hip.cpp -std=c++11 \
--gcc-toolchain=/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/gcc-9.2.0-3mdu5gl5jymnhvqnmfaavtex2zguy2gt/lib/gcc/x86_64-pc-linux-gnu/9.2.0/ \
-I/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/gcc-9.2.0-3mdu5gl5jymnhvqnmfaavtex2zguy2gt/include/c++/9.2.0/ \
-I/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/gcc-9.2.0-3mdu5gl5jymnhvqnmfaavtex2zguy2gt/include/c++/9.2.0/x86_64-pc-linux-gnu/ \
-L/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/gcc-9.2.0-3mdu5gl5jymnhvqnmfaavtex2zguy2gt/lib/gcc/x86_64-pc-linux-gnu/9.2.0/ \
-B/nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/gcc-9.2.0-3mdu5gl5jymnhvqnmfaavtex2zguy2gt/lib/gcc/x86_64-pc-linux-gnu/9.2.0/ \
-o compiled_with_clang++.exe

My test case is a very simple program compiled as C++11; since the default Red Hat compiler does not fully support C++11, that is sufficient to show success.

Overall, it's not very practical, but it's technically possible to add support for this as part of our application's CMake, if that's what we need to do.

For reference, here is the simple example (it's completely useless and also terrible, but it's enough):

#include <sstream>
#include <string>

static std::string foo(int bar) {
        std::ostringstream oss{};
        oss << "This is a dummy string with a " << bar;
        return oss.str();
}

int main(){
        auto str = foo(0);
        auto str2 = foo(20);
        return 0;
}

On success, there should be the following nm output. Notice the std::__cxx11 prefix for the specific standard library functions.

[tcojean@methane ~]$ nm compiled_with_clang++.exe | c++filt | grep char_traits
                 U std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::str() const@@GLIBCXX_3.4.21
                 U std::basic_ostream<char, std::char_traits<char> >::operator<<(int)@@GLIBCXX_3.4
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@@GLIBCXX_3.4.21
                 U std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()@@GLIBCXX_3.4.26
                 U std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_ostringstream()@@GLIBCXX_3.4.21
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@@GLIBCXX_3.4
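
The check above can be scripted so that it gives a yes/no answer. This is just a sketch: `has_cxx11_symbols` is a made-up helper name, and it only looks for the std::__cxx11 namespace in demangled symbol names.

```shell
# Succeed if the demangled symbol list on stdin mentions std::__cxx11.
has_cxx11_symbols() {
    grep -q 'std::__cxx11::'
}

# Usage:
#   nm compiled_with_clang++.exe | c++filt | has_cxx11_symbols \
#       && echo "std::__cxx11 symbols found"
```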

wrwilliams commented 3 years ago

Just to add a bit of data here: ICC has some tooling support that addresses their version of this issue, with a script that will allow you to generate a config file for a given GCC toolchain and will pass all of the associated flags automatically. If there's some technical reason that the --gcc-toolchain option can't automatically handle everything (as opposed to a bug), a similar solution would be helpful.

In the worst case I can play games with module files (or ask our admins to) but that's not an easy-to-maintain fix.
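
Upstream clang has a rough analogue of such a config file: the --config option reads additional command-line options from a file. A minimal sketch, assuming the clang shipped with ROCm accepts --config the way upstream clang does (not verified here); the function name and the .cfg file name are hypothetical:

```shell
# Write a clang configuration file that pins the GCC toolchain.
# $1: GCC installation prefix (the directory that contains bin/ and lib/)
# $2: path of the config file to create
write_gcc_toolchain_cfg() {
    printf '%s%s\n' '--gcc-toolchain=' "$1" > "$2"
}

# Usage (paths are placeholders):
#   write_gcc_toolchain_cfg /path/to/gcc-9.2.0 ./gcc-9.2.0.cfg
#   /opt/rocm/llvm/bin/clang++ --config ./gcc-9.2.0.cfg -std=c++11 c++11_issue.hip.cpp
```

One file per GCC module could then be generated when the module is installed, which keeps the flags out of individual build systems.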

nolta commented 3 years ago

I'm setting up our new AMD GPU cluster and am running into the same problem.

Very preliminary, but setting:

export HIPCC_COMPILE_FLAGS_APPEND="--gcc-toolchain=$(realpath -m $(which gcc)/../..)"

seems to have gotten things working, at least enough for me to run the ROCm validation suite.

@tcojean I think you might have passed the wrong directory to --gcc-toolchain; probably should have been /nfs/apps/spack/opt/spack/linux-centos7-x86_64/gcc-7.2.0/gcc-9.2.0-3mdu5gl5jymnhvqnmfaavtex2zguy2gt.
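
The prefix computation can also be written as a small helper that uses dirname twice, which keeps it purely lexical. This is a sketch only: `gcc_toolchain_prefix` is a made-up name, and it assumes gcc sits at <prefix>/bin/gcc (a wrapper or symlink found earlier on PATH would give the wrong prefix).

```shell
# Print the installation prefix for a gcc executable path.
# $1: absolute path to gcc, expected to look like <prefix>/bin/gcc
gcc_toolchain_prefix() {
    dirname "$(dirname "$1")"
}

# Usage, with whichever gcc the loaded module put on PATH:
#   export HIPCC_COMPILE_FLAGS_APPEND="--gcc-toolchain=$(gcc_toolchain_prefix "$(command -v gcc)")"
```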

ppanchad-amd commented 5 months ago

@tcojean Is this still an issue for you with the latest ROCm 6.0.2 (HIP 6.0.32831)? If not, please close the ticket. Thanks!

ianx9781 commented 5 months ago

Running Fedora Server 40 (beta) with ROCm 6.0.2, it appears that the issue persists with the default repo installation:

C compiler "/opt/rocm/llvm/bin/clang" not found: exec: "/opt/rocm/llvm/bin/clang": stat /opt/rocm/llvm/bin/clang: no such file or directory

yxsamliu commented 4 months ago

What is the command for reproducing the issue?

ianx9781 commented 4 months ago

Sorry about the delay. I will have to circle back to this, hopefully soon, and get all the details; I'm sidetracked with work and some other projects for the time being.
