ROCm / clr

[Issue]: build fails to non-existing llvm path #85

Open jdgh000 opened 2 weeks ago

jdgh000 commented 2 weeks ago

Problem Description

Tried building hipamd in clr, but it fails because LLVM_DIR is apparently set wrongly or not set at all. From clr/hipamd:

mkdir build
cd build
cmake ..

./..'

In the script file hip_embed_pch.sh, the compiler path is built as $LLVM_DIR/bin/clang, and the script itself seems to be called from the main CMakeLists.txt.

hip_embed_pch.sh USAGE: echo "Usage: $(basename "$0") HIP_BUILD_INC_DIR HIP_INC_DIR HIP_AMD_INC_DIR LLVM_DIR [option] [RTC_LIB_OUTPUT]"

This is called from cmake (line 183) as:

execute_process(COMMAND sh -c "${CMAKE_CURRENT_SOURCE_DIR}/hip_embed_pch.sh \
    ${HIP_COMMON_INCLUDE_DIR} \
    ${PROJECT_BINARY_DIR}/include \
    ${PROJECT_SOURCE_DIR}/include \
    ${HIP_LLVM_ROOT}"
  COMMAND_ECHO STDERR
  RESULT_VARIABLE EMBED_PCH_RC
  WORKING_DIRECTORY ${CMAKE_BINARY_DIR})

HIP_LLVM_ROOT is defined in the same cmake file (lines 179-180):

if(NOT DEFINED HIP_LLVM_ROOT)
  set(HIP_LLVM_ROOT "${LLVM_DIR}/../../..")
endif()

It is unclear what LLVM_DIR should be. With the default ROCm installation path it seems to be LLVM_DIR=/opt/rocm-6.1.0/llvm, which I could pass to cmake as an environment parameter, but going three folders up from there lands in /, which is not going to work.
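To see where the set(HIP_LLVM_ROOT "${LLVM_DIR}/../../..") line ends up for different LLVM_DIR values, the three-levels-up step can be mimicked in plain shell. This is only an illustrative sketch: the helper name llvm_root_from_dir is hypothetical, and it applies dirname three times, which is a purely textual stand-in for /../../.. (no symlinks are resolved).

```shell
#!/bin/sh
# Hypothetical helper (not part of clr): strip three trailing path components,
# mimicking set(HIP_LLVM_ROOT "${LLVM_DIR}/../../..") from the CMake file.
llvm_root_from_dir() {
    dirname "$(dirname "$(dirname "$1")")"
}

# Candidate LLVM_DIR values that appear in this thread.
for dir in /opt/rocm/llvm/lib/cmake/llvm /usr/lib64/cmake/llvm /opt/rocm-6.1.0/llvm; do
    printf '%s -> %s\n' "$dir" "$(llvm_root_from_dir "$dir")"
done
```

Only a value of the form <prefix>/lib/cmake/llvm (or lib64) maps back to <prefix>. Passing the LLVM install prefix itself, such as /opt/rocm-6.1.0/llvm, goes up too far and ends at /.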

OS: NAME="CentOS Stream" VERSION="9"
CPU: model name : Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
GPU:
Name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Marketing Name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Marketing Name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Name: gfx908
Marketing Name: AMD Instinct MI100
Name: amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
Name: gfx908
Marketing Name: AMD Instinct MI100
Name: amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-

Operating System

Centos 9

CPU

Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz

GPU

AMD Instinct MI100

ROCm Version

ROCm 6.1.0

ROCm Component

clr

Steps to Reproduce

from clr/hipamd/ ; mkdir build ; cd build

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

iassiour commented 2 weeks ago

Hi @jdgh000, I think there should be a /opt/rocm/llvm/lib/cmake/llvm directory, and find_package normally returns that as LLVM_DIR. Then HIP_CLANG_ROOT is set to /opt/rocm/llvm. But it looks like that folder is missing from your installation, so find_package returns the system one, /usr/lib64/cmake/llvm, instead. I think you probably need to additionally install the rocm-llvm-dev package, which adds the missing directories under /opt/rocm/llvm.

jdgh000 commented 2 weeks ago

I do not understand your evaluation at all; it makes no sense to me. I already mentioned where the error occurs in my original description. It is looking for clang, which is installed, and nothing else, as is evident from:

/home/nonroot/build-install-scripts/rocm/ROCm-6.1/clr/hipamd/src/hip_embed_pch.sh: line 148: /usr/lib64/cmake/llvm/../../../bin/clang: No such file or directory

Regardless, I installed llvm-devel with no expectation that it would solve the problem, and after installation it made no difference: you were wrong and I was right.

iassiour commented 2 weeks ago

Hi @jdgh000, LLVM_DIR is a cache variable set by cmake find_package call here: https://github.com/ROCm/clr/blob/develop/hipamd/src/CMakeLists.txt#L182

In the find_package call, PATHS is set to ${ROCM_PATH}/llvm. That means the config file is expected to be found under /opt/rocm. In that case LLVM_DIR would be set to /opt/rocm/llvm/lib64/cmake/llvm, and HIP_LLVM_ROOT (the argument passed to the script hip_embed_pch.sh) would point to /opt/rocm/llvm, which would make sense.

Note that the LLVM_DIR cmake variable above is not to be confused with the fourth parameter, LLVM_DIR, used in hip_embed_pch.sh, which is actually HIP_LLVM_ROOT.

In your case find_package(LLVM), instead of returning the llvm configuration under /opt/rocm, returns one found under /usr: /usr/lib64/cmake/llvm. I think that is not expected and is likely the root cause of the problem, as it leads to invalid paths later on.

After installing rocm-llvm-dev, can you please confirm that /opt/rocm/llvm/lib/cmake/llvm (or /opt/rocm/llvm/lib64/cmake/llvm) exists on your machine? If it exists, please try again with a clean cmake build (LLVM_DIR is a cache variable).

PS. I think the find_package call here https://github.com/ROCm/clr/blob/develop/hipamd/src/CMakeLists.txt#L182 should have set NO_DEFAULT_PATH, so that it ignores config files found in system paths like /usr and returns only what is found under /opt/rocm.
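The directory check requested above can be scripted. The following is a minimal sketch, assuming that the LLVM package config file is named LLVMConfig.cmake and lives in one of the two layouts mentioned in this comment. The helper name find_rocm_llvm_cmake_dir is hypothetical and not part of clr or ROCm.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory under <rocm_path>/llvm that
# contains LLVMConfig.cmake, checking the lib and lib64 layouts in turn.
# Returns non-zero when neither directory holds the file.
find_rocm_llvm_cmake_dir() {
    rocm_path=${1:-/opt/rocm}
    for libdir in lib lib64; do
        candidate="$rocm_path/llvm/$libdir/cmake/llvm"
        if [ -f "$candidate/LLVMConfig.cmake" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if dir=$(find_rocm_llvm_cmake_dir /opt/rocm); then
    echo "ROCm LLVM config found in: $dir"
else
    echo "No LLVMConfig.cmake under /opt/rocm/llvm; find_package may pick up a system LLVM"
fi
```

If the helper prints a directory, a clean configure (fresh build directory, so that no cached LLVM_DIR survives) is the next thing to try.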

jdgh000 commented 2 weeks ago

I actually stepped back and followed the published build instructions for clr instead of trying to build from the hipamd directory, but exactly the same error occurs at the cmake stage. I don't believe it is my job to go into a deep debugging session. I think it should be on your plate to investigate and come back with 1) an interim solution that is proven to work and 2) modified instructions as necessary, if users cannot build by following the published instructions. Otherwise other users will suffer a similar fate. The things you are talking about, LLVM_ROOT and the other directories, are unfortunately not mentioned in the instructions. I don't want to go back and forth perpetually with unproven solutions that may or may not work. All I can say is that clang is in its usual default location and works for other ROCm builds:

which clang
/opt/rocm/llvm/bin/clang

Linux
Clone this repository
cd clr && mkdir build && cd build
For HIP : cmake .. -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=$HIP_COMMON_DIR
HIP_COMMON_DIR points to HIP
HIPCC_BIN_DIR points to HIPCC's bin folder. If not provided, it defaults to /opt/rocm/bin.
For OpenCL™ : cmake .. -DCLR_BUILD_OCL=ON
make : to build
make install : to install

cmake .. -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=/home/nonroot/git/HIP
...hip_embed_pch.sh: line 148: /usr/lib64/cmake/llvm/../../../bin/clang: No such file or directory
CMake Error at hipamd/src/CMakeLists.txt:192 (message):
  Failed to embed PCH

jdgh000 commented 1 week ago

Do we have an update here?

iassiour commented 1 week ago

Hi @jdgh000, can you modify your cmake command to the following and confirm whether it works:

cmake .. -DCMAKE_PREFIX_PATH="/opt/rocm/" -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=/home/nonroot/git/HIP

In that case I will follow up with a fix, as setting CMAKE_PREFIX_PATH should not be required according to the published instructions.

jdgh000 commented 1 week ago

This appears to work. However, this time I am getting another error, about max vs. fmax in a header. To rule out compatibility issues, I reinstalled ROCm 6.0 and checked out the rocm-6.0.x branch of both HIP and clr, and it still occurs.

-- Found OpenGL: /usr/lib64/libOpenGL.so
-- HIPCC_BIN_DIR found at /opt/rocm/bin
-- HIP_COMMON_DIR found at /home/nonroot/git/HIP
-- HIPNV_DIR found at
-- Found Perl: /usr/bin/perl (found version "5.32.1")
CMake Error at hipamd/CMakeLists.txt:87 (file):
  file STRINGS file "/home/nonroot/git/HIP/VERSION" cannot be read.

-- Found Git: /usr/bin/git (found version "2.43.0")
-- Using CPACK_DEBIAN_PACKAGE_RELEASE local
-- CPACK_RPM_PACKAGE_RELEASE: local%{?dist}
-- HIP Platform: amd
-- HIP Runtime: rocclr
-- HIP Compiler: clang
-- ROCM Installation path(ROCM_PATH): /opt/rocm
-- HIP will be installed in: /opt/rocm
-- Could NOT find Terminfo (missing: Terminfo_LIBRARIES Terminfo_LINKABLE)
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.11")
-- Found zstd: /usr/lib64/libzstd.so
-- Found LibXml2: /usr/lib64/libxml2.so (found version "2.9.13")
'sh' '-c' '/root/extdir/gg/git/clr/hipamd/src/hip_embed_pch.sh /home/nonroot/git/HIP/include /root/extdir/gg/git/clr/build/hipamd/include /root/extdir/gg/git/clr/hipamd/include /opt/rocm/llvm/lib/cmake/llvm/../../..'

iassiour commented 1 week ago

I think that the cause for the max vs. fmax error is this:

CMake Error at hipamd/CMakeLists.txt:87 (file):
file STRINGS file "/home/nonroot/git/HIP/VERSION" cannot be read.

Please double check that the path to the VERSION file exists and the permissions are correct.

jdgh000 commented 1 week ago

The file is there; it is unclear how this error comes about. I checked out HIP as is, with no modification whatsoever, but cmake cannot process it:

cat ../../HIP/VERSION

#HIP_VERSION_MAJOR
6
#HIP_VERSION_MINOR
0
#HIP_VERSION_PATCH
32831

[root@localhost build]# cat ../hipamd/CMakeLists.txt | grep -n VERSION | grep ^87:
87:file(STRINGS ${HIP_COMMON_DIR}/VERSION VERSION_LIST REGEX "^[0-9]+")
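To check outside of cmake what line 87 would collect, the filter can be approximated with grep. This is a rough sketch, not the CMake implementation: the helper name version_lines is hypothetical, and it simply prints the lines that begin with a digit after confirming that the current user can read the file.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a VERSION file that begin with a
# digit, roughly what file(STRINGS ... REGEX "^[0-9]+") would place in
# VERSION_LIST. Fails early if the file is missing or unreadable.
version_lines() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    grep -E '^[0-9]+' "$1"
}

# Example invocation (relative path taken from the thread; adjust as needed):
# version_lines ../../HIP/VERSION
```

Running it as the same user and with the same path that is passed in -DHIP_COMMON_DIR helps separate a wrong path from a permissions problem.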

jdgh000 commented 1 week ago

Are you checking on your end? I am surprised that you keep assuming this is my environment and asking me to check things. This is clearly an issue on your end. For example, you say "Please double check that the path to the VERSION file exists and the permissions are correct," but this is how the file is checked out from your repository snapshot, and the build fails.

iassiour commented 1 week ago

If the file is checked out as is from the repo, with no modifications, it should work. I can't reproduce the issue on my end.

cmake --version
cmake version 3.22.1

jdgh000 commented 5 days ago

I have a hard time following what you are saying. I already provided logs and versions, and you are still arguing against the facts by saying that it should work. What do you mean by "I can't reproduce the issue on my end"? This is so simple: you just check out and build using the instructions, and it is easily reproducible. What steps did you take? What OS and ROCm version did you use?

iassiour commented 4 days ago

Hi @jdgh000,

Another surprise is that I saw the version file error, which you led me to believe is causing the discrepancy of max vs. fmax. You did not provide any rational response about why it occurs. For me, it makes no sense that the failure to read VERSION causes the max/fmax error.

cmake generates a file hip_version.h under the clr build directory that adds definitions based on the information read from the VERSION file. For example:

#define HIP_VERSION_MAJOR 6
#define HIP_VERSION_MINOR 0
#define HIP_VERSION_PATCH 32831

hip_version.h is then used, among other places, by clang to set up the right include files for the HIP version you are trying to build.
If the version information is missing or wrong, many compatibility issues can come up, and the max vs. fmax error we are seeing is one of them.
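The mapping from the VERSION file to the definitions shown above can be illustrated in shell. This is a simplified sketch and not the actual cmake logic: the helper name emit_hip_version_defines is hypothetical, and it assumes the file holds major, minor and patch as its first three digit-only lines.

```shell
#!/bin/sh
# Hypothetical helper: turn the numeric lines of a VERSION file into #define
# lines like the ones above. Returns non-zero when fewer than three numbers
# are found, which is the situation an unreadable or empty file would produce.
emit_hip_version_defines() {
    numbers=$(grep -E '^[0-9]+' "$1") || return 1
    # Split the captured lines into positional parameters $1, $2, $3.
    set -- $numbers
    if [ "$#" -lt 3 ]; then
        echo "expected 3 version numbers, found $#" >&2
        return 1
    fi
    printf '#define HIP_VERSION_MAJOR %s\n' "$1"
    printf '#define HIP_VERSION_MINOR %s\n' "$2"
    printf '#define HIP_VERSION_PATCH %s\n' "$3"
}

# Example invocation (hypothetical path; adjust to your HIP checkout):
# emit_hip_version_defines "$HOME/HIP/VERSION"
```

When the read fails, nothing is emitted at all, which mirrors how a configure step that cannot read VERSION ends up without usable version macros.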

I am using Ubuntu 22.04.3, rocm 6.0, cmake version 3.22.1

git clone -b rocm-6.0.x https://github.com/ROCm/clr.git
git clone -b rocm-6.0.x https://github.com/ROCm/HIP.git
cd clr && mkdir build && cd build
cmake .. -DCMAKE_PREFIX_PATH="/opt/rocm/" -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR="/home/iassiour/HIP"
make

Following the steps above, I do not get the "file VERSION cannot be read" error, and both the configuration and the build run successfully. Note that I am checking out the rocm-6.0.x branch of clr/HIP, but it makes no difference for me if I use the develop branches and the latest ROCm version.

jdgh000 commented 3 days ago

I installed ROCm 6.0 on Ubuntu 22.04, but now cmake reports that a .cmake.in file is missing, even though the file is there:

CMake Error: File /home/nonroot/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/clr/HIP/hip-lang-config.cmake.in does not exist.
CMake Error at /usr/share/cmake-3.22/Modules/CMakePackageConfigHelpers.cmake:342 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  hipamd/src/CMakeLists.txt:316 (configure_package_config_file)

/home/nonroot/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/HIP/hip-lang-config.cmake.in
root@localhost:~/extdir/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/clr/build# cmake .. -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=~/extdir/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/HIP/

jdgh000 commented 3 days ago

I am also seeing the VERSION read fail randomly. The file is there, so why is it not being read? What should the permissions be? Again, I checked it out as is from the repo and did not modify anything.


root@localhost:/home/nonroot/git//build-install-scripts/rocm/ROCm-6.0/clr/build# ls -l /home/nonroot/gitcodelab-scripts/build-install-scripts/rocm/ROCm-6.0/HIP//VERSION
-rw-r--r--. 1 root root 67 Jun 25 05:38 /root/extdir/git/codelab-scripts/build-install-scripts/rocm/ROCm-6.0/HIP//VERSION
root@localhost:/home/nonroot/git//build-install-scripts/rocm/ROCm-6.0/clr/build# cmake --version
cmake version 3.22.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).