ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License

[Issue]: rocm/6.2.0 installation from source on NVIDIA (Perlmutter machine) #3570

Open rgayatri23 opened 1 month ago

rgayatri23 commented 1 month ago

Problem Description

I was following the instructions provided here to install HIP, and I get the following error:

cmake -DHIP_COMMON_DIR=/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip -DHIP_PLATFORM=nvidia -DCMAKE_INSTALL_PREFIX=/global/cfs/cdirs/nstaff/rgayatri/software/hip/clr/build/build/install -DHIP_CATCH_TEST=0 -DCLR_BUILD_HIP=ON -DCLR_BUILD_OCL=OFF -DHIPNV_DIR=/global/cfs/cdirs/nstaff/rgayatri/software/hip/hipother/hipnv ..
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Cray Programming Environment 2.7.30 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.30/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Cray Programming Environment 2.7.30 CXX
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cray/pe/craype/2.7.30/bin/CC - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- HIPCC Binary Directory: /opt/rocm/bin
CMake Error at CMakeLists.txt:51 (message):
  Please pass hipcc/build or hipcc/bin using -DHIPCC_BIN_DIR.

Am I missing a step? I am unsure why the build is looking for /opt/rocm.

Operating System

SLES

CPU

AMD EPYC 7713 64-Core

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

rgayatri23 commented 1 month ago

Edit: The GPU listed here is not correct. The Perlmutter machine has NVIDIA A100 GPUs. I had to put a value there in order to submit the issue, and NVIDIA GPUs were not available in the list of options.

cjatin commented 1 month ago

You need hipcc as well. Can you search for a package named hipcc and install it? Then point to it via -DHIPCC_BIN_DIR=<dir>.

rgayatri23 commented 1 month ago

I am trying to install hipcc via this package. Is that a different package?

cjatin commented 1 month ago

It's a different package. You can clone it from https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc and point to its bin directory with -DHIPCC_BIN_DIR=llvm-project/amd/hipcc/bin.
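A minimal sketch of checking a candidate directory before handing it to the clr configure step. The helper name hipcc_bin_dir_flag is hypothetical, and the assumption that a usable directory contains a file named hipcc is mine; the thread does not say exactly which file the clr CMakeLists.txt looks for.

```shell
#!/bin/sh
# Hypothetical helper (not part of HIP or clr): print the -DHIPCC_BIN_DIR
# argument for a directory, but only if that directory contains a file
# named hipcc. Which file clr really checks for is an assumption here.
hipcc_bin_dir_flag() {
    dir=$1
    if [ -f "$dir/hipcc" ]; then
        printf '%s\n' "-DHIPCC_BIN_DIR=$dir"
    else
        printf '%s\n' "no hipcc found in $dir" >&2
        return 1
    fi
}
```

It could be used as `cmake $(hipcc_bin_dir_flag llvm-project/amd/hipcc/bin) <other flags> ..`, so that a wrong path fails early with a short message instead of at CMakeLists.txt:51.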

rgayatri23 commented 1 month ago

Thanks @cjatin. I am a bit confused at this point. In order to build hipcc, do I need to pass HIPCC_BIN_DIR?

Additionally, while I was able to build the hipcc compiler using your solution, when I tried a simple test program I got the following error:

hipcc.bin not present; install HIPCC binaries before proceeding

cjatin commented 1 month ago

HIPCC can be either a Perl script or a C++ application. I think it tries to use the C++ application by default; you can bypass that by setting the environment variable HIP_USE_PERL_SCRIPTS=1.
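As a rough illustration, the fallback could be scripted as below. The function name use_perl_hipcc_if_needed is hypothetical, and it assumes hipcc.bin would sit in the same bin directory as hipcc; only the HIP_USE_PERL_SCRIPTS variable itself comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: if hipcc.bin is missing from the given bin directory,
# ask hipcc to use the Perl scripts instead (env var named in this thread).
use_perl_hipcc_if_needed() {
    bindir=$1
    if [ ! -f "$bindir/hipcc.bin" ]; then
        HIP_USE_PERL_SCRIPTS=1
        export HIP_USE_PERL_SCRIPTS
    fi
}
```

Call it in the current shell (not in a subshell) before invoking hipcc, for example `use_perl_hipcc_if_needed /path/to/rocm/bin`, where the path is a placeholder.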

In case you want to build hipcc.bin, go to llvm-project/amd/hipcc and run:

mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<where to install hipcc>
make -j install

This will install hipcc.bin to the desired location.

I recommend building hipcc first and then pointing to the hipcc install directory via -DHIPCC_BIN_DIR while building clr.
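The recommended order (hipcc first, then clr) might look like the following dry-run sketch. All default paths are placeholders rather than values from the thread, the run and build_hipcc_then_clr helpers are hypothetical, and it assumes the hipcc install step places its binaries under <prefix>/bin. The `cmake -S/-B` form is used as an equivalent of `mkdir build && cd build && cmake ..`.

```shell
#!/bin/sh
# Placeholder locations; override them in the environment as needed.
HIPCC_SRC=${HIPCC_SRC:-$PWD/llvm-project/amd/hipcc}
HIPCC_PREFIX=${HIPCC_PREFIX:-$PWD/hipcc-install}
HIP_SRC=${HIP_SRC:-$PWD/hip}
HIPNV_SRC=${HIPNV_SRC:-$PWD/hipother/hipnv}
CLR_SRC=${CLR_SRC:-$PWD/clr}

# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

build_hipcc_then_clr() {
    # 1. Configure and install hipcc (this is what provides hipcc.bin).
    run cmake -S "$HIPCC_SRC" -B "$HIPCC_SRC/build" \
        -DCMAKE_INSTALL_PREFIX="$HIPCC_PREFIX" &&
    run cmake --build "$HIPCC_SRC/build" --target install &&
    # 2. Configure clr for the NVIDIA platform, pointing at that install.
    #    Assumes the binaries land in $HIPCC_PREFIX/bin.
    run cmake -S "$CLR_SRC" -B "$CLR_SRC/build" \
        -DHIP_COMMON_DIR="$HIP_SRC" -DHIP_PLATFORM=nvidia \
        -DCLR_BUILD_HIP=ON -DCLR_BUILD_OCL=OFF \
        -DHIPNV_DIR="$HIPNV_SRC" -DHIPCC_BIN_DIR="$HIPCC_PREFIX/bin"
}
```

Running `build_hipcc_then_clr` with the default DRY_RUN=1 only prints the three commands, which makes it easy to review the paths before setting DRY_RUN=0.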

rgayatri23 commented 1 month ago

I am trying to build a simple test from hip-tests, and it looks like hipcc is unable to find CUDA. Is there a way to pass the location of nvcc? It is installed in a non-standard location. I tried passing CUDA_TOOLKIT_ROOT_DIR to both the install and the application build, but it was ignored.

rgayatri@perlmutter:login40:/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build> cmake -DCMAKE_CXX_COMPILER=hipcc -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME ../
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is unknown
-- Cray Programming Environment 2.7.30 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.30/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc - broken
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_46803/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_46803.dir/build.make CMakeFiles/cmTC_46803.dir/build
    gmake[1]: Entering directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o
    /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc    -o CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    sh: /usr/local/cuda/bin/nvcc: No such file or directory
    failed to execute:/usr/local/cuda/bin/nvcc  -Wno-deprecated-gpu-targets  -isystem /usr/local/cuda/include -isystem "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/include" -x cu  -o "CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o" -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    gmake[1]: *** [CMakeFiles/cmTC_46803.dir/build.make:78: CMakeFiles/cmTC_46803.dir/testCXXCompiler.cxx.o] Error 127
    gmake[1]: Leaving directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_46803/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:23 (project)

-- Configuring incomplete, errors occurred!
See also "/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeOutput.log".
See also "/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeError.log".
rgayatri@perlmutter:login40:/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build> which nvcc
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc

scchan commented 1 month ago

Could you try setting the env var CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 and see if that works around the non-standard location?
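For reference, the CUDA_PATH value suggested here is the nvcc location with the trailing /bin/nvcc removed. A small sketch of deriving it, using a hypothetical helper name and assuming the nvcc path is not a symlink into some other tree:

```shell
#!/bin/sh
# Hypothetical helper: derive the toolkit root from an nvcc path by
# dropping the trailing /bin/nvcc.
cuda_root_from_nvcc() {
    nvcc_path=$1
    dirname "$(dirname "$nvcc_path")"
}

# Example with the path reported by `which nvcc` in this thread:
cuda_root_from_nvcc /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc
# prints /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2
```

In an interactive shell this could be combined as `export CUDA_PATH=$(cuda_root_from_nvcc "$(command -v nvcc)")`.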

rgayatri23 commented 1 month ago

No, setting CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 did not help.

rgayatri23 commented 1 month ago

Update: I was able to compile the app directly with hipcc square.cu -o square.ex, but the CMake build fails with a different error: nvcc fatal : Unknown option '-rdynamic'.

rgayatri@perlmutter:login40:/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build> cmake -DCMAKE_CXX_COMPILER=hipcc ../
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Cray Programming Environment 2.7.30 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.30/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc
-- Check for working CXX compiler: /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc - broken
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_02053/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_02053.dir/build.make CMakeFiles/cmTC_02053.dir/build
    gmake[1]: Entering directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o
    /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc    -o CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    HIP_PATH=/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0
    HIP_PLATFORM=nvidia
    HIP_COMPILER=nvcc
    HIP_RUNTIME=cuda
    CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2
    hipcc-args: -o CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    hipcc-cmd: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc  -Wno-deprecated-gpu-targets  -isystem /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/include -isystem "/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/include" -x cu  -o "CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o" -c /global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    Linking CXX executable cmTC_02053
    /global/u1/r/rgayatri/.local/cmake/bin/cmake -E cmake_link_script CMakeFiles/cmTC_02053.dir/link.txt --verbose=1
    /global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0/bin/hipcc -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o cmTC_02053
    HIP_PATH=/global/common/software/nstaff/rgayatri/gpu/rocm/6.2.0
    HIP_PLATFORM=nvidia
    HIP_COMPILER=nvcc
    HIP_RUNTIME=cuda
    CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2
    hipcc-args: -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o cmTC_02053
    nvcc fatal   : Unknown option '-rdynamic'
    hipcc-cmd: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc  -Wno-deprecated-gpu-targets -lcuda -lcudart -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64  -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o "cmTC_02053"
    failed to execute:/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/bin/nvcc  -Wno-deprecated-gpu-targets -lcuda -lcudart -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64  -rdynamic CMakeFiles/cmTC_02053.dir/testCXXCompiler.cxx.o -o "cmTC_02053"
    gmake[1]: *** [CMakeFiles/cmTC_02053.dir/build.make:99: cmTC_02053] Error 1
    gmake[1]: Leaving directory '/global/cfs/cdirs/nstaff/rgayatri/software/hip/hip-tests/samples/0_Intro/square/build/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_02053/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:23 (project)
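The log shows CMake's link step passing -rdynamic to hipcc, which forwards it unchanged to nvcc. One possible workaround, which is not confirmed anywhere in this thread, is a thin wrapper that drops that flag before calling hipcc. The function name hipcc_no_rdynamic and the HIPCC override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: call hipcc with every argument except -rdynamic.
# HIPCC may be set to the launcher to use; it defaults to hipcc on PATH.
hipcc_no_rdynamic() {
    # The for loop iterates over the original argument list; each argument
    # is shifted off and re-appended only if it should be kept.
    for arg in "$@"; do
        shift
        [ "$arg" = "-rdynamic" ] || set -- "$@" "$arg"
    done
    "${HIPCC:-hipcc}" "$@"
}
```

Saved as a standalone script ending in `hipcc_no_rdynamic "$@"`, it could be passed as -DCMAKE_CXX_COMPILER in place of hipcc. Whether CMake's compiler detection accepts such a wrapper on this system is untested here.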