ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License
415 stars 90 forks source link

Default HIP_PATH does not work with ROCm >= 6.0 #1624

Closed lahwaacz closed 1 month ago

lahwaacz commented 5 months ago

Bumping issue from https://github.com/ginkgo-project/ginkgo/issues/1529#issuecomment-2053598746:

hip_path.cmake still defaults to /opt/rocm/hip for HIP_PATH which does not work with ROCm 6.0:

https://github.com/ginkgo-project/ginkgo/blob/a8a407f35887c480af66500c20dc53a41d2f7604/cmake/hip_path.cmake#L3

Furthermore, hip.cmake has a code path where $ENV{HIP_PATH}/.. is used for `ROCM_PATH:

https://github.com/ginkgo-project/ginkgo/blob/a8a407f35887c480af66500c20dc53a41d2f7604/cmake/hip.cmake#L20

It would be nice if Ginkgo could auto-detect ROCm version and use the correct paths for 6.0 by default :wink:

lahwaacz commented 5 months ago

Actually, this problem spans to even the other HIP-related variables handled in hip.cmake. For example, HIPBLAS_PATH should default to "${ROCM_PATH}/hipblas" for ROCm 5, but since ROCm 6 it should be just "${ROCM_PATH}".

You can reproduce this very easily using the ROCm dev images with Docker or Podman (note that you don't need any GPU to just build Ginkgo):

podman run --rm -it rocm/dev-ubuntu-22.04:6.1.2-complete
# Run the following commands in the container:
apt-get update
apt-get install git ninja-build cmake
git clone --branch v1.8.0 https://github.com/ginkgo-project/ginkgo.git
export ROCM_PATH=/opt/rocm
export HIP_PATH="$ROCM_PATH"
cmake_flags=(
  -B build -S ginkgo -G Ninja
  -DGINKGO_BUILD_HIP=ON
  -DCMAKE_HIP_ARCHITECTURES="gfx906"
)
cmake "${cmake_flags[@]}"

The last command will give the following error:

-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_CXX: -fopenmp (found suitable version "4.5", minimum required is "3.0")
-- Found OpenMP: TRUE (found suitable version "4.5", minimum required is "3.0")
-- Enabling OpenMP executor
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS) (Required is at least version "3.1")
-- Could NOT find MPI (missing: MPI_CXX_FOUND CXX) (Required is at least version "3.1")
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - NOTFOUND
-- Could NOT find PAPI (missing: PAPI_LIBRARY PAPI_INCLUDE_DIR sde) (Required is at least version "7.0.1.0")
-- The HIP compiler identification is Clang 17.0.0
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
CMake Error at cmake/hip.cmake:120 (find_package):
  By not providing "Findhipblas.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hipblas", but
  CMake did not find one.

  Could not find a package configuration file provided by "hipblas" with any
  of the following names:

    hipblasConfig.cmake
    hipblas-config.cmake

  Add the installation prefix of "hipblas" to CMAKE_PREFIX_PATH or set
  "hipblas_DIR" to a directory containing one of the above files.  If
  "hipblas" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  CMakeLists.txt:78 (include)

-- Configuring incomplete, errors occurred!
See also "/build/CMakeFiles/CMakeOutput.log".
See also "/build/CMakeFiles/CMakeError.log".

To make it work, the following exports are needed:

export ROCM_PATH=/opt/rocm
export HIP_PATH="$ROCM_PATH"
export HIPBLAS_PATH="$ROCM_PATH"
export HIPSPARSE_PATH="$ROCM_PATH"
export HIPFFT_PATH="$ROCM_PATH"
export ROCRAND_PATH="$ROCM_PATH"
export HIPRAND_PATH="$ROCM_PATH"

It would be nice to fix this in Ginkgo to make all the exports unnecessary :wink:

tpadioleau commented 3 months ago

Hi, I also recently ran into this issue.

I think the definition of HIP_PATH can and should be avoided because it can be used by the compiler (see https://releases.llvm.org/18.1.0/tools/clang/docs/HIPSupport.html#order-of-precedence-for-hip-path) leading to a failure during the compiler checks of cmake if it is not correctly set. Note that in this page they even recommend to not use HIP_PATH.

Regarding the libraries I could workaround all these paths by adding the rocm path to CMAKE_PREFIX_PATH (see https://rocm.docs.amd.com/en/latest/conceptual/cmake-packages.html#finding-dependencies) then cmake was able to find them successfully. I tend to think that this should not be the responsibility of Ginkgo to set these cmake variables ?

MarcelKoch commented 3 months ago

I agree with @tpadioleau that we should not set these paths and instead rely on CMAKE_PREFIX_PATH being configured correctly. A quick test with the rocm image rocm/dev-ubuntu-22.04:6.1.2-complete shows that it works as expected, when the paths are not set by ginkgo. I think we will soon have a patch that addresses this issue.