ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

ROCm / HIP PATHs related ginkgo installation problems on Ubuntu 22.04 #1614

Open klausbu opened 1 month ago

klausbu commented 1 month ago

I am trying to install ginkgo on Ubuntu 22.04. I have an up-to-date default installation of AMD ROCm 6.1.1, which works fine. The ginkgo installation process using cmake, as described on the webpage, doesn't find AMD ROCm, HIP, or any of the required cmake files, so I provided the paths:

export HIP_PATH=/opt/rocm

export hipblas_DIR=/opt/rocm/lib/cmake/hipblas

export CMAKE_PREFIX_PATH=/opt/rocm/lib/cmake/hip

export AMDDeviceLibs_DIR=/opt/rocm-6.1.1/lib/cmake/AMDDeviceLibs/

export amd_comgr_DIR=/opt/rocm-6.1.1/lib/cmake/amd_comgr/

The following one triggers an error:

export hsa-runtime64_DIR=/opt/rocm-6.1.1/lib/cmake/hsa-runtime64/

"hsa-runtime64_DIR=/opt/rocm-6.1.1/lib/cmake/hsa-runtime64/": not a valid identifier
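The export fails because shell variable names may only contain letters, digits and underscores, and hsa-runtime64_DIR contains a hyphen. A minimal sketch of one way around it, assuming the value is passed to CMake as a -D cache entry instead of through the environment (the helper name is_valid_identifier is hypothetical):

```shell
# Hypothetical helper: succeed only if "$1" is a valid shell variable name
# (letters, digits, underscores, and not starting with a digit).
is_valid_identifier() {
  case "$1" in
    ''|[0-9]*|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

pkg_dir_var="hsa-runtime64_DIR"
pkg_dir="/opt/rocm-6.1.1/lib/cmake/hsa-runtime64"

if is_valid_identifier "$pkg_dir_var"; then
  # A valid name can be exported as usual.
  export "$pkg_dir_var=$pkg_dir"
  cmake_args=""
else
  # A name with a hyphen cannot be exported, so hand it to CMake
  # as a cache entry on the command line instead.
  cmake_args="-D${pkg_dir_var}=${pkg_dir}"
fi

echo "extra CMake argument: ${cmake_args}"
```

The printed argument would then be appended to the cmake command line. As later replies in this thread show, setting CMAKE_PREFIX_PATH to the ROCm prefix made the per-package variables unnecessary in this case.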

I used the following cmake command:

cmake -G "Unix Makefiles" -DGINKGO_BUILD_HIP=ON -DCMAKE_HIP_ARCHITECTURES="gfx1031" .. && cmake --build .

I assume the install package is not up-to-date regarding ROCm / HIP install paths?!

upsj commented 1 month ago

First, unfortunately your build will likely fail, since we don't support gfx10xx (yet), see #1429. Second, these environment variables should no longer be necessary since #1334, as long as amdclang++ or hipcc can be found. Which commit are you looking at? Also what CMake version are you using?

klausbu commented 1 month ago

The cmake version is cmake version 3.22.1

How can I specify the hipcc location during the installation process?

upsj commented 1 month ago

I think the easiest solution should be pointing HIPCXX at amdclang++, if it's not already in the PATH. Though it might also help just to try out a newer version of CMake, since HIP 6.1.1 came out quite some time after CMake 3.22.1, and there might be some changes to the CMake setup that are not reflected properly.
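A minimal sketch of the HIPCXX suggestion, assuming amdclang++ is either on the PATH or under llvm/bin of the ROCm prefix. The fallback location and the helper name find_hip_compiler are assumptions, so adjust them to the local installation:

```shell
# Hypothetical helper: print the path of amdclang++, preferring the PATH
# and falling back to <rocm_root>/llvm/bin (an assumed location).
find_hip_compiler() {
  rocm_root="${1:-/opt/rocm}"
  if command -v amdclang++ >/dev/null 2>&1; then
    command -v amdclang++
  elif [ -x "$rocm_root/llvm/bin/amdclang++" ]; then
    echo "$rocm_root/llvm/bin/amdclang++"
  else
    return 1
  fi
}

if hipcxx="$(find_hip_compiler /opt/rocm-6.1.1)"; then
  # CMake reads HIPCXX when it first selects the HIP compiler.
  export HIPCXX="$hipcxx"
  echo "HIPCXX=$HIPCXX"
else
  echo "amdclang++ not found; set HIPCXX manually" >&2
fi
```

CMake only consults HIPCXX on the first configure of a build directory, so an existing build directory may need to be cleared before re-running cmake.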

klausbu commented 1 month ago

I am going in circles: HIPCXX now points to amdclang++, but the cmake files are still not detected:


-- The HIP compiler identification is Clang 17.0.0
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /opt/rocm-6.1.1/llvm/bin/clang++ - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
CMake Error at cmake/hip.cmake:120 (find_package):
  By not providing "Findhipblas.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hipblas", but
  CMake did not find one.

  Could not find a package configuration file provided by "hipblas" with any
  of the following names:

    hipblasConfig.cmake
    hipblas-config.cmake

  Add the installation prefix of "hipblas" to CMAKE_PREFIX_PATH or set
  "hipblas_DIR" to a directory containing one of the above files.  If
  "hipblas" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  CMakeLists.txt:78 (include)

-- Configuring incomplete, errors occurred!
See also "/home/klaus/Programme/ginkgo/build/CMakeFiles/CMakeOutput.log".
See also "/home/klaus/Programme/ginkgo/build/CMakeFiles/CMakeError.log".

upsj commented 1 month ago

Can you try setting -DCMAKE_PREFIX_PATH=/opt/rocm-6.1.1 as outlined in https://rocm.docs.amd.com/en/latest/conceptual/cmake-packages.html? By choosing to install ROCm in a non-standard location (under /opt rather than somewhere like /usr) in their packages, AMD made it slightly harder for things to be found by default. Module systems on HPC clusters usually take care of that for you.
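Before re-running cmake, it can help to check that the package config files really exist under the prefix. This is a rough sketch, assuming the <prefix>/lib/cmake/<package>/ layout seen in the paths earlier in this thread; the helper name has_package_config is hypothetical:

```shell
# Hypothetical helper: succeed if <prefix>/lib/cmake/<package>/ contains
# one of the two config file names that find_package looks for.
has_package_config() {
  dir="$1/lib/cmake/$2"
  [ -f "$dir/${2}-config.cmake" ] || [ -f "$dir/${2}Config.cmake" ]
}

rocm_prefix="/opt/rocm-6.1.1"
for pkg in hip hipblas amd_comgr AMDDeviceLibs; do
  if has_package_config "$rocm_prefix" "$pkg"; then
    echo "found:   $pkg"
  else
    echo "missing: $pkg (is its development package installed?)"
  fi
done
```

If a package is reported missing here, setting CMAKE_PREFIX_PATH alone will not help; as the CMake error message notes, the package's separate development package may need to be installed first.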

klausbu commented 1 month ago

The following triggered the compilation on Ubuntu 22.04 with ROCm 6.1.1:

cmake -G "Unix Makefiles" -D GINKGO_BUILD_HIP=ON -D CMAKE_HIP_ARCHITECTURES="gfx1031" -D CMAKE_PREFIX_PATH=/opt/rocm-6.1.1 .. && cmake --build .

Now I need to look into the architecture specific warp size related compilation error that's discussed in the other thread.

upsj commented 1 month ago

Dealing with warp size 32 requires some refactoring on our side: we assume the warp size is known on the host at compile time, and this assumption is violated in a mixed gfx10xx/gfx9xx build. So it is unlikely that you will be able to fix it easily. In the short run, we can only support server-grade GPUs with warp size 64.
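To see in advance whether a CMAKE_HIP_ARCHITECTURES value runs into this limitation, the target list can be checked against the warp sizes mentioned here. This is only a sketch: the helper name expected_warp_size is hypothetical, and the mapping covers just the two families named in this thread (gfx9xx with warp size 64, gfx10xx with warp size 32):

```shell
# Hypothetical helper: map a gfx target name to the warp size discussed
# in this thread. Other families are reported as "unknown".
expected_warp_size() {
  case "$1" in
    gfx9*)  echo 64 ;;
    gfx10*) echo 32 ;;
    *)      echo unknown ;;
  esac
}

# Same value as passed to -D CMAKE_HIP_ARCHITECTURES; entries are
# separated by semicolons.
archs="gfx1031"

old_ifs="$IFS"
IFS=';'
for arch in $archs; do
  size="$(expected_warp_size "$arch")"
  if [ "$size" = 64 ]; then
    echo "$arch: warp size 64, supported"
  else
    echo "$arch: warp size $size, not supported yet"
  fi
done
IFS="$old_ifs"
```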

klausbu commented 1 month ago

I don't run a mixed build, only -D CMAKE_HIP_ARCHITECTURES="gfx1031". The purpose is to test https://github.com/hpsim/OGL

upsj commented 1 month ago

IIRC the ROCm clang compiler always claims the warp size is 64 from the host side regardless of the device architecture, so that will not make a difference. You could try patching the warpSize to be 32 inside config.hip.hpp and see if it works?

upsj commented 1 month ago

I've been planning on setting up a CI system with a consumer GPU for a while now, I guess this is a good time to get started ;)

klausbu commented 1 month ago

That's a very good idea. I have been looking into GPU compute for CFD for years but have not been able to confirm even one of the many speedup claims, so I am not going to invest in server-grade hardware before I have managed to set up an effective implementation of some kind.