ROCm / hipFFT

hipFFT is a FFT marshalling library.
https://rocm.docs.amd.com/projects/hipFFT/en/latest/

[Issue]: Building hipFFT on NVIDIA platform. [Perlmutter supercomputer] #89

Closed rgayatri23 closed 2 days ago

rgayatri23 commented 5 months ago

Problem Description

I am trying to build hipfft/rocm-5.5.1 for the NVIDIA A100 GPUs available on the Perlmutter supercomputer. I already have cuda/12.2 and the corresponding cuFFT in my path. There is also hipcc/5.5.1, which is configured with that CUDA version. Here is the CMake command:

cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_LIB=CUDA -DCMAKE_INSTALL_PREFIX=$PWD/../install -L ../

The error

-- Found ROCm
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/CMakeFindDependencyMacro.cmake:47 (find_package):
  By not providing "Findamd_comgr.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "amd_comgr", but CMake did not find one.

  Could not find a package configuration file provided by "amd_comgr" with
  any of the following names:

    amd_comgrConfig.cmake
    amd_comgr-config.cmake

  Add the installation prefix of "amd_comgr" to CMAKE_PREFIX_PATH or set
  "amd_comgr_DIR" to a directory containing one of the above files.  If
  "amd_comgr" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  /global/common/software/nersc/pe/rocm/5.5.1/lib64/cmake/hip/hip-config.cmake:183 (find_dependency)
  library/CMakeLists.txt:34 (find_package)

Operating System

SLES 15-SP4

CPU

AMD EPYC 7713 64-Core Processor

GPU

AMD Instinct MI250X

ROCm Version

ROCm 5.5.1

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

FYI - I did not find the right GPU option in the selection, so I selected randomly in order to be able to submit the issue.

evetsso commented 5 months ago

If you set HIP_PLATFORM=nvidia in the environment, does that make a difference?

rgayatri23 commented 5 months ago

It's already set and it did not make any difference. It's usually set in our environment whenever hip-rocm modules are loaded.
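A quick way to confirm what the build shell actually sees is to print the variable right before running cmake. This is a minimal sketch; the function name check_hip_platform is hypothetical and not part of HIP or hipFFT.

```shell
# Hypothetical helper: report which HIP backend the current environment selects.
check_hip_platform() {
    case "${HIP_PLATFORM:-}" in
        nvidia) echo "HIP_PLATFORM=nvidia (CUDA backend)" ;;
        amd)    echo "HIP_PLATFORM=amd (ROCm backend)" ;;
        "")     echo "HIP_PLATFORM is unset" ;;
        *)      echo "HIP_PLATFORM has unexpected value: $HIP_PLATFORM" ;;
    esac
}

check_hip_platform
```

Running it in the same shell (after the module loads) rules out the case where a later module command overrides the setting.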

evetsso commented 5 months ago

Hmm, can you try commenting out the find_package(HIP REQUIRED) on library/CMakeLists.txt:34? Now that I look, it doesn't seem like it should be necessary.

af-ayala commented 5 months ago

@rgayatri23 Please try this:

module purge
module load cuda hip-cuda boost cmake fftw
export HIP_PLATFORM=nvidia
cmake -DROCM_DIR= -DCMAKE_MODULE_PATH=/hip/cmake/ -DCMAKE_CXX_COMPILER=hipcc -DHIP_ROOT_DIR= -DBUILD_WITH_LIB=CUDA -DBUILD_CLIENTS=ON -DCMAKE_CXX_FLAGS="-gencode=arch=compute_80,code=sm_80" ..

rgayatri23 commented 5 months ago

Thanks @af-ayala. This time the build got a bit further but was blocked by a different issue, so partial success! CMake is unable to find FFTW, even though it's definitely in the path.

-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11.0")
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find FFTW (missing: FFTW_INCLUDE_DIRS FFTW_LIBRARIES) (Required
  is at least version "3.0")
Call Stack (most recent call first):
  /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  clients/cmake/FindFFTW.cmake:103 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  clients/tests/CMakeLists.txt:26 (find_package)

-- Configuring incomplete, errors occurred!
See also "/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build/CMakeFiles/CMakeOutput.log".
See also "/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build/CMakeFiles/CMakeError.log".
rgayatri@perlmutter:login40:/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build> echo $CPATH
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include:/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/include
rgayatri@perlmutter:login40:/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build> ls /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/*cufft*
.rw-r--r-- 12k root 29 Sep  2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufft.h
.rw-r--r-- 19k root 29 Sep  2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftw.h
.rw-r--r-- 12k root 29 Sep  2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftXt.h

/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftmp:
.rw-r--r-- 4.1k root 29 Sep  2023 cudalibxt.h
.rw-r--r--  12k root 29 Sep  2023 cufft.h
.rw-r--r-- 5.1k root 29 Sep  2023 cufftMp.h
.rw-r--r--  19k root 29 Sep  2023 cufftw.h
.rw-r--r--  12k root 29 Sep  2023 cufftXt.h

evetsso commented 5 months ago

Did you build FFTW yourself, or are you using the SLES packages? The distro packages are easier to use since they include both single and double precision libraries.

rgayatri23 commented 5 months ago

The GPU software is all built through the distro packages.
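The CPATH printed above lists only the CUDA and math_libs include directories, so it may be worth checking whether an FFTW header is reachable through the environment at all. The sketch below searches a colon-separated directory list for a file. The helper name find_in_path_list is hypothetical, and the check only shows what the environment exposes; it does not reproduce the search that clients/cmake/FindFFTW.cmake performs.

```shell
# Hypothetical helper: search a colon-separated directory list for a file
# and print the first match. Returns non-zero when nothing is found.
find_in_path_list() {
    file=$1
    list=$2
    old_ifs=$IFS
    IFS=:
    for dir in $list; do
        if [ -n "$dir" ] && [ -e "$dir/$file" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$file"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# fftw3.h is the standard FFTW 3 header; cufft.h is the cuFFT header.
find_in_path_list fftw3.h "${CPATH:-}" || echo "fftw3.h not found via CPATH"
find_in_path_list cufft.h "${CPATH:-}" || echo "cufft.h not found via CPATH"
```

The same function can be pointed at LD_LIBRARY_PATH to look for libfftw3.so (double precision) and libfftw3f.so (single precision), since the client build needs both.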

af-ayala commented 5 months ago

If you just want to build the library, setting -DBUILD_CLIENTS=OFF will get you that. Using modules on supercomputers can be tricky. To build our testing infrastructure with -DBUILD_CLIENTS=ON, you do need the dependencies for which you're getting errors. I would suggest the following procedure, which works for me on other clusters:

rgayatri23 commented 5 months ago

Even with BUILD_CLIENTS=OFF, CMake is still looking for cufft. Is there a CMake variable for passing the path? I tried everything from adding the path to CMAKE_PREFIX_PATH to passing it as CXX and linker flags, but the path does not seem to be picked up.
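Before passing a directory as a prefix, it can help to confirm that it holds both the cuFFT header and a cuFFT library, because a prefix with only one of the two will not satisfy a typical CMake search. The sketch below assumes a conventional include/ plus lib64/ or lib/ layout under the prefix. The helper name prefix_has_cufft is hypothetical, and the candidate path is inferred from the include directory shown earlier in the thread, not confirmed as the right prefix for this system.

```shell
# Hypothetical helper: succeed only if PREFIX contains include/cufft.h
# and at least one libcufft.so* under lib64/ or lib/.
prefix_has_cufft() {
    prefix=$1
    [ -f "$prefix/include/cufft.h" ] || return 1
    for libdir in lib64 lib; do
        for lib in "$prefix/$libdir"/libcufft.so*; do
            # An unmatched glob stays literal, so test that the file exists.
            [ -e "$lib" ] && return 0
        done
    done
    return 1
}

# Assumption: parent of the math_libs include directory listed in CPATH.
candidate=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2
if prefix_has_cufft "$candidate"; then
    echo "cuFFT header and library found under $candidate"
else
    echo "cuFFT header or library missing under $candidate"
fi
```

If the check fails for the library part, the libraries may live under a different subdirectory than this sketch assumes, which would be useful information to include in the issue.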

ppanchad-amd commented 1 month ago

@rgayatri23 Can you please check if you are still seeing the issue with the latest ROCm 6.1.2? Thanks!

malcolmroberts commented 2 days ago

This has been stale for a while; closing for now. Feel free to re-open if there's still a problem!

rgayatri23 commented 2 days ago

Sure. Sorry about the delay. I am having issues building rocm/6.0 on the NVIDIA platform. I will test this again once that is done.