Closed rgayatri23 closed 2 days ago
If you set HIP_PLATFORM=nvidia in the environment, does that make a difference?
It's already set and it did not make any difference. It's usually set in our environment whenever hip-rocm modules are loaded.
Hmm, can you try commenting out the find_package(HIP REQUIRED) call on library/CMakeLists.txt:34? Now that I look, it doesn't seem like it should be necessary.
@rgayatri23
Please try this:
module purge
module load cuda hip-cuda boost cmake fftw
export HIP_PLATFORM=nvidia
cmake -DROCM_DIR=
Thanks @af-ayala. This time the build got a bit further but was blocked by a different issue, so partial success! CMake is unable to find FFTW, even though it's definitely in the path:
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11.0")
CMake Error at /global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find FFTW (missing: FFTW_INCLUDE_DIRS FFTW_LIBRARIES) (Required
is at least version "3.0")
Call Stack (most recent call first):
/global/u1/r/rgayatri/.local/cmake/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
clients/cmake/FindFFTW.cmake:103 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
clients/tests/CMakeLists.txt:26 (find_package)
-- Configuring incomplete, errors occurred!
See also "/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build/CMakeFiles/CMakeOutput.log".
See also "/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build/CMakeFiles/CMakeError.log".
rgayatri@perlmutter:login40:/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build> echo $CPATH
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include:/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/include
rgayatri@perlmutter:login40:/pscratch/sd/r/rgayatri/HIP-LZ/hipFFT/build> ls /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/*cufft*
.rw-r--r-- 12k root 29 Sep 2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufft.h
.rw-r--r-- 19k root 29 Sep 2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftw.h
.rw-r--r-- 12k root 29 Sep 2023 /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftXt.h
/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/12.2/include/cufftmp:
.rw-r--r-- 4.1k root 29 Sep 2023 cudalibxt.h
.rw-r--r-- 12k root 29 Sep 2023 cufft.h
.rw-r--r-- 5.1k root 29 Sep 2023 cufftMp.h
.rw-r--r-- 19k root 29 Sep 2023 cufftw.h
.rw-r--r-- 12k root 29 Sep 2023 cufftXt.h
Did you build FFTW yourself, or are you using the SLES packages? The distro packages are easier to use since they include both single and double precision libraries.
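One way to answer this on the machine itself is to check whether a given install prefix holds the FFTW header together with both the double precision (libfftw3) and single precision (libfftw3f) libraries. The following is a minimal sketch, not part of hipFFT: check_fftw_prefix is a hypothetical helper name, and the include, lib and lib64 subdirectories are an assumed layout that may differ on your system.

```shell
# Sketch: report whether a prefix holds fftw3.h and both FFTW precisions.
# check_fftw_prefix is a hypothetical helper; include/, lib/ and lib64/
# are assumed locations, adjust them for your installation.
check_fftw_prefix() {
  prefix="$1"
  status=0
  if [ -f "$prefix/include/fftw3.h" ]; then
    echo "header: found"
  else
    echo "header: missing"
    status=1
  fi
  # fftw3 is the double precision library, fftw3f the single precision one.
  for lib in fftw3 fftw3f; do
    found=no
    for dir in "$prefix/lib" "$prefix/lib64"; do
      # Matches libfftw3.so, libfftw3.so.3, libfftw3.a and so on.
      for f in "$dir/lib$lib".so* "$dir/lib$lib".a; do
        if [ -e "$f" ]; then
          found=yes
        fi
      done
    done
    echo "lib$lib: $found"
    if [ "$found" != yes ]; then
      status=1
    fi
  done
  return "$status"
}
```

Calling check_fftw_prefix with the prefix of the FFTW module or package returns a non-zero status if the header or either library is missing, which may help explain why FFTW_INCLUDE_DIRS or FFTW_LIBRARIES end up unset.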
The GPU software is all built through the distro packages.
If you just want to build the library, setting -DBUILD_CLIENTS=OFF will get you that. Using modules on supercomputers can be tricky. To build our testing infrastructure with -DBUILD_CLIENTS=ON, you do need the dependencies that are producing these errors. I would suggest the following procedure, which works for me on other clusters:
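As a sketch of the library-only route only (not the client build procedure), the configure step could be wrapped as shown here. This assumes CMake 3.13 or newer for the -S and -B options; configure_library_only and the DRY_RUN variable are hypothetical names introduced for illustration, and the source and build directories are placeholders. Only -DBUILD_CLIENTS=OFF and HIP_PLATFORM=nvidia come from this thread.

```shell
# Sketch: configure only the hipFFT library, skipping the clients.
# configure_library_only and DRY_RUN are hypothetical names; the source
# and build directories are placeholders supplied by the caller.
configure_library_only() {
  src_dir="$1"
  build_dir="$2"
  # Assemble the command once so a dry run prints exactly what would run.
  set -- cmake -S "$src_dir" -B "$build_dir" -DBUILD_CLIENTS=OFF
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "HIP_PLATFORM=nvidia $*"
  else
    # Set the platform for this one cmake invocation only.
    HIP_PLATFORM=nvidia "$@"
  fi
}
```

With DRY_RUN=1 set, configure_library_only . build only prints the command, which is a convenient way to confirm the flags before running the real configure step.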
Even with BUILD_CLIENTS=OFF, CMake is looking for cufft. Is there a CMake variable to pass the path? I tried everything from adding the path to CMAKE_PREFIX_PATH to passing it as CXX and linker flags, but it looks like the path is not being picked up.
@rgayatri23 Can you please check if you are still seeing the issue with the latest ROCm 6.1.2? Thanks!
This has been stale for a while; closing for now. Feel free to re-open if there's still a problem!
Sure. Sorry about the delay. I am having issues building rocm/6.0 on the NVIDIA platform. I will test this again once that is done.
Problem Description
I am trying to build hipfft/rocm-5.5.1 for the NVIDIA A100 GPUs available on the Perlmutter supercomputer. I already have cuda/12.2 and the corresponding cuFFT in my path. There is also hipcc/5.5.1, which is configured with that CUDA version. Here is the CMake command:
The error
Operating System
SLES 15-SP4
CPU
AMD EPYC 7713 64-Core Processor
GPU
AMD Instinct MI250X
ROCm Version
ROCm 5.5.1
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
FYI - I did not find the right GPU option in the selection, so I picked one at random in order to be able to submit the issue.