deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

Compile: Unknown CMake command "hip_add_library" #3662

Open yizeyi18 opened 7 months ago

yizeyi18 commented 7 months ago

Describe the bug

When compiling ABACUS with rocm-5.7.0 from https://repo.radeon.com/rocm/apt/5.7/, CMake says:

-- Found Libxc: version 6.2.2
CMake Error at source/module_base/module_container/CMakeLists.txt:18 (hip_add_library):
  Unknown CMake command "hip_add_library".

-- Configuring incomplete, errors occurred!

This macro is also not available in rocm-6.0.0.

Expected behavior

There should be no error.

To Reproduce

1. Clone the source code from https://github.com/deepmodeling/abacus-develop
2. Configure CMake with the options -DUSE_ROCM:BOOL=ON -DCOMMIT_INFO:BOOL=OFF -DCMAKE_PREFIX_PATH="${ROCM_PATH}"

The error above then occurs.

Environment

env1: pc using rocm-5.7.0

env2: sugon platform (Computing Service Platform, East China Zone 1 [Kunshan]) with dtk-23.10

Additional Context

Installation under dtk-23.10 is being tested; running grep -r "hip_add_library" /public/software/compiler/rocm/dtk-23.10 shows that the macro is not defined in dtk-23.10. Is this macro defined by ABACUS, or is it specific to some version of the DCU toolkit?
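One caveat about the grep check: CMake command names are case-insensitive, so a case-sensitive search for `hip_add_library` would miss a definition written as `HIP_ADD_LIBRARY`. The following is a minimal sketch of a case-insensitive search for a `macro(...)` or `function(...)` definition. The helper name `find_cmake_command` is hypothetical, and `--include` assumes GNU grep.

```shell
# Search a directory tree for the CMake macro or function that defines a command.
# CMake command names are case-insensitive, so the search ignores case (-i).
# Usage: find_cmake_command <command-name> <root-dir>
# (find_cmake_command is a hypothetical helper, not part of ROCm or ABACUS.)
find_cmake_command() {
    name="$1"
    root="$2"
    # Match lines such as "macro(HIP_ADD_LIBRARY ...)" or "function(hip_add_library ...)".
    grep -rniE --include='*.cmake' --include='CMakeLists.txt' \
        "^[[:space:]]*(macro|function)[[:space:]]*\([[:space:]]*${name}([[:space:]]|\))" \
        "$root"
}
```

It could be invoked as `find_cmake_command hip_add_library /public/software/compiler/rocm/dtk-23.10`. An empty result with a non-zero exit status means no definition was found under that root.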

caic99 commented 7 months ago

@yizeyi18 Maybe FindHIP should be executed first.

yizeyi18 commented 7 months ago

> @yizeyi18 Maybe FindHIP should be executed first.

@caic99 AMD ROCm provides a FindHIP, but it does not contain this macro. ABACUS has no script that defines this macro either.
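Since a toolkit installation may ship more than one copy of `FindHIP.cmake`, it can help to list every candidate before concluding that none defines the macro. This is a rough sketch only: the helper name `find_findhip_dirs` is hypothetical, and nothing here is confirmed to resolve the error on rocm-5.7.0.

```shell
# List every directory under <root-dir> that contains a FindHIP.cmake module.
# Usage: find_findhip_dirs <root-dir>
# (find_findhip_dirs is a hypothetical helper name used for illustration.)
find_findhip_dirs() {
    find "$1" -type f -name 'FindHIP.cmake' -exec dirname {} \; | sort -u
}
```

If one of the listed modules does define the macro, its directory could be passed to CMake through the standard `CMAKE_MODULE_PATH` variable, for example `-DCMAKE_MODULE_PATH=<dir>`, so that it is found ahead of other copies.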

caic99 commented 7 months ago

@denghuilu Would you take a look? Thanks.

denghuilu commented 7 months ago

I can compile normally in the rocm-5.3 environment and under sugon DTK's environment. I suspect that newer versions of rocm have made some changes in the use of cmake, which requires further research, or something else happened during the configuration. Also, it is recommended to use rocm's own clang compiler for compilation.

[root@813455ba0e57:abacus-develop]$ git log | head 
commit 089c0bfdf68288b0243eea183997b7f5d0c92d0a
Author: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com>
Date:   Thu Feb 29 15:25:31 2024 +0800

    Test: add a new UnitTest to protect `save_DMR()` (#3659)

    * add a new UnitTest to protect save_DMR()

    * delete kv

[root@813455ba0e57:abacus-develop]$ cd build/
[root@813455ba0e57:build]$ cmake -DCMAKE_CXX_COMPILER=clang++ -DUSE_ROCM=ON -DENABLE_DEEPKS=OFF -DENABLE_LIBXC=ON -DBUILD_TESTING=ON -DCOMMIT_INFO=OFF -DCMAKE_BUILD_TYPE=Release .. 
-- The CXX compiler identification is Clang 15.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm-5.3.0/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Cereal: /usr/include  
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2") 
-- Found ELPA: /usr/lib/x86_64-linux-gnu/libelpa.so  
-- Performing Test ELPA_VERSION_SATISFIES
-- Performing Test ELPA_VERSION_SATISFIES - Success
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - NOTFOUND
-- Found HIP: /opt/rocm-5.3.0/hip (found version "5.3.22061-e8e78f1a") 
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- Found HIP: 5.3.22061-e8e78f1a
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3_omp.so  
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so  
-- Found ScaLAPACK: /usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so  
-- Found Libxc: version 5.1.7
-- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter 
-- Configuring done
-- Generating done
-- Build files have been written to: /workspaces/abacus-develop/build

denghuilu commented 7 months ago

@caic99 We may need to review our rocm compilation according to this doc: https://rocm.docs.amd.com/en/latest/conceptual/cmake-packages.html.

denghuilu commented 7 months ago

@yizeyi18 Could you further check the compilation with the DTK environment? If you encounter any problems, we are always ready to discuss them.

yizeyi18 commented 7 months ago

@denghuilu cmake passed under the dtk environment, but compilation failed with the error below. I guess it's because I used gcc rather than the clang from dtk?

Error message

```shell
make -f source/CMakeFiles/device.dir/build.make source/CMakeFiles/device.dir/build
make[2]: Entering directory '/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-build-4tqmlpb'
[ 94%] Building CXX object source/CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o
cd /tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-build-4tqmlpb/source && /public/home/acx1h0pf1e/soft/spack/lib/spack/env/gcc/g++ -DMETIS -DUSE_CEREAL_SERIALIZATION -DUSE_LIBXC -DUSE_NEW_TWO_CENTER -D__ELPA -D__FFTW3 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__ -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -D__LCAO -D__MPI -D__ROCM -D__SELINV -D__UT_USE_ROCM -I/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-src/source -I/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-src/source/module_base/module_container -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/cereal-1.3.2-7bxanb23tswiw2ugwi62kpzplrmv5hv3/include -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/elpa-2023.05.001-um6v6tnvbcwhiloon6exvgcptlh4kmmc/include/elpa_openmp-2023.05.001 -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/openmpi-5.0.1-s2vejdl5rg7xk27qheos4omeyp4ev2s7/include -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/fftw-3.3.10-kclpwnce2lfi6uxcdfk5uqtg3u5uvnxq/include -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/libxc-6.2.2-xgdwmrdl5l7j2waetvvioih6xanzbt73/include -isystem /public/software/compiler/rocm/dtk-23.10/include -isystem /public/software/compiler/rocm/dtk-23.10/hip/include -isystem /public/software/compiler/rocm/dtk-23.10/hip/../include -isystem /public/software/compiler/rocm/dtk-23.10/llvm/lib/clang/15.0.0/include/.. -fopenmp -O3 -DNDEBUG -xhip --cuda-gpu-arch=gfx906 --cuda-gpu-arch=gfx926 -std=gnu++11 -MD -MT source/CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o -MF CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o.d -o CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o -c /tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-src/source/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp
g++: error: unrecognized command-line option '--cuda-gpu-arch=gfx906'
make[2]: Leaving directory '/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-build-4tqmlpb'
g++: error: unrecognized command-line option '--cuda-gpu-arch=gfx926'
[ 94%] Built target operator_ks_lcao
make[2]: *** [source/CMakeFiles/device.dir/build.make:79: source/CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o] Error 1
```

denghuilu commented 7 months ago

> @denghuilu cmake passed under the dtk environment, but compilation failed with the error below. I guess it's because I used gcc rather than the clang from dtk?

Use the clang++ compiler instead of GNU g++.
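The `g++: error: unrecognized command-line option '--cuda-gpu-arch=gfx906'` message suggests that HIP-specific flags were handed to a compiler that does not understand them. A small pre-flight check before configuring with `-DUSE_ROCM=ON` might look like the sketch below. Both helper names are hypothetical, and the check assumes only that a clang-based compiler mentions "clang" in its `--version` output.

```shell
# Return 0 if the given compiler identifies itself as clang, non-zero otherwise.
# Usage: is_clang <compiler-command>
# (is_clang and check_rocm_compiler are hypothetical helpers for illustration.)
is_clang() {
    "$1" --version 2>/dev/null | grep -qi 'clang'
}

# Print a hint about the chosen compiler before configuring a ROCm build.
# Usage: check_rocm_compiler <compiler-command>
check_rocm_compiler() {
    if is_clang "$1"; then
        echo "ok: $1 looks like clang"
    else
        echo "warning: $1 does not look like clang; HIP flags such as --cuda-gpu-arch may be rejected"
        return 1
    fi
}
```

For example, `check_rocm_compiler "${CXX:-clang++}"` could be run first, followed by a configure line that sets `-DCMAKE_CXX_COMPILER=clang++` as in the rocm-5.3 log earlier in this thread.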

yizeyi18 commented 7 months ago

@denghuilu I ran into a problem configuring with clang++: CMake does not consider my MPI library usable. I'm afraid I need to recompile OpenMPI with clang too, which takes some time.

denghuilu commented 7 months ago

> @denghuilu I ran into a problem configuring with clang++: CMake does not consider my MPI library usable. I'm afraid I need to recompile OpenMPI with clang too, which takes some time.

Check this link, I think it can be helpful: https://mcresearch.gitee.io/abacus-user-guide/abacus-dcu.html

yizeyi18 commented 7 months ago

> Check this link, I think it can be helpful: https://mcresearch.gitee.io/abacus-user-guide/abacus-dcu.html

Thanks! This really helps. Although I still use a manually built openmp, since I'm trying to write a spack build doc (see #3291), this document contains a lot of useful information, e.g. that both single- and double-precision fftw are needed to work with rocm, and that lcao is not recommended with rocm.
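The requirement for both FFTW precisions can be checked before configuring. The sketch below assumes the conventional FFTW library names, `libfftw3` for double precision and `libfftw3f` for single precision. The helper name `has_both_fftw_precisions` is hypothetical, and the library directory is whatever your installation uses.

```shell
# Return 0 if <lib-dir> holds both the double-precision (libfftw3) and
# single-precision (libfftw3f) FFTW libraries, either shared (.so) or static (.a).
# Usage: has_both_fftw_precisions <lib-dir>
# (has_both_fftw_precisions is a hypothetical helper name.)
has_both_fftw_precisions() {
    dir="$1"
    for lib in libfftw3 libfftw3f; do
        if [ ! -e "$dir/$lib.so" ] && [ ! -e "$dir/$lib.a" ]; then
            echo "missing: $lib in $dir"
            return 1
        fi
    done
    echo "found libfftw3 and libfftw3f in $dir"
}
```

It could be called as `has_both_fftw_precisions /usr/lib/x86_64-linux-gnu`, or with the `lib` directory of a spack-installed fftw.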

yizeyi18 commented 7 months ago

@denghuilu I made a successful build under the dtk 23.10 environment with spack! It was hard, but it shows that the cmake script works under the dtk environment. The spack install script has been updated and pushed to the online repository. Some difficulties are:

  • The dtk compiler uses a non-compressing ld, but dependent libs (like fftw) are built using a compiler with a compressing ld, which causes a link error. This is not solved in the script; we can only expect users not to use a non-system ld to build dependencies.
  • The dtk cmake script makes some improper assumptions, causing a cmake error. Solved in the spack script.
  • Selecting the correct compiler. Generally the cpu part of a program need not be built with the gpu compiler, and a user with no knowledge of ABACUS may specify some other compiler on the command line; ABACUS offers no way to separate them, so the rocm compiler must be specified explicitly. Specified in the spack script.
  • Not tested. I have no card time on the Sugon platform; are you in a position to test this?

denghuilu commented 7 months ago

@caic99 @pxlxingliang Do we have a way to help developers access the Sugon test environment?

caic99 commented 7 months ago

> @caic99 @pxlxingliang Do we have a way to help developers access the Sugon test environment?

@denghuilu I'm afraid not.

WHUweiqingzhou commented 6 months ago

@yizeyi18 can we close this issue now, or keep it open?

yizeyi18 commented 6 months ago

> @yizeyi18 can we close this issue now, or keep it open?

@WHUweiqingzhou I have not been focusing on this for a long time. Tonight I could test installing ABACUS under a rocm environment and then decide, since the spack package is still not merged into their repository.