Open yizeyi18 opened 7 months ago
@yizeyi18 Maybe FindHIP should be executed first.
@caic99 AMD ROCm provides a FindHIP module, but it does not contain this macro. ABACUS has no script that defines this macro either.
@denghuilu Would you take a look? Thanks.
I can compile normally in the rocm-5.3 environment and under sugon DTK's environment. I suspect that newer versions of rocm have made some changes in the use of cmake, which requires further research, or something else happened during the configuration. Also, it is recommended to use rocm's own clang compiler for compilation.
[root@813455ba0e57:abacus-develop]$ git log | head
commit 089c0bfdf68288b0243eea183997b7f5d0c92d0a
Author: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com>
Date: Thu Feb 29 15:25:31 2024 +0800
Test: add a new UnitTest to protect `save_DMR()` (#3659)
* add a new UnitTest to protect save_DMR()
* delete kv
[root@813455ba0e57:abacus-develop]$ cd build/
[root@813455ba0e57:build]$ cmake -DCMAKE_CXX_COMPILER=clang++ -DUSE_ROCM=ON -DENABLE_DEEPKS=OFF -DENABLE_LIBXC=ON -DBUILD_TESTING=ON -DCOMMIT_INFO=OFF -DCMAKE_BUILD_TYPE=Release ..
-- The CXX compiler identification is Clang 15.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm-5.3.0/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Cereal: /usr/include
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")
-- Found ELPA: /usr/lib/x86_64-linux-gnu/libelpa.so
-- Performing Test ELPA_VERSION_SATISFIES
-- Performing Test ELPA_VERSION_SATISFIES - Success
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - NOTFOUND
-- Found HIP: /opt/rocm-5.3.0/hip (found version "5.3.22061-e8e78f1a")
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- Found HIP: 5.3.22061-e8e78f1a
-- Found FFTW3: /usr/lib/x86_64-linux-gnu/libfftw3_omp.so
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Found ScaLAPACK: /usr/lib/x86_64-linux-gnu/libscalapack-openmpi.so
-- Found Libxc: version 5.1.7
-- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
-- Configuring done
-- Generating done
-- Build files have been written to: /workspaces/abacus-develop/build
@caic99 We may need to review our rocm compilation according to this doc: https://rocm.docs.amd.com/en/latest/conceptual/cmake-packages.html.
@yizeyi18 Could you further check the compilation with the DTK environment? If you encounter any problems, we are always ready to discuss them.
@denghuilu CMake passed under the dtk environment, but compilation failed with the error below. I guess it's because I used gcc rather than the clang from dtk?
```shell
make -f source/CMakeFiles/device.dir/build.make source/CMakeFiles/device.dir/build
make[2]: Entering directory '/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-build-4tqmlpb'
[ 94%] Building CXX object source/CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o
cd /tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-build-4tqmlpb/source && /public/home/acx1h0pf1e/soft/spack/lib/spack/env/gcc/g++ -DMETIS -DUSE_CEREAL_SERIALIZATION -DUSE_LIBXC -DUSE_NEW_TWO_CENTER -D__ELPA -D__FFTW3 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__ -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -D__LCAO -D__MPI -D__ROCM -D__SELINV -D__UT_USE_ROCM -I/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-src/source -I/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-src/source/module_base/module_container -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/cereal-1.3.2-7bxanb23tswiw2ugwi62kpzplrmv5hv3/include -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/elpa-2023.05.001-um6v6tnvbcwhiloon6exvgcptlh4kmmc/include/elpa_openmp-2023.05.001 -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/openmpi-5.0.1-s2vejdl5rg7xk27qheos4omeyp4ev2s7/include -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/fftw-3.3.10-kclpwnce2lfi6uxcdfk5uqtg3u5uvnxq/include -I/public/home/acx1h0pf1e/soft/spack/opt/spack/linux-centos7-x86_64_v3/gcc-13.2.0/libxc-6.2.2-xgdwmrdl5l7j2waetvvioih6xanzbt73/include -isystem /public/software/compiler/rocm/dtk-23.10/include -isystem /public/software/compiler/rocm/dtk-23.10/hip/include -isystem /public/software/compiler/rocm/dtk-23.10/hip/../include -isystem /public/software/compiler/rocm/dtk-23.10/llvm/lib/clang/15.0.0/include/.. -fopenmp -O3 -DNDEBUG -xhip --cuda-gpu-arch=gfx906 --cuda-gpu-arch=gfx926 -std=gnu++11 -MD -MT source/CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o -MF CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o.d -o CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o -c /tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-src/source/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp
g++: error: unrecognized command-line option '--cuda-gpu-arch=gfx906'
make[2]: Leaving directory '/tmp/acx1h0pf1e/spack-stage/spack-stage-abacus-3.5.3-4tqmlpbpiqx4hnw2rardy3xk2xotu7io/spack-build-4tqmlpb'
g++: error: unrecognized command-line option '--cuda-gpu-arch=gfx926'
[ 94%] Built target operator_ks_lcao
make[2]: *** [source/CMakeFiles/device.dir/build.make:79: source/CMakeFiles/device.dir/module_hamilt_pw/hamilt_pwdft/kernels/nonlocal_op.cpp.o] Error 1
```
Use the clang++ compiler instead of GNU g++.
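For anyone hitting the same `--cuda-gpu-arch` error, here is a minimal sketch of locating the ROCm-provided clang++ before configuring. The function name `rocm_clangxx` is hypothetical, the `<prefix>/llvm/bin/clang++` layout is assumed from the rocm-5.3.0 log earlier in this thread, and the `/opt/rocm` fallback is an assumption about the install location.

```shell
# Hypothetical helper: print the clang++ shipped inside a ROCm/DTK prefix.
# Assumes the layout <prefix>/llvm/bin/clang++ (as in /opt/rocm-5.3.0/llvm/bin/clang++).
rocm_clangxx() {
    # Prefix order: explicit argument, then $ROCM_PATH, then /opt/rocm (assumed default).
    prefix="${1:-${ROCM_PATH:-/opt/rocm}}"
    candidate="$prefix/llvm/bin/clang++"
    if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        echo "no executable clang++ under $prefix/llvm/bin" >&2
        return 1
    fi
}

# Usage (not run here): pass the result to CMake instead of relying on g++.
# cxx="$(rocm_clangxx)" && cmake -B build -DCMAKE_CXX_COMPILER="$cxx" -DUSE_ROCM=ON
```

Failing early when the compiler is missing avoids a configure step that succeeds with g++ and only breaks later at compile time.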
@denghuilu I ran into a problem configuring with clang++: CMake does not consider my MPI library usable. I'm afraid I need to rebuild OpenMPI with clang too, which really takes some time.
Check this link, I think it can be helpful: https://mcresearch.gitee.io/abacus-user-guide/abacus-dcu.html
Thanks! This really helps. Although I still use a manually built openmp, since I'm trying to write a spack-build doc (see #3291), this document contains a lot of useful information, e.g. both single- and double-precision fftw are needed to work with rocm; also note that lcao is not recommended with rocm.
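A quick way to check the fftw requirement before configuring might look like the sketch below. The function name `has_both_fftw` is hypothetical; it relies only on the standard FFTW naming, where `libfftw3` is the double-precision library and `libfftw3f` the single-precision one, and it assumes both live in one directory.

```shell
# Hypothetical helper: succeed only if a directory holds both FFTW precisions.
# libfftw3 = double precision, libfftw3f = single precision.
has_both_fftw() {
    libdir="$1"
    missing=""
    for name in libfftw3 libfftw3f; do
        found=0
        # Accept shared (.so, .so.N) or static (.a) builds; unmatched globs fail -e.
        for f in "$libdir/$name".so "$libdir/$name".so.* "$libdir/$name".a; do
            if [ -e "$f" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing in $libdir:$missing" >&2
        return 1
    fi
}

# Usage (not run here): has_both_fftw /usr/lib/x86_64-linux-gnu
```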
@denghuilu I made a successful build under the dtk 23.10 environment with spack! It was hard, but it shows that the cmake script works under the dtk environment. The spack install script has been updated and pushed to the online repository. Some difficulties were:
- The dtk compiler uses a non-compressing ld, but dependent libs (like fftw) are built by a compiler with a compressing ld, which causes a link error. This is not solved in the script; we can only expect users not to build dependencies with a non-system ld.
- The dtk cmake script makes some improper assumptions, causing a cmake error. Solved in the spack script.
- Selecting the correct compiler. Generally the cpu part of a program need not be built with the gpu compiler, and a user with no knowledge of ABACUS may specify some other compiler on the command line; ABACUS offers no way to separate the two, so the rocm compiler must be specified explicitly. Specified in the spack script.
- Not tested. I have no card time on the Sugon platform; are you in a position to test this?
@caic99 @pxlxingliang Do we have some ways to help developers to access the Sugon test environment?
@denghuilu I'm afraid not.
@yizeyi18 can we close this issue now, or keep it open?
@WHUweiqingzhou I have not been following this for a long time. Tonight I can test installing ABACUS under the rocm environment and then decide, as the spack package is still not merged into their repository.
Describe the bug
When compiling ABACUS with rocm-5.7.0 from https://repo.radeon.com/rocm/apt/5.7/, CMake says:
This macro is also not available in rocm-6.0.0.
Expected behavior
There should be no error.
To Reproduce
1. Clone the source code from https://github.com/deepmodeling/abacus-develop
2. Configure CMake with options
-DUSE_ROCM:BOOL=ON -DCOMMIT_INFO:BOOL=OFF -DCMAKE_PREFIX_PATH="${ROCM_PATH}"
then the error above occurs.
Environment
env1: pc using rocm-5.7.0
env2: Sugon platform (Computing Service Platform, East China Region 1 [Kunshan]) with dtk-23.10
Additional Context
Installation under dtk-23.10 is being tested; running
grep -r "hip_add_library" /public/software/compiler/rocm/dtk-23.10
shows the macro is not defined in dtk-23.10. Is this macro defined by ABACUS, or is it specific to some version of the DCU toolkit?
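One caveat about the grep check: CMake command names are case-insensitive, so a case-sensitive search for `hip_add_library` can miss a definition written in upper case. Below is a minimal sketch of a case-insensitive search restricted to CMake files. The function name `find_cmake_macro` is hypothetical, and the `--include` option assumes GNU (or a compatible) grep.

```shell
# Hypothetical helper: list CMake files under a prefix that define a macro or function.
# Matches "macro(NAME ...)" and "function(NAME ...)" regardless of letter case.
find_cmake_macro() {
    name="$1"
    prefix="$2"
    # Start of line, optional indent, macro|function, "(", then NAME as a whole word.
    pattern='^[[:space:]]*(macro|function)[[:space:]]*\([[:space:]]*'"$name"'([[:space:])]|$)'
    grep -rliE --include='*.cmake' --include='CMakeLists.txt' "$pattern" "$prefix"
}

# Usage (not run here), with the prefix from this issue:
# find_cmake_macro hip_add_library /public/software/compiler/rocm/dtk-23.10
```

The function exits non-zero when nothing matches, so it can be used directly in an `if` test.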