deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

Abacus Installation from source with Intel #1372

Closed lucasem1 closed 1 year ago

lucasem1 commented 1 year ago


Dear Developers, I am trying to install ABACUS 3.0.0 using the Intel MPI compilers. I tested both the 2018 and 2020 versions with GCC 10.3 support. The installation fails at the linking step. What am I doing wrong? Below is part of the output printed at the end of the compilation.


[ 99%] Linking CXX executable abacus
/LIBS/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a(libelpa_public_la-elpa_api.o): In function `elpa_api_mp_elpa_c_string_':
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x2d): undefined reference to `c_f_pointer_set_scalar'
/LIBS/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a(libelpa_public_la-elpa_api.o): In function `elpa_api_mp_elpa_int_value_to_string_':
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0xae): undefined reference to `for_concat'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x106): undefined reference to `for_concat'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x145): undefined reference to `iso_c_binding_mp_c_associated_ptr_'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x1bf): undefined reference to `for_write_seq_fmt'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x1de): undefined reference to `for_write_seq_fmt_xmit'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x28b): undefined reference to `for_concat'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x2ad): undefined reference to `for_write_seq_fmt_xmit'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x2c9): undefined reference to `c_f_pointer_set_scalar'
/LIBS/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a(libelpa_public_la-elpa_api.o): In function `elpa_api_mp_elpa_int_string_to_value_':
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x353): undefined reference to `for_concat'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x397): undefined reference to `for_concat'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x4b6): undefined reference to `for_concat'
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x4f4): undefined reference to `for_write_seq_fmt'
/LIBS/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a(libelpa_public_la-elpa_api.o): In function `elpa_api_mp_elpa_option_cardinality_':
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x560): undefined reference to `for_concat'
/LIBS/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a(libelpa_public_la-elpa_api.o): In function `elpa_api_mp_elpa_option_enumerate_':
manually_preprocessed_.._src_elpa_api.F90-src_.libs_libelpa_public_la-elpa_api.o.F90:(.text+0x5d5): undefined reference to `for_concat'
....... .... .....
/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_float4_add'
/kosmos/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_float8_max'
/kosmos/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_fixed4_rd'
/kosmos/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined reference to `__kmpc_reduce'
collect2: error: ld returned 1 exit status
make[2]: *** [abacus] Error 1
make[1]: *** [CMakeFiles/abacus.dir/all] Error 2
make: *** [all] Error 2
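A long linker log like this is easier to read once it is reduced to the distinct missing symbols. The following is a minimal sketch, not something posted in the thread: the function name summarize_undefined and the file name link.log are hypothetical, and it assumes the raw make output was saved to a file in which the linker wraps each symbol in quote characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct undefined symbol found in a saved
# linker log, preceded by the number of times it is reported.
summarize_undefined() {
    # $1: path to a file holding the raw linker output
    grep -o "undefined reference to [\`'][^']*'" "$1" \
        | sed "s/^undefined reference to [\`']//; s/'\$//" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Example call (link.log is an assumed file name):
#   make 2>&1 | tee link.log
#   summarize_undefined link.log
```

With Intel toolchains, symbols starting with for_ usually belong to the Intel Fortran runtime and those starting with __kmpc_ to the Intel OpenMP runtime, so a summary dominated by these names suggests that the runtime libraries needed by libelpa.a and libmkl_intel_thread.so were missing from the link line.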

caic99 commented 1 year ago

Hi @lucasem1 , How did you compile ELPA? You should use the same tool chain (i.e. OneAPI) for ELPA with ABACUS.

lucasem1 commented 1 year ago

Hi caic99

Here is the head of my config.log for ELPA:

../configure --prefix=/LIBS/INTEL_2020/install/elpa-2021.05.002 FCFLAGS=-O3 -xCORE-AVX512 CFLAGS=-O3 -xCORE-AVX512 --enable-option-checking=fatal SCALAPACK_LDFLAGS=-L/exe_kosmos/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread SCALAPACK_FCFLAGS=-I/exe_kosmos/intel/compilers_and_libraries_2020.2.254/linux/mkl/include/intel64/lp64 --enable-avx2 --enable-avx512 FC=mpiifort CC=mpiicc

Do I need to recompile? If yes, how?

Thanks Luca Sementa

caic99 commented 1 year ago

Hi @lucasem1, would you try CC=mpiicc CXX=mpiicpc FC=mpiifort ../configure FCFLAGS="-mkl=cluster"? As far as I can tell, the ifort compiler has built-in support for MKL through the -mkl flag, so MKL does not need to be passed explicitly as the ScaLAPACK implementation. See the wiki for reference. Installing ELPA is quite tricky; we are planning to make it optional.

lucasem1 commented 1 year ago

Dear caic99

I did what you said without success. Do you have any other suggestions? The issues always appear in the linking.

caic99 commented 1 year ago

Hi @lucasem1 , Would you share your building command for ABACUS?

lucasem1 commented 1 year ago

export CC=mpiicc
export CXX=mpiicpc
export FC=mpiifort
cmake -DCMAKE_INSTALL_PREFIX=/home/.../CODES/INTEL/INTEL_2020/ABACUS/abacus300 \
  -DCEREAL_INCLUDE_DIR=/home/.../CODES/INTEL/INTEL_2020/ABACUS/cereal/include \
  -DELPA_LIBRARY=/home/.../LIB/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a \
  -DELPA_INCLUDE_DIR=/home/.../LIBS/INTEL_2020/install/elpa-2021.05.002/include/elpa-2021.05.002 ..

caic99 commented 1 year ago

@lucasem1 Would you try replacing -DELPA_LIBRARY=/home/.../LIB/INTEL_2020/install/elpa-2021.05.002/lib/libelpa.a with -DELPA_DIR=/home/.../LIB/INTEL_2020/install/elpa-2021.05.002/?

lucasem1 commented 1 year ago

It fails at the configuration stage:

-- The CXX compiler identification is Intel 19.1.2.20200623
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /exe_kosmos/intel/compilers_and_libraries_2020.2.254/linux/mpi/intel64/bin/mpiicpc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Cereal: /home/.../CODES/INTEL/INTEL_2020/ABACUS/cereal/include
CMake Error at /home.../LIBS/GCC_10.3/install/cmake3232/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find ELPA (missing: ELPA_LIBRARY)
Call Stack (most recent call first):
  /home/.../LIBS/GCC_10.3/install/cmake3232/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  modules/FindELPA.cmake:24 (find_package_handle_standard_args)
  CMakeLists.txt:58 (find_package)

caic99 commented 1 year ago

@lucasem1 We have not tested linking against the static library libelpa.a. Would you replace it with libelpa.so in your original script? We are currently working on a conda distribution and will notify you as soon as installation through conda is ready.

lucasem1 commented 1 year ago

Good, dynamic linking works. Thank you for the advice. I hope you will continue keeping ELPA as the preferred library for diagonalization. I can confirm that when using it with Intel compilers and mkl one can get a huge speed-up.

Just another question. Is it possible to link abacus to the ELPA library compiled with the OPENMP support? (libelpa_openmp.so)

caic99 commented 1 year ago

@lucasem1 Feel free to reach out to us if you encounter further problems.

> I can confirm that when using it with Intel compilers and mkl one can get a huge speed-up.

How large is the speed-up? I have been told that the latest version of MKL actually uses the same algorithm as ELPA and therefore runs at a comparable speed, but I have not tried it myself.

> Is it possible to link abacus to the ELPA library compiled with the OPENMP support?

ELPA does not ship CMake support, so we had to write the find module ourselves. Please change the library name given on this line, https://github.com/deepmodeling/abacus-develop/blob/17796957b8676f0dddc6c25a6fd0747c4e16c70e/modules/FindELPA.cmake#L16, to elpa_openmp.

lucasem1 commented 1 year ago

This is what I get from my tests on the P105_si512_lcao system on 64 Xeon procs:

SCALAPACK_GVX
ITER  ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
GV1   -5.484310e+04   0.000000e+00  1.588e-01  8.266e+01
GV2   -5.482309e+04   2.000490e+01  1.054e-01  6.376e+01
GV3   -5.482686e+04  -3.768838e+00  8.453e-03  5.912e+01

GENELPA
ITER  ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
GE1   -5.484310e+04   0.000000e+00  1.588e-01  4.584e+01
GE2   -5.482309e+04   2.000490e+01  1.054e-01  4.553e+01
GE3   -5.482686e+04  -3.768838e+00  8.453e-03  4.569e+01
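The per-iteration speed-up implied by the two sets of timings can be checked with a few lines of shell. This is a minimal sketch: the function name speedup is made up for illustration, and the numbers are the TIME(s) values quoted above.

```shell
#!/usr/bin/env bash
# Ratio of the SCALAPACK_GVX time to the GENELPA time for one iteration.
speedup() {
    # $1: SCALAPACK_GVX time in seconds, $2: GENELPA time in seconds
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

speedup 8.266e+01 4.584e+01   # iteration 1
speedup 6.376e+01 4.553e+01   # iteration 2
speedup 5.912e+01 4.569e+01   # iteration 3
```

This prints 1.80, 1.40 and 1.29, so in this test GENELPA is roughly 1.3 to 1.8 times faster per iteration than SCALAPACK_GVX.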

The speed-up is strongly reduced on 144 procs. Unfortunately, ELPA does not show performance improvement when using OMP_NUM_THREADS=2 (is it possible to fix this?), in contrast to SCALAPACK_GVX, whose performance improves by about 10%.

Did you plan to link the code to the ELSI library too? This way one can exploit other algorithms like PEXSI, SLEPC-Sips, BSEPACK, EIGENEXA and so on.

On 144 procs I get this warning:
Grid_Technique::init_atoms_on_grid warning : No atom on this sub-FFT-mesh
Does this unbalanced loading affect the performance? If yes, how can I fix this?

caic99 commented 1 year ago

@lucasem1 Thank you for sharing!

> Unfortunately, ELPA does not show performance improvement when using OMP_NUM_THREADS=2 (is it possible to fix this?)

Is hyperthreading enabled on your platform? If it is, the extra threads may not improve performance further.

> Did you plan to link the code to the ELSI library too?

Yes, but probably not soon. We are currently evaluating the workload and our schedule.

> Does this unbalanced loading affect the performance?

@wenfei-li Could you take a look at this?

lucasem1 commented 1 year ago

Yes, hyperthreading is on (I got 10% speedup with SCALAPACK_GVX)

> @wenfei-li Could you take a look at this?

The link points to a web page containing many things. Could you please be a bit more precise about where I have to look?

wenfei-li commented 1 year ago

The unbalanced workload definitely affects the performance.

In grid integration, we parallelize the grid points along the z axis, so if your system is not uniformly distributed along the z axis, the workload will be unbalanced.

There is no easy fix for this problem, unless you want to rewrite the parallelization scheme for grid integration.
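The effect of splitting the work along z can be illustrated with a toy model. The sketch below is only an assumption-laden illustration and not ABACUS's actual scheme: the function name atoms_per_slab is hypothetical, the cell is cut into equal slabs along z with one slab per process, and the number of atoms in a slab is used as a crude proxy for its integration work.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: cut the cell into equal slabs along z and
# count the atoms that fall into each slab.
atoms_per_slab() {
    # $1: number of slabs (one per process); $2: cell length along z
    # stdin: one atomic z coordinate per line, with 0 <= z < cell length
    awk -v n="$1" -v lz="$2" '
        { count[int($1 / lz * n)]++ }
        END {
            for (i = 0; i < n; i++)
                printf "slab %d: %d atoms\n", i, count[i] + 0
        }
    '
}

# Made-up example: six atoms occupying only the lower part of a cell
# of length 20 along z, split over 4 slabs.
printf '%s\n' 1.0 2.5 3.0 4.5 6.0 7.5 | atoms_per_slab 4 20
```

In this made-up example two of the four slabs contain no atoms at all, which is the kind of situation the "No atom on this sub-FFT-mesh" warning points to: some processes have little or nothing to do while others carry most of the load.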