deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

MPI Error on running (oneAPI toolkits) #4665

Closed: Cstandardlib closed this issue 3 months ago

Cstandardlib commented 3 months ago

Details

After building the latest develop branch with oneAPI 2024.2, I could not run examples/scf/lcao_Cu. It fails with the following errors:

 << Start SCF iteration.
Abort(537469699) on node 0 (rank 0 in comm 0): Fatal error in internal_Bcast: Unknown error class, error stack:
internal_Bcast(4152): MPI_Bcast(buffer=0x3bb8020, count=9, INVALID DATATYPE, 1, comm=0xc400001d) failed
internal_Bcast(4112): Invalid datatype
Abort(67707651) on node 1 (rank 1 in comm 0): Fatal error in internal_Bcast: Unknown error class, error stack:
internal_Bcast(4152): MPI_Bcast(buffer=0x2523580, count=9, INVALID DATATYPE, 1, comm=0xc4000015) failed
internal_Bcast(4112): Invalid datatype
Abort(470360835) on node 2 (rank 2 in comm 0): Fatal error in internal_Bcast: Unknown error class, error stack:
internal_Bcast(4152): MPI_Bcast(buffer=0x26923b0, count=1, INVALID DATATYPE, 1, comm=0xc4000014) failed
internal_Bcast(4112): Invalid datatype

I noticed that the Installation Guide says:

We recommend Intel® oneAPI toolkit (former Intel® Parallel Studio) as toolchain. The Intel® oneAPI Base Toolkit contains Intel® oneAPI Math Kernel Library (aka MKL), including BLAS, LAPACK, ScaLAPACK and FFTW3. The Intel® oneAPI HPC Toolkit contains Intel® MPI Library, and C++ compiler(including MPI compiler). Please note that building elpa with a different MPI library may cause conflict. Don’t forget to set environment variables before you start! cmake will use Intel MKL if the environment variable MKLROOT is set.

and I got a warning while building:

/usr/bin/ld: warning: libmpi.so.40, needed by /usr/lib/x86_64-linux-gnu/libelpa.so, may conflict with libmpi.so.12
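
The warning names two different MPI runtime libraries in one link: the packaged libelpa.so depends on libmpi.so.40, while the rest of the build uses libmpi.so.12. As a minimal sketch, such warnings can be picked out of a build log and split into their parts. The helper name `parse_mpi_conflict` is hypothetical, and the regular expression assumes the GNU ld wording exactly as quoted above.

```python
import re

# Assumption: the warning follows the GNU ld wording quoted in this issue.
# Other linkers or ld versions may phrase it differently.
_WARNING = re.compile(
    r"warning: (?P<needed>\S+), needed by (?P<consumer>\S+), "
    r"may conflict with (?P<other>\S+)"
)


def parse_mpi_conflict(line):
    """Return (needed, consumer, other) from an ld conflict warning, or None."""
    match = _WARNING.search(line)
    if match is None:
        return None
    return match.group("needed"), match.group("consumer"), match.group("other")


if __name__ == "__main__":
    line = (
        "/usr/bin/ld: warning: libmpi.so.40, needed by "
        "/usr/lib/x86_64-linux-gnu/libelpa.so, may conflict with libmpi.so.12"
    )
    print(parse_mpi_conflict(line))
```

Feeding each line of the build output through this helper gives a quick list of libraries that were built against a different MPI than the one being linked.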

I'm currently building with libelpa-dev on Ubuntu 22.04.4 LTS. Does it mean that I have to also build elpa from source using the same version of oneAPI to run abacus properly with oneAPI compilation?

caic99 commented 3 months ago

Does it mean that I have to also build elpa from source using the same version of oneAPI to run abacus properly with oneAPI compilation?

Yes. Please see https://abacus.deepmodeling.com/en/latest/quick_start/easy_install.html#install-requirements
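
For context, libmpi.so.40 is the soname usually shipped by Open MPI 4.x, and libmpi.so.12 is the one shipped by Intel MPI and MPICH. The two are not binary compatible, which is consistent with the "Invalid datatype" failure in MPI_Bcast. One way to confirm the mix before running is to look at the MPI libraries the final executable resolves to. The sketch below is an illustration only: the function names `mpi_sonames` and `check_binary` are hypothetical, and it assumes a Linux system where `ldd` is available.

```python
import re
import subprocess
import sys


def mpi_sonames(ldd_output):
    """Collect the distinct libmpi sonames that appear in `ldd` output."""
    return sorted(set(re.findall(r"\blibmpi\.so\.\d+", ldd_output)))


def check_binary(path):
    """Run `ldd` on a binary and return the MPI sonames in its dependency tree."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return mpi_sonames(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    found = check_binary(sys.argv[1])
    print("MPI libraries:", ", ".join(found) or "none")
    if len(found) > 1:
        # More than one soname means two MPI implementations share one process.
        print("Mixed MPI runtimes detected; rebuild dependencies with one MPI.")
```

Run it with the path to the built executable as the only argument. If more than one soname is reported, a dependency such as ELPA was built against a different MPI and needs to be rebuilt with the same toolchain.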