deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

`3.7.5 with icpx` is much slower than `3.6.5 with icpc` for both PBE and EXX calculations #5103

Open xdzhu opened 3 days ago

xdzhu commented 3 days ago

Details

Recently, I performed an SOC + EXX calculation. The INPUT and output files are in hse-3.6vs3.7-lowerspeed.zip

With version 3.6.5 the speed is fine: every PBE step takes 13 s and every EXX step takes 178 s, although the PBE steps between EXX steps are slower. image

With 3.7.5 the speed is much lower: every PBE step takes 43 s and every EXX step takes 270 s, roughly 3.3 times and 1.5 times the 3.6.5 timings above. image
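The slowdown factors implied by these timings can be reproduced with a small shell helper. This is only an illustrative sketch: the `slowdown` function name is hypothetical, and the numbers are the per-step timings quoted in this comment.

```shell
# Hypothetical helper: ratio of the newer build's timing to the older
# build's timing, printed with two decimals.
slowdown() {
    # $1 = time with the newer build (s), $2 = time with the older build (s)
    awk -v t_new="$1" -v t_old="$2" 'BEGIN { printf "%.2f\n", t_new / t_old }'
}

slowdown 43 13     # PBE step, 3.7.5 vs 3.6.5
slowdown 270 178   # EXX step, 3.7.5 vs 3.6.5
```

The two calls print 3.31 and 1.52, so the PBE steps are affected much more strongly than the EXX steps.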


xdzhu commented 3 days ago

Even in the nspin=1 case, 3.7.5 is much slower than 3.6.5.

As you can see, 3.7.5 gives: image

while 3.6.5 gives: image

xdzhu commented 3 days ago

When I set ks_solver to scalapack_gvx instead of genelpa, the slowdown remains:

3.7.5 image

3.6.5 image
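When comparing runs like these, it can help to confirm which solver each run directory actually used. The sketch below reads a keyword from an ABACUS-style INPUT file (lines of the form `keyword value`). The `get_input_param` helper and the sample file contents are hypothetical and only for illustration.

```shell
# Hypothetical helper: print the value of a keyword from an INPUT file
# whose lines look like "keyword value", optionally followed by a comment.
get_input_param() {
    # $1 = path to the INPUT file, $2 = keyword to look up
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Demo on a made-up sample file (not the INPUT from the attached zip).
sample=$(mktemp)
printf 'INPUT_PARAMETERS\nbasis_type lcao\nks_solver scalapack_gvx\n' > "$sample"
get_input_param "$sample" ks_solver
rm -f "$sample"
```

In a real run directory the call would be `get_input_param INPUT ks_solver`.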

QuantumMisaka commented 3 days ago

@xdzhu What're your ABACUS installation dependencies?

xdzhu commented 3 days ago

I compared the time cost of the two versions. The slowdown seems to arise from the ESolver_KS_LCAO - runner and HSolverLCAO - solve modules.

3.7.5与3.6.5时间对比测试.xlsx (timing comparison of 3.7.5 vs 3.6.5)

image

xdzhu commented 3 days ago

> @xdzhu What're your ABACUS installation dependencies?

Both were built with Intel oneAPI 2023.1.0 and GCC 13.1.0.

3.6.5 with LibRI_0.1.0_loop3
3.7.5 with LibRI_0.2.0

xdzhu commented 3 days ago

I noticed that for the 3.7.x version I used the icpx and mpicxx compilers instead of icpc and mpiicpc, which I had used to compile 3.6.5.

When I change CXX and MPI_CXX to icpc and mpiicpc and recompile 3.7.5, it runs faster than the icpx build, and the performance is nearly the same as 3.6.5.

3.7.5 with icpc image

3.7.5 with icpx image

3.6.5 with icpc image
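For anyone wanting to repeat this comparison, the compiler pair can be selected at configure time. The sketch below only prints the configure commands rather than running them. It assumes the build honours the standard CMake variables `CMAKE_CXX_COMPILER` and `MPI_CXX_COMPILER`; the `configure_cmd` helper and the build directory names are hypothetical.

```shell
# Hypothetical helper: print (do not run) a CMake configure command for a
# given compiler pair. Assumption: the project picks up the standard CMake
# variables CMAKE_CXX_COMPILER and MPI_CXX_COMPILER.
configure_cmd() {
    cxx="$1"      # C++ compiler, e.g. icpc or icpx
    mpi_cxx="$2"  # MPI C++ wrapper, e.g. mpiicpc or mpicxx
    echo "cmake -B build-$cxx -DCMAKE_CXX_COMPILER=$cxx -DMPI_CXX_COMPILER=$mpi_cxx"
}

configure_cmd icpc mpiicpc
configure_cmd icpx mpicxx
```

Using one build directory per compiler keeps both binaries available for side-by-side timing on the same input.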

QuantumMisaka commented 2 days ago

@xdzhu What is your hardware setup?

xdzhu commented 2 days ago

@QuantumMisaka The compute node has Intel(R) Xeon(R) Gold 6248 CPUs @ 2.50GHz (2*20C), 40 cores in total, and I run ABACUS with the following command: `mpirun -np 10 -genv OMP_THREADS_NUM=4 abacus`
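The layout described above (10 MPI ranks with 4 threads each on 40 cores) can be sanity-checked with a small shell sketch; the `check_layout` helper is hypothetical. Note that the OpenMP specification names the thread-count variable `OMP_NUM_THREADS`, whereas the command quoted above spells it `OMP_THREADS_NUM`, so it may be worth confirming which value the runtime actually sees.

```shell
# Hypothetical helper: check that MPI ranks x OpenMP threads does not
# exceed the number of physical cores on the node.
check_layout() {
    ranks="$1"; threads="$2"; cores="$3"
    used=$((ranks * threads))
    if [ "$used" -le "$cores" ]; then
        echo "ok: $used of $cores cores"
    else
        echo "oversubscribed: $used threads on $cores cores"
    fi
}

check_layout 10 4 40   # the layout described in the comment above

# OMP_NUM_THREADS is the variable name defined by the OpenMP specification;
# print what the current shell would pass on to an OpenMP runtime.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
```

If the last line prints `unset`, the OpenMP runtime falls back to its own default thread count rather than the intended 4.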