Open iduygnay opened 3 months ago
@iduygnay Can you try exx_separate_loop=1?
@PeizeLin Any comments?
With exx_separate_loop=1 the calculation runs out of memory (OOM), and it is also slower than QE.
@iduygnay I notice that you're using ABACUS 3.7.0. Can you update to the newest ABACUS with LibRI 0.2.0?
@iduygnay LibRI-v0.2.0 is many times more efficient than LibRI-v0.1.1, and ABACUS-v3.7.4 is the first version to officially support LibRI-v0.2.0, so updating both codes is strongly recommended.
LibRI-0.2.0 is indeed more efficient than 0.1.1, but it is still slower than QE (for the same calculation, QE completed in 7 h).
@iduygnay What are the dependencies of your ABACUS and QE builds? And what parallel settings (number of MPI processes and OpenMP threads) do you use for the HSE calculations in ABACUS and QE?
Also, what is your ABACUS INPUT? (I have no idea about QE, orz.)
ABACUS Runscript
ABACUS INPUT (I also tried exx_separate_loop=1, but it did not converge and took almost the same runtime as the case without that line)
QE input
QE Runscript
@iduygnay
@PeizeLin Any other suggestions?
Multi-threading has a significant impact on memory and speed for EXX:
export OMP_NUM_THREADS=64
mpirun -np 1 abacus
Slightly reduce the accuracy:
exx_dm_threshold 1E-3
exx_ccp_rmesh_times 1
Use double instead of complex in EXX:
exx_real_number 1
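Taken together, the suggestions above could be applied with a small wrapper like the one below. This is only a sketch under stated assumptions: the helper `set_input_param` is hypothetical (it is not part of ABACUS), the INPUT file is assumed to sit in the current directory as plain `key value` lines, and the launch step is skipped unless both `mpirun` and `abacus` are on the PATH. The parameter values are the ones quoted in this thread and may need adjusting for other systems.

```shell
#!/usr/bin/env bash
# Sketch: apply the EXX tuning suggestions from this thread to an ABACUS INPUT file.

# Hypothetical helper (not part of ABACUS): set "key value" in an INPUT-style
# file, replacing an existing line for that key or appending a new one.
set_input_param() {
    local file="$1" key="$2" value="$3"
    if grep -Eq "^[[:space:]]*${key}[[:space:]]" "$file"; then
        # Key already present: rewrite that line, keep all others unchanged.
        awk -v k="$key" -v v="$value" \
            '$1 == k { print k, v; next } { print }' "$file" > "$file.tmp"
        mv "$file.tmp" "$file"
    else
        # Key absent: append it at the end of the file.
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Assumption: the INPUT file is in the current directory unless overridden.
INPUT_FILE="${INPUT_FILE:-INPUT}"

# One MPI process with many OpenMP threads, as suggested above.
export OMP_NUM_THREADS=64

if [ -f "$INPUT_FILE" ]; then
    # Slightly reduced accuracy.
    set_input_param "$INPUT_FILE" exx_dm_threshold 1E-3
    set_input_param "$INPUT_FILE" exx_ccp_rmesh_times 1
    # Double instead of complex in EXX.
    set_input_param "$INPUT_FILE" exx_real_number 1

    # Launch only when the binaries are actually available.
    if command -v mpirun >/dev/null 2>&1 && command -v abacus >/dev/null 2>&1; then
        mpirun -np 1 abacus
    fi
fi
```

Replacing an existing line rather than blindly appending avoids ending up with two conflicting values for the same key in INPUT.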
I've tried export OMP_NUM_THREADS=64 with mpirun -np 1 abacus. The speed improves, and the runtime is now about 24 hours, compared with 6.5 hours for QE. I will try the other parameters soon.
I tried exx_ccp_rmesh_times 1 and exx_real_number 1. Now it takes 20 hours.
Details
With the same structure and the same accuracy settings in the input, this is the time that QE spent:
This is ABACUS with exx_separate_loop = 0; if I comment out this line, the calculation is interrupted because of OOM:
The ABACUS version is 3.7.0. This is the runscript: