Closed QuantumMisaka closed 2 weeks ago
Added: PBE0 with ABACUS 3.7.4 (commit 93badfa87, Wed Aug 28 10:18:22 2024 +0800) shows the same problem.
On my machine it is killed at `cal_datas` in `cal_Cs_dCs` (the same happens with `cal_force 1` and `cal_stress 0`). It may be either an OOM or some bug in `cal_Cs_dCs`.
@PeizeLin this may need your help. For your information:
==> Exx_LRI::cal_exx_ions 51 GB 196 s
==> LRI_CV::cal_Vs 51 GB 196 s
==> LRI_CV::cal_datas 51 GB 196 s
==> LRI_CV::cal_dVs 48 GB 208 s
==> LRI_CV::cal_datas 48 GB 208 s
==> LRI_CV::cal_Cs_dCs 12.3 GB 340 s
==> LRI_CV::cal_datas 12.3 GB 340 s
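The trace above can be scanned mechanically to see where the reported figure changes most. The following is a minimal sketch, assuming each line has the shape `==> <stage> <value> GB <seconds> s`; what the GB column measures is not stated in the thread, so the code only reports where it drops, not what the drop means. The names `PATTERN`, `parse_trace` and `largest_drop` are illustrative and not part of ABACUS.

```python
import re

# Trace lines copied verbatim from the comment above.
TRACE = """\
==> Exx_LRI::cal_exx_ions 51 GB 196 s
==> LRI_CV::cal_Vs 51 GB 196 s
==> LRI_CV::cal_datas 51 GB 196 s
==> LRI_CV::cal_dVs 48 GB 208 s
==> LRI_CV::cal_datas 48 GB 208 s
==> LRI_CV::cal_Cs_dCs 12.3 GB 340 s
==> LRI_CV::cal_datas 12.3 GB 340 s
"""

# Assumed line shape: "==> <stage> <value> GB <seconds> s".
PATTERN = re.compile(r"^==>\s+(\S+)\s+([\d.]+)\s+GB\s+([\d.]+)\s+s$")

def parse_trace(text):
    """Return a list of (stage, gigabytes, seconds) tuples for matching lines."""
    rows = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            rows.append((match.group(1), float(match.group(2)), float(match.group(3))))
    return rows

def largest_drop(rows):
    """Return (stage, drop) for the biggest decrease of the GB column between consecutive lines."""
    best_stage, best_drop = None, 0.0
    for (_, prev_gb, _), (stage, gb, _) in zip(rows, rows[1:]):
        drop = prev_gb - gb
        if drop > best_drop:
            best_stage, best_drop = stage, drop
    return best_stage, round(best_drop, 1)

rows = parse_trace(TRACE)
stage, drop = largest_drop(rows)
print("last stage printed:", rows[-1][0])
print("largest change at:", stage, "by", drop, "GB")
```

On this trace the largest change in the GB column coincides with the first `LRI_CV::cal_Cs_dCs` line, which is consistent with the suspicion about that routine voiced above.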
If `cal_force` and `cal_stress` are set to 0, it runs normally.
@maki49 @PeizeLin On my machine, the same error occurs even with `cal_force 0` and `cal_stress 0`.
FYI, from `running_scf.log`:
SETUP SEARCHING RADIUS FOR PROGRAM TO SEARCH ADJACENT ATOMS
longest orb rcut (Bohr) = 8
longest nonlocal projector rcut (Bohr) = 2.16
searching radius is (Bohr)) = 20.3
searching radius unit is (Bohr)) = 1.89
SETUP EXTENDED REAL SPACE GRID FOR GRID INTEGRATION
real space grid = [ 80, 80, 72 ]
big cell numbers in grid = [ 16, 16, 24 ]
meshcell numbers in big cell = [ 5, 5, 3 ]
extended fft grid = [ 11, 11, 18 ]
dimension of extened grid = [ 39, 39, 61 ]
UnitCellTotal = 27
Atom number in sub-FFT-grid = 24
Local orbitals number in sub-FFT-grid = 536
ParaV.nnr = 1343892
nnrg = 2838200
Warning_Memory_Consuming allocated: Gint::hRGint 21.9 MB
Warning_Memory_Consuming allocated: Gint::DMRGint 43.8 MB
Warning_Memory_Consuming allocated: pvpR_reduced 43.3 MB
I've done some tests and found that this bug emerged somewhere between commit a33935612 (Thu Jun 27 16:40:42 2024 +0800) and commit 58126a8e6 (Mon Jul 15 21:45:33 2024 +0800).
I used LibRI-loop3 (on Gitee) and LibComm 0.1.1 with these versions.
@maki49 I've done some more tests and confirmed that commit 740bf8e4ecd9f847751bcb000f89eb6367075d31 has no problem but commit 8a1f0125ae8714ab763efcb28a7f2f436e03e722 does. Could you take a look?
Dependencies:
- LibRI: loop3 on Gitee (early version)
- LibComm: 0.1.1
- ELPA: 2024.03.001
- Intel-OneAPI: 2023.0.0
- Hardware: Intel 8358, 64 cores, 1024 GB memory
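The narrowing described above, from a range of commits down to one good and one bad commit, is a bisection over the history. The sketch below shows the idea in isolation. Only the four commit hashes quoted in this thread are real; the intermediate entries, their ordering, and the `is_bad` stub are hypothetical placeholders standing in for "build this commit and run the failing case". In practice `git bisect` automates the same search.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by bisection.

    commits: ordered oldest to newest; commits[0] is known good,
             commits[-1] is known bad.
    is_bad:  callable that tests one commit and returns True if it fails.
    """
    lo, hi = 0, len(commits) - 1      # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # failure already present at mid
        else:
            lo = mid                  # failure introduced after mid
    return commits[hi]

# Hypothetical history: the "placeholder-*" entries are invented for illustration.
history = [
    "a33935612",      # reported good
    "placeholder-1",
    "740bf8e",        # reported good
    "8a1f012",        # reported bad
    "placeholder-2",
    "58126a8e6",      # reported bad
]

tested = []

def is_bad(commit):
    """Stub for 'build and run the test case'; here it just encodes the reports above."""
    tested.append(commit)
    return history.index(commit) >= history.index("8a1f012")

culprit = first_bad(history, is_bad)
print("first bad commit:", culprit, "after testing", len(tested), "commits")
```

Each test halves the remaining range, so the number of builds grows with the logarithm of the number of commits between the known good and known bad endpoints.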
It is an out-of-memory error.
In 8a1f012, about 0.2~0.3 GB more memory is allocated for H(R) in the constructor of `OperatorEXX`, which leads to your problem.
| commit | exx_real_number | free memory before `cal_Cs_dCs` (GB) | free memory after `cal_Cs_dCs` (GB) |
| -- | -- | -- | -- |
| 740bf8e | 0 | 36 | boom |
| 740bf8e | 1 | 42.2 | 17.3 |
| 8a1f012 | 0 | 35.8 | boom |
| 8a1f012 | 1 | 41.9 | 17.1 |
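The figures in the table can be checked against the "0.2~0.3 GB" statement with a little arithmetic. The sketch below only subtracts the reported numbers; the dictionary layout and the helper names `consumed` and `extra_before` are illustrative, and runs marked "boom" are treated as having no "after" reading.

```python
# Free-memory readings (GB) copied from the table above.
# Key: (commit, exx_real_number); value: (free before, free after cal_Cs_dCs).
# None marks a run reported as "boom", where no "after" value exists.
readings = {
    ("740bf8e", 0): (36.0, None),
    ("740bf8e", 1): (42.2, 17.3),
    ("8a1f012", 0): (35.8, None),
    ("8a1f012", 1): (41.9, 17.1),
}

def consumed(commit, exx_real_number):
    """Memory taken during cal_Cs_dCs in GB, or None if the run did not survive."""
    before, after = readings[(commit, exx_real_number)]
    return None if after is None else round(before - after, 1)

def extra_before(exx_real_number):
    """How much less memory is free before cal_Cs_dCs in 8a1f012 than in 740bf8e."""
    old_free = readings[("740bf8e", exx_real_number)][0]
    new_free = readings[("8a1f012", exx_real_number)][0]
    return round(old_free - new_free, 1)

for key in sorted(readings):
    print(key, "consumed by cal_Cs_dCs (GB):", consumed(*key))
for flag in (0, 1):
    print("exx_real_number", flag, "-> extra GB used by 8a1f012:", extra_before(flag))
```

The two differences between commits come out at 0.2 GB and 0.3 GB, matching the extra H(R) allocation mentioned above, while the memory taken during `cal_Cs_dCs` itself is almost identical for both commits when `exx_real_number` is 1.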
Describe the bug
When running LibRI HSE with `exx_separate_loop 1` on some FeCx systems, an error occurs and the calculation cannot keep running.
Attachments: Fe2C-HSE-RI02.tar.gz
Expected behavior
HSE / PBE0 calculations run normally.
To Reproduce
See the attachments.
Environment
Additional Context
@PeizeLin @maki49 Any advice?
Task list for Issue attackers (only for developers)