Closed PeizeLin closed 1 day ago
Could you check old versions of ABACUS, such as 3.4, 3.5, 3.6, 3.7?
@PeizeLin PR #5508 fixed a serious memory leak in FFT, could you try again?
BTW, I tried to reproduce your issue, but failed. Could you debug your calculation with GDB?
ABACUS v3.8.3
Atomic-orbital Based Ab-initio Computation at UStc
Website: http://abacus.ustc.edu.cn/
Documentation: https://abacus.deepmodeling.com/
Repository: https://github.com/abacusmodeling/abacus-develop
https://github.com/deepmodeling/abacus-develop
Commit: 264915205 (Wed Nov 20 08:53:18 2024 +0800)
Wed Nov 20 13:40:33 2024
MAKE THE DIR : OUT.ABACUS/
RUNNING WITH DEVICE : CPU / Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
UNIFORM GRID DIM : 375 * 375 * 2160
UNIFORM GRID DIM(BIG) : 75 * 75 * 540
DONE(3.84129 SEC) : SETUP UNITCELL
DONE(5.239 SEC) : INIT K-POINTS
---------------------------------------------------------
Self-consistent calculations for electrons
---------------------------------------------------------
SPIN KPOINTS PROCESSORS THREADS NBASE
1 1 32 32 640
---------------------------------------------------------
Use Systematically Improvable Atomic bases
---------------------------------------------------------
ELEMENT ORBITALS NBASE NATOM XC
He 2s1p-6au 5 128
---------------------------------------------------------
Initial plane wave basis and FFT box
---------------------------------------------------------
DONE(7.58819 SEC) : INIT PLANEWAVE
-------------------------------------------
SELF-CONSISTENT :
-------------------------------------------
START CHARGE : atomic
DONE(136.831 SEC) : INIT SCF
ITER ETOT/eV EDIFF/eV DRHO TIME/s
GE1 -9.61443788e+03 0.00000000e+00 4.3042e-02 19.98
GE2 -9.61463134e+03 -1.93460842e-01 6.3655e-03 20.05
GE3 -9.61464143e+03 -1.00895202e-02 1.9757e-03 20.15
GE4 -9.61464588e+03 -4.45173080e-03 1.0249e-03 20.21
GE5 -9.61464973e+03 -3.85078496e-03 6.7370e-04 20.28
GE6 -9.61465098e+03 -1.25104934e-03 4.6435e-04 20.36
GE7 -9.61465159e+03 -6.07939182e-04 3.4548e-04 20.44
GE8 -9.61465184e+03 -2.47790231e-04 2.3762e-04 20.49
GE9 -9.61465198e+03 -1.42980558e-04 1.4803e-04 20.46
GE10 -9.61465203e+03 -4.75686398e-05 8.3014e-05 20.43
GE11 -9.61465207e+03 -3.84768551e-05 5.5287e-05 20.42
GE12 -9.61465207e+03 -5.74759178e-06 2.7248e-05 20.42
GE13 -9.61465207e+03 -7.90944256e-07 1.5309e-05 20.41
Due to my mistake, the memory in FFT was not properly released. I have corrected this part of the code in the new PR. Can it run now? Let me check. I don't have this issue on my machine. Could you please check if you can reproduce it?
I have found the bug behind this issue. It is in the file mixing_data.cpp,
at this line: https://github.com/deepmodeling/abacus-develop/blob/develop/source/module_base/module_mixing/mixing_data.cpp#L28
if (ndim * length > 0)
{
this->data = malloc(ndim * length * type_size);
}
In the case from this issue, the variables have the values
ndim = 8
length = 303750000
so the product "ndim * length" is 2.43e9, which exceeds the int limit of about 2.147e9 and overflows when evaluated as int.
The solution is to use size_t rather than int in this code.
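The overflow described above can be sketched in isolation. The block below is a minimal illustration, not the actual change made in ABACUS: the helper names `product_fits_in_int`, `element_count`, and `allocate_buffer` are hypothetical, and it assumes a 64-bit `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical helper (not ABACUS code): true if ndim * length fits in int.
// The product is formed in 64-bit so the check itself cannot overflow.
bool product_fits_in_int(int ndim, int length)
{
    const std::int64_t wide = static_cast<std::int64_t>(ndim) * static_cast<std::int64_t>(length);
    return wide >= std::numeric_limits<int>::min() && wide <= std::numeric_limits<int>::max();
}

// Hypothetical helper: element count computed in size_t instead of int.
// Non-positive inputs give 0, mirroring the "> 0" guard in the quoted snippet.
std::size_t element_count(int ndim, int length)
{
    if (ndim <= 0 || length <= 0)
    {
        return 0;
    }
    return static_cast<std::size_t>(ndim) * static_cast<std::size_t>(length);
}

// Sketch of the allocation with the byte count computed in size_t.
// Assumes a 64-bit size_t; returns nullptr when there is nothing to allocate
// or when the byte count would not fit in size_t.
void* allocate_buffer(int ndim, int length, std::size_t type_size)
{
    assert(type_size > 0);
    const std::size_t count = element_count(ndim, length);
    if (count == 0 || count > std::numeric_limits<std::size_t>::max() / type_size)
    {
        return nullptr;
    }
    return std::malloc(count * type_size);
}
```

With the values from this issue, `product_fits_in_int(8, 303750000)` is false, while `element_count(8, 303750000)` gives 2430000000 on a platform with a 64-bit `size_t`. Casting each operand before the multiplication matters here: casting only the result would still perform the multiplication in `int`.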
Fixed by #5545
Describe the bug
SCF calculation of 128 He atoms with PBE and nspin=1. The run is interrupted in Charge_Mixing::mix_rho at the first step.
The job uses 1 MPI * 56 OpenMP. Remaining memory is 194 GB.
ABACUS version: 2024.11.13-dcff74dbeb
He_PBE_128_k1.zip