Closed: QG-phy closed this issue 1 year ago.
OK, we need to determine whether the memory increase is normal or abnormal.
The GNU build is fine on the same type of machine.
ABACUS v3.2
Atomic-orbital Based Ab-initio Computation at UStc
Website: http://abacus.ustc.edu.cn/
Documentation: https://abacus.deepmodeling.com/
Repository: https://github.com/abacusmodeling/abacus-develop
https://github.com/deepmodeling/abacus-develop
Tue Oct 31 12:41:43 2023 MAKE THE DIR : OUT.ABACUS/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Warning: the number of valence electrons in pseudopotential > 3 for In: [Kr] 4d10 5s2 5p1
Warning: the number of valence electrons in pseudopotential > 5 for Sb: [Kr] 4d10 5s2 5p3
Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
START CHARGE : atomic
DONE(30.473 SEC) : INIT SCF
ITER ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 -1.639180e+05 0.000000e+00 3.989e-02 1.705e+02
GE2 -1.639183e+05 -3.836051e-01 1.641e-02 1.660e+02
GE3 -1.638702e+05 4.810953e+01 2.314e-02 1.662e+02
GE4 -1.638471e+05 2.315318e+01 4.421e-02 1.674e+02
GE5 -1.639208e+05 -7.370479e+01 1.282e-02 1.667e+02
GE6 -1.639277e+05 -6.927459e+00 8.503e-03 1.662e+02
GE7 -1.639287e+05 -9.728276e-01 4.888e-03 1.671e+02
GE8 -1.639284e+05 2.916989e-01 3.985e-03 1.669e+02
GE9 -1.639284e+05 2.072873e-02 3.086e-03 1.671e+02
GE10 -1.639285e+05 -9.234770e-02 2.066e-03 1.677e+02
GE11 -1.639284e+05 4.144906e-02 1.235e-03 1.674e+02
GE12 -1.639284e+05 -2.950202e-02 7.407e-04 1.667e+02
GE13 -1.639284e+05 8.972636e-03 8.118e-04 1.666e+02
GE14 -1.639285e+05 -1.773380e-02 4.452e-04 1.667e+02
GE15 -1.639284e+05 7.928516e-03 7.306e-04 1.672e+02
GE16 -1.639285e+05 -1.785576e-02 3.037e-04 1.669e+02
GE17 -1.639285e+05 -2.880893e-03 1.031e-04 1.665e+02
GE18 -1.639285e+05 -3.737139e-04 5.887e-05 1.667e+02
GE19 -1.639285e+05 -1.059564e-04 2.553e-05 1.664e+02
GE20 -1.639285e+05 -2.129738e-05 1.545e-05 1.665e+02
GE21 -1.639285e+05 -7.530288e-06 1.370e-05 1.665e+02
GE22 -1.639285e+05 -5.803601e-06 3.086e-06 1.667e+02
GE23 -1.639285e+05 5.566955e-07 5.313e-06 1.663e+02
GE24 -1.639285e+05 -1.096116e-06 1.022e-06 1.664e+02
GE25 -1.639285e+05 -3.274242e-08 1.269e-06 1.665e+02
GE26 -1.639285e+05 4.521573e-08 8.713e-07 1.668e+02
GE27 -1.639285e+05 -7.758693e-08 6.493e-07 1.664e+02
GE28 -1.639285e+05 -1.116163e-08 2.068e-07 1.665e+02
GE29 -1.639285e+05 2.029387e-09 2.055e-07 1.669e+02
GE30 -1.639285e+05 5.989166e-09 8.294e-08 2.711e+02
START Time : Tue Oct 31 12:41:43 2023
FINISH Time : Tue Oct 31 14:07:28 2023
TOTAL Time : 5145
SEE INFORMATION IN : OUT.ABACUS/
This case is OK with the command OMP_NUM_THREADS=16 mpirun -np 1 abacus:
GE17 -1.639285e+05 -2.880899e-03 1.031e-04 6.460e+01
GE18 -1.639285e+05 -3.737170e-04 5.887e-05 6.495e+01
GE19 -1.639285e+05 -1.059542e-04 2.553e-05 6.468e+01
GE20 -1.639285e+05 -2.129832e-05 1.545e-05 6.459e+01
GE21 -1.639285e+05 -7.527837e-06 1.370e-05 6.459e+01
GE22 -1.639285e+05 -5.803205e-06 3.086e-06 6.465e+01
GE23 -1.639285e+05 5.520180e-07 5.313e-06 6.484e+01
GE24 -1.639285e+05 -1.096834e-06 1.022e-06 6.489e+01
GE25 -1.639285e+05 -3.348488e-08 1.269e-06 6.491e+01
GE26 -1.639285e+05 4.462176e-08 8.713e-07 6.488e+01
GE27 -1.639285e+05 -7.974006e-08 6.493e-07 6.483e+01
GE28 -1.639285e+05 7.993804e-09 2.068e-07 6.481e+01
GE29 -1.639285e+05 -1.017168e-08 2.055e-07 6.525e+01
GE30 -1.639285e+05 1.056766e-08 8.294e-08 1.408e+02
START Time : Tue Oct 31 15:44:31 2023
FINISH Time : Tue Oct 31 16:18:22 2023
Hi @hongriTianqi, No memory leak at the end means that all allocated memory is eventually freed. However, this does not mean that the memory required for a single SCF step is freed after that step. If that happens, all memory is freed by the end of the run, rather than at the end of each SCF step. I would suggest monitoring whether the memory grows as the SCF steps accumulate. Before that, please make sure this signal was sent by the OOM killer.
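(For reference, a minimal sketch of how such per-step monitoring could be instrumented on Linux by reading VmRSS from /proc/self/status; this is an illustration only, not code from ABACUS, and the tag strings and call sites are hypothetical.)

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Print the current resident set size (VmRSS) with a tag, e.g. once per SCF step.
// Linux-specific: parses /proc/self/status.
void log_rss(const std::string& tag)
{
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
    {
        if (line.rfind("VmRSS:", 0) == 0)   // line looks like "VmRSS:  123456 kB"
        {
            std::istringstream iss(line.substr(6));
            long kib = 0;
            iss >> kib;
            std::cout << "==> " << tag << " " << kib / 1048576.0 << " GB" << std::endl;
            return;
        }
    }
}

// Hypothetical usage inside the SCF loop:
//   log_rss("after DiagoElpa::diag");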
@caic99 Thx, you are right.
The memory exceeded 124 GB at the 16th SCF step when I tested on an Intel machine with the command OMP_NUM_THREADS=1 mpirun -np 16 abacus. 124 GB is the maximum memory on that machine, as indicated by the htop command.
This is not an HSE example; it is a PBE example.
@hongriTianqi Hi! I tested the example above and used Grafana to plot the amount of memory used over time. The memory used during the SCF iterations keeps increasing.
The debug info is as follows:
==> DiagoElpa::diag 17.488 GB 24.3468 s
==> HamiltLCAO::updateHk 17.2276 GB 26.3595 s
==> OperatorLCAO::init 17.2276 GB 26.3595 s
==> OperatorLCAO::init 17.2276 GB 26.3616 s
==> OverlapNew::contributeHk 17.2276 GB 26.3616 s
==> OperatorLCAO::init 17.2276 GB 26.3656 s
==> OperatorLCAO::contributeHk 17.2276 GB 26.3656 s
==> HSolverLCAO::hamiltSolvePsiK 17.2276 GB 26.3704 s
==> DiagoElpa::diag 17.2276 GB 26.3704 s
==> HamiltLCAO::updateHk 17.0351 GB 28.3936 s
==> OperatorLCAO::init 17.0351 GB 28.3937 s
==> OperatorLCAO::init 17.0351 GB 28.3958 s
==> OverlapNew::contributeHk 17.0351 GB 28.3958 s
==> OperatorLCAO::init 17.0351 GB 28.3999 s
==> OperatorLCAO::contributeHk 17.0351 GB 28.3999 s
==> HSolverLCAO::hamiltSolvePsiK 17.0351 GB 28.4047 s
==> DiagoElpa::diag 17.0351 GB 28.4047 s
==> HamiltLCAO::updateHk 16.8386 GB 30.3966 s
==> OperatorLCAO::init 16.8386 GB 30.3966 s
==> OperatorLCAO::init 16.8386 GB 30.3988 s
==> OverlapNew::contributeHk 16.8386 GB 30.3988 s
==> OperatorLCAO::init 16.8386 GB 30.4027 s
==> OperatorLCAO::contributeHk 16.8386 GB 30.4027 s
==> HSolverLCAO::hamiltSolvePsiK 16.8386 GB 30.4076 s
==> DiagoElpa::diag 16.8386 GB 30.4076 s
==> HamiltLCAO::updateHk 16.6527 GB 32.3951 s
==> OperatorLCAO::init 16.6527 GB 32.3952 s
==> OperatorLCAO::init 16.6527 GB 32.3973 s
==> OverlapNew::contributeHk 16.6527 GB 32.3973 s
==> OperatorLCAO::init 16.6527 GB 32.4012 s
==> OperatorLCAO::contributeHk 16.6527 GB 32.4012 s
==> HSolverLCAO::hamiltSolvePsiK 16.6527 GB 32.4061 s
==> DiagoElpa::diag 16.6527 GB 32.4061 s
==> HamiltLCAO::updateHk 16.4619 GB 34.3882 s
==> OperatorLCAO::init 16.4619 GB 34.3883 s
==> OperatorLCAO::init 16.4619 GB 34.3902 s
==> OverlapNew::contributeHk 16.4619 GB 34.3902 s
==> OperatorLCAO::init 16.4619 GB 34.3941 s
==> OperatorLCAO::contributeHk 16.4619 GB 34.3941 s
==> HSolverLCAO::hamiltSolvePsiK 16.4616 GB 34.3989 s
==> DiagoElpa::diag 16.4616 GB 34.3989 s
==> HamiltLCAO::updateHk 16.2734 GB 36.3922 s
==> OperatorLCAO::init 16.2734 GB 36.3923 s
==> OperatorLCAO::init 16.2734 GB 36.3945 s
==> OverlapNew::contributeHk 16.2734 GB 36.3945 s
==> OperatorLCAO::init 16.2734 GB 36.3986 s
==> OperatorLCAO::contributeHk 16.2734 GB 36.3987 s
It is clear that memory is consumed at every call of DiagoElpa::diag. So it is now clear that this problem is caused by the ELPA interface; the ScaLAPACK solver does not show the problem.
To update: the function es.generalized_eigenvector inside DiagoElpa::diag is consuming memory.
Looking inside:
decomposeRightMatrix: 0.06 GB (line 85)
Cpzgemm: 0.1 GB (line 117)
eigenvector: 0.05 GB (line 188)
Inside decomposeRightMatrix, elpa_cholesky is consuming memory. (Some indices consuming memory?)
elpa_invert_triangular is consuming memory.
elpa_eigenvector is consuming memory.
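For context, here is a rough sketch of how the handle lifecycle looks in the documented ELPA C interface (an illustrative sketch only, not the genelpa wrapper code; exact signatures differ slightly between ELPA releases, and the helper name and parameters below are hypothetical). The relevant point is that every elpa_allocate must be paired with an elpa_deallocate; if any per-call allocation inside the calls named above is not released, usage grows with each SCF step, exactly as observed.

#include <elpa/elpa.h>   // public ELPA C interface (assumed usable from C++)
#include <cstdio>
#include <cstdlib>

// Sketch of the intended ELPA handle lifecycle: create and set up the handle once,
// reuse it for every SCF step, and release it once at the end. If instead a handle
// (or work buffers tied to it) were re-created per step without the matching
// elpa_deallocate, resident memory would grow step by step.
// MPI/BLACS distribution parameters are omitted for brevity.
elpa_t create_elpa_handle(int na, int nev, int na_rows, int na_cols, int nblk)
{
    int error = 0;
    if (elpa_init(20171201) != ELPA_OK)          // request a supported API version
    {
        std::fprintf(stderr, "ELPA API version not supported\n");
        std::exit(1);
    }
    elpa_t handle = elpa_allocate(&error);
    elpa_set_integer(handle, "na", na, &error);               // global matrix size
    elpa_set_integer(handle, "nev", nev, &error);              // number of eigenpairs
    elpa_set_integer(handle, "local_nrows", na_rows, &error);  // local block dimensions
    elpa_set_integer(handle, "local_ncols", na_cols, &error);
    elpa_set_integer(handle, "nblk", nblk, &error);            // block-cyclic block size
    error = elpa_setup(handle);
    return handle;
}

// The per-step calls named in this thread then reuse the same handle (pseudocode):
//   for each SCF step:
//       elpa_cholesky(...)             // inside decomposeRightMatrix
//       elpa_invert_triangular(...)
//       elpa_eigenvector(...)          // via DiagoElpa::diag
// and only after the last step:
//   elpa_deallocate(handle, &error);   // must pair with elpa_allocate
//   elpa_uninit(&error);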
We tried the newest ELPA version, 2023.05.001; the same problem exists:
ITER ETOT(eV) EDIFF(eV) DRHO TIME(s)
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::82 17.1027 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::90 17.1027 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::233 17.1027 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::286 17.1027 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::294 17.1027 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::297 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::303 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::306 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::339 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::346 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::361 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::369 17.0794 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::372 17.077 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::96 17.077 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::102 17.077 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::113 17.077 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::123 17.077 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::130 16.9772 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::139 16.9772 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::142 16.9759 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::193 16.9759 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::63 16.9759 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::66 16.625 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::197 16.625 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::208 16.625 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::217 16.625 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::82 16.6027 GB
==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::90 16.6027 GB
I analyzed the memory usage over a short period of time with Intel VTune.
The result shows that the allocations come from elpa_eigenvectors and Cpzgemm in ELPA_Solver::generalized_eigenvector (please note that multiple allocations happen without deallocation).
For the ELPA part, I would suggest trying out the supported kernels and algorithms and (hopefully) finding a working one.
As for MKL, I recommend double-checking that the interface provided in my_math.hpp and its calling pattern are correct.
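(For illustration, switching the solver or kernel goes through the ELPA settings interface; this is a hedged sketch assuming the documented setting names and constants from the public ELPA headers, whose availability depends on how ELPA was built.)

#include <elpa/elpa.h>

// Sketch: request a different ELPA algorithm/kernel on an already set-up handle.
// Setting names and constants are from the public ELPA headers (build-dependent).
void try_alternative_solver(elpa_t handle)
{
    int error = 0;
    // Fall back from the default 2-stage solver to the 1-stage solver.
    elpa_set_integer(handle, "solver", ELPA_SOLVER_1STAGE, &error);
    // Or, staying with the 2-stage solver, request a specific complex kernel, e.g.:
    // elpa_set_integer(handle, "complex_kernel", ELPA_2STAGE_COMPLEX_GENERIC, &error);
}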
Edit:
To reproduce, run ABACUS with MPI first, use htop to find the PID of one of the ranks, and collect data with:
vtune -collect memory-consumption -target-pid 7890
After a few minutes of data collection, it is OK to manually interrupt the vtune process and stop the ABACUS program. Use vtune-backend to visualize the result.
@QG-phy The calculation succeeds with the latest Intel image (abacus-intel:latest); please see the log file below:
ABACUS v3.4.3
Atomic-orbital Based Ab-initio Computation at UStc
Website: http://abacus.ustc.edu.cn/
Documentation: https://abacus.deepmodeling.com/
Repository: https://github.com/abacusmodeling/abacus-develop
https://github.com/deepmodeling/abacus-develop
Commit: baccbe3 (Fri Nov 17 22:22:05 2023 +0800)
Tue Nov 21 17:20:30 2023
MAKE THE DIR : OUT.ABACUS/
RUNNING WITH DEVICE : CPU / Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Warning: the number of valence electrons in pseudopotential > 3 for In: [Kr] 4d10 5s2 5p1
Warning: the number of valence electrons in pseudopotential > 5 for Sb: [Kr] 4d10 5s2 5p3
Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
UNIFORM GRID DIM : 180 * 200 * 180
UNIFORM GRID DIM(BIG) : 45 * 40 * 45
DONE(1.09052 SEC) : SETUP UNITCELL
DONE(1.17741 SEC) : SYMMETRY
DONE(1.27431 SEC) : INIT K-POINTS
---------------------------------------------------------
Self-consistent calculations for electrons
---------------------------------------------------------
SPIN KPOINTS PROCESSORS NBASE
1 36 16 2400
---------------------------------------------------------
Use Systematically Improvable Atomic bases
---------------------------------------------------------
ELEMENT ORBITALS NBASE NATOM XC
In 2s2p2d1f-7au 25 48
Sb 2s2p2d1f-7au 25 48
---------------------------------------------------------
Initial plane wave basis and FFT box
---------------------------------------------------------
DONE(1.38615 SEC) : INIT PLANEWAVE
-------------------------------------------
SELF-CONSISTENT :
-------------------------------------------
START CHARGE : atomic
DONE(9.4308 SEC) : INIT SCF
ITER ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 -1.638907e+05 0.000000e+00 3.854e-02 5.030e+01
GE2 -1.638911e+05 -3.479698e-01 1.518e-02 4.783e+01
GE3 -1.638408e+05 5.024635e+01 2.521e-02 4.777e+01
GE4 -1.638514e+05 -1.054895e+01 3.420e-02 4.782e+01
GE5 -1.638946e+05 -4.322259e+01 1.177e-02 4.780e+01
GE6 -1.639009e+05 -6.278126e+00 7.226e-03 4.817e+01
GE7 -1.639018e+05 -8.756690e-01 3.926e-03 4.862e+01
GE8 -1.639015e+05 2.604772e-01 3.232e-03 4.866e+01
GE9 -1.639015e+05 1.168321e-02 2.787e-03 4.859e+01
GE10 -1.639017e+05 -1.624482e-01 1.714e-03 4.862e+01
GE11 -1.639017e+05 -6.268882e-03 9.545e-04 4.864e+01
GE12 -1.639017e+05 1.979424e-03 6.277e-04 4.863e+01
GE13 -1.639017e+05 3.486882e-03 7.897e-04 4.865e+01
GE14 -1.639017e+05 -1.810689e-02 3.698e-04 4.851e+01
GE15 -1.639017e+05 6.875260e-03 6.230e-04 4.865e+01
GE16 -1.639017e+05 -1.330003e-02 2.934e-04 4.864e+01
GE17 -1.639017e+05 -2.972647e-03 9.218e-05 4.875e+01
GE18 -1.639017e+05 -2.896182e-04 5.271e-05 4.864e+01
GE19 -1.639017e+05 -8.463964e-05 1.884e-05 4.864e+01
GE20 -1.639017e+05 -1.345102e-05 1.312e-05 4.861e+01
GE21 -1.639017e+05 -6.146171e-06 9.098e-06 4.862e+01
GE22 -1.639017e+05 -2.287936e-06 2.651e-06 4.862e+01
GE23 -1.639017e+05 -1.524020e-07 2.780e-06 4.870e+01
GE24 -1.639017e+05 -2.510500e-07 1.213e-06 4.871e+01
GE25 -1.639017e+05 -2.378342e-08 9.860e-07 4.862e+01
GE26 -1.639017e+05 -1.908119e-08 9.168e-07 4.864e+01
GE27 -1.639017e+05 -3.506880e-08 3.263e-07 4.868e+01
GE28 -1.639017e+05 -3.093578e-09 1.505e-07 4.853e+01
GE29 -1.639017e+05 -1.979890e-10 1.772e-07 4.861e+01
GE30 -1.639017e+05 1.163185e-09 7.458e-08 5.606e+01
TIME STATISTICS
------------------------------------------------------------------------------------
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
------------------------------------------------------------------------------------
total 1476.62 11 134.24 100.00
Driver reading 0.95 1 0.95 0.06
Input Init 0.94 1 0.94 0.06
Input_Conv Convert 0.00 1 0.00 0.00
Driver driver_line 1475.67 1 1475.67 99.94
UnitCell check_tau 0.00 1 0.00 0.00
PW_Basis_Sup setuptransform 0.03 1 0.03 0.00
PW_Basis_Sup distributeg 0.03 1 0.03 0.00
mymath heapsort 0.03 207 0.00 0.00
Symmetry analy_sys 0.00 1 0.00 0.00
PW_Basis_K setuptransform 0.06 1 0.06 0.00
PW_Basis_K distributeg 0.03 1 0.03 0.00
PW_Basis setup_struc_factor 0.53 1 0.53 0.04
ORB_control read_orb_first 0.06 1 0.06 0.00
LCAO_Orbitals Read_Orbitals 0.06 1 0.06 0.00
NOrbital_Lm extra_uniform 0.01 14 0.00 0.00
Mathzone_Add1 SplineD2 0.00 14 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.00 14 0.00 0.00
Sphbes Spherical_Bessel 0.07 12060 0.00 0.00
ppcell_vl init_vloc 2.32 1 2.32 0.16
Ions opt_ions 1471.51 1 1471.51 99.65
ESolver_KS_LCAO Run 1471.51 1 1471.51 99.65
ESolver_KS_LCAO beforescf 4.78 1 4.78 0.32
ESolver_KS_LCAO beforesolver 0.61 1 0.61 0.04
ESolver_KS_LCAO set_matrix_grid 0.10 1 0.10 0.01
atom_arrange search 0.01 1 0.01 0.00
Grid_Technique init 0.07 1 0.07 0.00
Grid_BigCell grid_expansion_index 0.01 2 0.00 0.00
Record_adj for_2d 0.02 1 0.02 0.00
Grid_Driver Find_atom 0.01 2760 0.00 0.00
LCAO_Hamilt grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.00 1 0.00 0.00
OverlapNew initialize_SR 0.00 1 0.00 0.00
EkineticNew initialize_HR 0.00 1 0.00 0.00
NonlocalNew initialize_HR 0.00 1 0.00 0.00
Charge set_rho_core 0.00 1 0.00 0.00
Charge atomic_rho 2.50 1 2.50 0.17
PW_Basis_Sup recip2real 4.34 188 0.02 0.29
PW_Basis_Sup gathers_scatterp 2.09 188 0.01 0.14
Potential init_pot 0.45 1 0.45 0.03
Potential update_from_charge 12.84 31 0.41 0.87
Potential cal_fixed_v 0.03 1 0.03 0.00
PotLocal cal_fixed_v 0.03 1 0.03 0.00
Potential cal_v_eff 12.78 31 0.41 0.87
H_Hartree_pw v_hartree 1.56 31 0.05 0.11
PW_Basis_Sup real2recip 4.12 217 0.02 0.28
PW_Basis_Sup gatherp_scatters 1.71 217 0.01 0.12
PotXC cal_v_eff 11.13 31 0.36 0.75
XC_Functional v_xc 11.08 31 0.36 0.75
Potential interpolate_vrs 0.03 31 0.00 0.00
Symmetry rhog_symmetry 31.49 31 1.02 2.13
Symmetry group fft grids 18.19 31 0.59 1.23
H_Ewald_pw compute_ewald 0.02 1 0.02 0.00
HSolverLCAO solve 1408.77 30 46.96 95.40
HamiltLCAO updateHk 43.34 1080 0.04 2.94
OperatorLCAO init 40.24 3240 0.01 2.73
Veff contributeHR 35.21 30 1.17 2.38
Gint_interface cal_gint 59.21 60 0.99 4.01
Gint_interface cal_gint_vlocal 30.30 30 1.01 2.05
Gint_Tools cal_psir_ylm 18.28 324000 0.00 1.24
Gint_k transfer_pvpR 4.91 30 0.16 0.33
OverlapNew calculate_SR 0.50 1 0.50 0.03
OverlapNew contributeHk 2.60 1080 0.00 0.18
EkineticNew contributeHR 0.50 30 0.02 0.03
EkineticNew calculate_HR 0.50 1 0.50 0.03
NonlocalNew contributeHR 0.64 30 0.02 0.04
NonlocalNew calculate_HR 0.58 1 0.58 0.04
OperatorLCAO contributeHk 2.74 1080 0.00 0.19
HSolverLCAO hamiltSolvePsiK 1204.84 1080 1.12 81.59
DiagoElpa elpa_solve 1202.76 1080 1.11 81.45
ElecStateLCAO psiToRho 160.58 30 5.35 10.88
elecstate cal_dm 92.64 30 3.09 6.27
psiMulPsiMpi pdgemm 91.50 1080 0.08 6.20
DensityMatrix cal_DMR 2.94 30 0.10 0.20
Local_Orbital_wfc wfc_2d_to_grid 38.34 1116 0.03 2.60
Gint transfer_DMR 0.76 30 0.03 0.05
Gint_interface cal_gint_rho 28.90 30 0.96 1.96
Charge_Mixing get_drho 0.03 30 0.00 0.00
Charge mix_rho 0.70 29 0.02 0.05
Charge Pulay_mixing 0.21 29 0.01 0.01
ModuleIO write_wfc_nao_complex 6.04 36 0.17 0.41
ModuleIO write_istate_info 0.06 1 0.06 0.00
ModuleIO nscf_band 0.20 1 0.20 0.01
------------------------------------------------------------------------------------
----------------------------------------------------------
START Time : Tue Nov 21 17:20:30 2023
FINISH Time : Tue Nov 21 17:45:06 2023
TOTAL Time : 1476
SEE INFORMATION IN : OUT.ABACUS/
Describe the bug
When running ABACUS jobs, I found that some of my jobs were killed by SIGNAL 9 after many SCF steps.
The error message goes like this:
Therefore, I am wondering whether there might be a memory leak issue.
Expected behavior
Figure it out and solve it, maybe.
To Reproduce
InSb.tar.gz
Environment
env and image:
registry.dp.tech/deepmodeling/abacus-intel:latest
machine:
"bohrium": { "scass_type": "c32_m128_cpu", "job_type": "container", "platform": "ali" },
command:
#!/bin/bash
source /opt/intel/oneapi/setvars.sh
export OMP_NUM_THREADS=1
cp ./scf/* ./
mpirun -np 16 abacus
Additional Context
No response
Task list for Issue attackers