deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

possibility of memory leak : job is killed by SIGNAL 9 after several scf steps. #2935

Closed QG-phy closed 1 year ago

QG-phy commented 1 year ago

Describe the bug

When running ABACUS jobs, I found that some of them were killed by signal 9 (SIGKILL) after many SCF steps.

The error message looks like this: [image]

I therefore suspect a memory leak.

Expected behavior

Figure out the cause and fix it, if possible.

To Reproduce

InSb.tar.gz

Environment

env and image:

registry.dp.tech/deepmodeling/abacus-intel:latest

machine:

"bohrium": { "scass_type": "c32_m128_cpu", "job_type": "container", "platform": "ali" },

command:

#!/bin/bash

source /opt/intel/oneapi/setvars.sh
export OMP_NUM_THREADS=1
cp ./scf/* ./
mpirun -np 16 abacus

Additional Context

No response

Task list for Issue attackers

hongriTianqi commented 1 year ago

OK, we need to determine whether the memory increase is normal or abnormal.

hongriTianqi commented 1 year ago

The GNU build runs fine on the same type of machine.

ABACUS v3.2

           Atomic-orbital Based Ab-initio Computation at UStc                    

                 Website: http://abacus.ustc.edu.cn/                             
           Documentation: https://abacus.deepmodeling.com/                       
              Repository: https://github.com/abacusmodeling/abacus-develop       
                          https://github.com/deepmodeling/abacus-develop         

Tue Oct 31 12:41:43 2023
MAKE THE DIR : OUT.ABACUS/

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Warning: the number of valence electrons in pseudopotential > 3 for In: [Kr] 4d10 5s2 5p1
Warning: the number of valence electrons in pseudopotential > 5 for Sb: [Kr] 4d10 5s2 5p3
Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

UNIFORM GRID DIM : 180 200 180
UNIFORM GRID DIM(BIG): 45 40 45
DONE(1.39257 SEC) : SETUP UNITCELL
DONE(1.51377 SEC) : SYMMETRY
DONE(1.67186 SEC) : INIT K-POINTS

Self-consistent calculations for electrons

SPIN KPOINTS PROCESSORS NBASE
1 36 16 2400

Use Systematically Improvable Atomic bases

ELEMENT ORBITALS NBASE NATOM XC
In 2s2p2d1f-7au 25 48
Sb 2s2p2d1f-7au 25 48

Initial plane wave basis and FFT box


SELF-CONSISTENT :

START CHARGE : atomic
DONE(30.473 SEC) : INIT SCF
ITER ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 -1.639180e+05 0.000000e+00 3.989e-02 1.705e+02
GE2 -1.639183e+05 -3.836051e-01 1.641e-02 1.660e+02
GE3 -1.638702e+05 4.810953e+01 2.314e-02 1.662e+02
GE4 -1.638471e+05 2.315318e+01 4.421e-02 1.674e+02
GE5 -1.639208e+05 -7.370479e+01 1.282e-02 1.667e+02
GE6 -1.639277e+05 -6.927459e+00 8.503e-03 1.662e+02
GE7 -1.639287e+05 -9.728276e-01 4.888e-03 1.671e+02
GE8 -1.639284e+05 2.916989e-01 3.985e-03 1.669e+02
GE9 -1.639284e+05 2.072873e-02 3.086e-03 1.671e+02
GE10 -1.639285e+05 -9.234770e-02 2.066e-03 1.677e+02
GE11 -1.639284e+05 4.144906e-02 1.235e-03 1.674e+02
GE12 -1.639284e+05 -2.950202e-02 7.407e-04 1.667e+02
GE13 -1.639284e+05 8.972636e-03 8.118e-04 1.666e+02
GE14 -1.639285e+05 -1.773380e-02 4.452e-04 1.667e+02
GE15 -1.639284e+05 7.928516e-03 7.306e-04 1.672e+02
GE16 -1.639285e+05 -1.785576e-02 3.037e-04 1.669e+02
GE17 -1.639285e+05 -2.880893e-03 1.031e-04 1.665e+02
GE18 -1.639285e+05 -3.737139e-04 5.887e-05 1.667e+02
GE19 -1.639285e+05 -1.059564e-04 2.553e-05 1.664e+02
GE20 -1.639285e+05 -2.129738e-05 1.545e-05 1.665e+02
GE21 -1.639285e+05 -7.530288e-06 1.370e-05 1.665e+02
GE22 -1.639285e+05 -5.803601e-06 3.086e-06 1.667e+02
GE23 -1.639285e+05 5.566955e-07 5.313e-06 1.663e+02
GE24 -1.639285e+05 -1.096116e-06 1.022e-06 1.664e+02
GE25 -1.639285e+05 -3.274242e-08 1.269e-06 1.665e+02
GE26 -1.639285e+05 4.521573e-08 8.713e-07 1.668e+02
GE27 -1.639285e+05 -7.758693e-08 6.493e-07 1.664e+02
GE28 -1.639285e+05 -1.116163e-08 2.068e-07 1.665e+02
GE29 -1.639285e+05 2.029387e-09 2.055e-07 1.669e+02
GE30 -1.639285e+05 5.989166e-09 8.294e-08 2.711e+02

START Time : Tue Oct 31 12:41:43 2023
FINISH Time : Tue Oct 31 14:07:28 2023
TOTAL Time : 5145
SEE INFORMATION IN : OUT.ABACUS/

hongriTianqi commented 1 year ago

This case runs fine with the command OMP_NUM_THREADS=16 mpirun -np 1 abacus:

GE17 -1.639285e+05 -2.880899e-03 1.031e-04 6.460e+01
GE18 -1.639285e+05 -3.737170e-04 5.887e-05 6.495e+01
GE19 -1.639285e+05 -1.059542e-04 2.553e-05 6.468e+01
GE20 -1.639285e+05 -2.129832e-05 1.545e-05 6.459e+01
GE21 -1.639285e+05 -7.527837e-06 1.370e-05 6.459e+01
GE22 -1.639285e+05 -5.803205e-06 3.086e-06 6.465e+01
GE23 -1.639285e+05 5.520180e-07 5.313e-06 6.484e+01
GE24 -1.639285e+05 -1.096834e-06 1.022e-06 6.489e+01
GE25 -1.639285e+05 -3.348488e-08 1.269e-06 6.491e+01
GE26 -1.639285e+05 4.462176e-08 8.713e-07 6.488e+01
GE27 -1.639285e+05 -7.974006e-08 6.493e-07 6.483e+01
GE28 -1.639285e+05 7.993804e-09 2.068e-07 6.481e+01
GE29 -1.639285e+05 -1.017168e-08 2.055e-07 6.525e+01
GE30 -1.639285e+05 1.056766e-08 8.294e-08 1.408e+02

START Time : Tue Oct 31 15:44:31 2023
FINISH Time : Tue Oct 31 16:18:22 2023

caic99 commented 1 year ago

Hi @hongriTianqi, no memory leak reported at exit only means that everything allocated was eventually freed. It does not mean that the memory required for a single SCF step is freed after that step. If it is not, the memory is released only at the end of the run rather than at the end of each SCF step. I would suggest monitoring whether memory grows as SCF steps accumulate. Before that, please make sure the signal was sent by the OOM killer.
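One way to do the monitoring suggested above is to sample the resident set size of a single MPI rank while the SCF loop runs. The following is a minimal Python sketch, assuming a Linux machine where /proc/<pid>/status exposes a VmRSS line; the helper names (parse_vmrss_kb, sample_rss, grows_monotonically) are hypothetical and not part of ABACUS.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size (kB) of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kb(status_file.read())


def sample_rss(pid, interval_s=10.0, n_samples=6):
    """Collect (timestamp, rss_kb) pairs at a fixed interval."""
    samples = []
    for _ in range(n_samples):
        samples.append((time.time(), read_rss_kb(pid)))
        time.sleep(interval_s)
    return samples


def grows_monotonically(values, min_increase=0):
    """True if every sample exceeds the previous one by more than min_increase."""
    return all(b - a > min_increase for a, b in zip(values, values[1:]))
```

Calling sample_rss with the PID of one abacus rank and an interval close to the duration of one SCF step gives one sample per step. A series that keeps rising step after step points to memory that is not released between steps, while a series that plateaus after the first few steps points to normal working-set growth.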

hongriTianqi commented 1 year ago

@caic99 Thx, you are right.

hongriTianqi commented 1 year ago

In my test on an Intel machine with the command OMP_NUM_THREADS=1 mpirun -np 16 abacus, memory usage exceeded 124 G at the 16th SCF step. 124 G is the maximum memory on that machine, as reported by htop.
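Memory reaching the machine limit is consistent with the kernel OOM killer sending signal 9, but it can be confirmed from the kernel log. The sketch below scans kernel log text (for example, saved output of dmesg) for OOM kill records. It assumes the record contains the phrase "Killed process <pid> (<name>)", which is the usual wording on Linux but may differ between kernel versions; find_oom_kills is a hypothetical helper name.

```python
import re

# Typical kernel wording: "... Killed process 1234 (name) total-vm:..."
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text, process_name=None):
    """Return (pid, name) for each OOM kill record, optionally filtered by name."""
    kills = []
    for line in kernel_log_text.splitlines():
        match = OOM_KILL_RE.search(line)
        if match and (process_name is None or match.group(2) == process_name):
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

If the function returns an entry for abacus with a PID that belongs to the failed job, the SIGKILL came from the OOM killer rather than from a scheduler or a user.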

hongriTianqi commented 1 year ago

This is not an HSE example; it is a PBE example.

caic99 commented 1 year ago

Here is the latest address sanitizer report on memory leak.

LiuXiaohui123321 commented 1 year ago

Hi @hongriTianqi! I tested the example above and used Grafana to plot the amount of memory used over time: [image: memory usage monitoring during the calculation]

The memory used during the SCF iterations keeps increasing. [image: SCF self-consistent iterations]

hongriTianqi commented 1 year ago

The debug info goes as follows:

==> DiagoElpa::diag    17.488 GB       24.3468 s
 ==> HamiltLCAO::updateHk       17.2276 GB      26.3595 s
 ==> OperatorLCAO::init 17.2276 GB      26.3595 s
 ==> OperatorLCAO::init 17.2276 GB      26.3616 s
 ==> OverlapNew::contributeHk   17.2276 GB      26.3616 s
 ==> OperatorLCAO::init 17.2276 GB      26.3656 s
 ==> OperatorLCAO::contributeHk 17.2276 GB      26.3656 s
 ==> HSolverLCAO::hamiltSolvePsiK       17.2276 GB      26.3704 s
 ==> DiagoElpa::diag    17.2276 GB      26.3704 s
 ==> HamiltLCAO::updateHk       17.0351 GB      28.3936 s
 ==> OperatorLCAO::init 17.0351 GB      28.3937 s
 ==> OperatorLCAO::init 17.0351 GB      28.3958 s
 ==> OverlapNew::contributeHk   17.0351 GB      28.3958 s
 ==> OperatorLCAO::init 17.0351 GB      28.3999 s
 ==> OperatorLCAO::contributeHk 17.0351 GB      28.3999 s
 ==> HSolverLCAO::hamiltSolvePsiK       17.0351 GB      28.4047 s
 ==> DiagoElpa::diag    17.0351 GB      28.4047 s
 ==> HamiltLCAO::updateHk       16.8386 GB      30.3966 s
 ==> OperatorLCAO::init 16.8386 GB      30.3966 s
 ==> OperatorLCAO::init 16.8386 GB      30.3988 s
 ==> OverlapNew::contributeHk   16.8386 GB      30.3988 s
 ==> OperatorLCAO::init 16.8386 GB      30.4027 s
 ==> OperatorLCAO::contributeHk 16.8386 GB      30.4027 s
 ==> HSolverLCAO::hamiltSolvePsiK       16.8386 GB      30.4076 s
 ==> DiagoElpa::diag    16.8386 GB      30.4076 s
 ==> HamiltLCAO::updateHk       16.6527 GB      32.3951 s
 ==> OperatorLCAO::init 16.6527 GB      32.3952 s
 ==> OperatorLCAO::init 16.6527 GB      32.3973 s
 ==> OverlapNew::contributeHk   16.6527 GB      32.3973 s
 ==> OperatorLCAO::init 16.6527 GB      32.4012 s
 ==> OperatorLCAO::contributeHk 16.6527 GB      32.4012 s
 ==> HSolverLCAO::hamiltSolvePsiK       16.6527 GB      32.4061 s
 ==> DiagoElpa::diag    16.6527 GB      32.4061 s
 ==> HamiltLCAO::updateHk       16.4619 GB      34.3882 s
 ==> OperatorLCAO::init 16.4619 GB      34.3883 s
 ==> OperatorLCAO::init 16.4619 GB      34.3902 s
 ==> OverlapNew::contributeHk   16.4619 GB      34.3902 s
 ==> OperatorLCAO::init 16.4619 GB      34.3941 s
 ==> OperatorLCAO::contributeHk 16.4619 GB      34.3941 s
 ==> HSolverLCAO::hamiltSolvePsiK       16.4616 GB      34.3989 s
 ==> DiagoElpa::diag    16.4616 GB      34.3989 s
 ==> HamiltLCAO::updateHk       16.2734 GB      36.3922 s
 ==> OperatorLCAO::init 16.2734 GB      36.3923 s
 ==> OperatorLCAO::init 16.2734 GB      36.3945 s
 ==> OverlapNew::contributeHk   16.2734 GB      36.3945 s
 ==> OperatorLCAO::init 16.2734 GB      36.3986 s
 ==> OperatorLCAO::contributeHk 16.2734 GB      36.3987 s
hongriTianqi commented 1 year ago

The log shows memory being consumed in every call to DiagoElpa::diag, so the problem is caused by the ELPA interface. ScaLAPACK does not cause the problem.
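The attribution above can be reproduced mechanically from the debug lines. Here is a minimal Python sketch, assuming each line is printed on entry to the named function and that the GB column is the memory still available (it falls as the run proceeds), so the change between a line and the next one is charged to the earlier label. The helper name memory_drop_by_label is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as: "==> DiagoElpa::diag    17.488 GB       24.3468 s"
DEBUG_LINE_RE = re.compile(r"==>\s+(\S+)\s+([\d.]+)\s+GB\s+([\d.]+)\s+s")


def memory_drop_by_label(log_text):
    """Sum, per label, the drop in the GB column between a line and its successor."""
    records = [(m.group(1), float(m.group(2))) for m in DEBUG_LINE_RE.finditer(log_text)]
    drops = defaultdict(float)
    for (label, mem_gb), (_, next_mem_gb) in zip(records, records[1:]):
        drops[label] += mem_gb - next_mem_gb
    return dict(drops)
```

Applied to the excerpt above, nearly all of the drop lands on DiagoElpa::diag, at roughly 0.2 GB per call, while the OperatorLCAO and OverlapNew entries stay at zero.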

hongriTianqi commented 1 year ago

Update: the function es.generalized_eigenvector inside DiagoElpa::diag is the one consuming memory.

[image: mm]

hongriTianqi commented 1 year ago

[image: t1]

hongriTianqi commented 1 year ago

[image: t2]

hongriTianqi commented 1 year ago

[image: t3]

hongriTianqi commented 1 year ago

[image: t4]

hongriTianqi commented 1 year ago

Looking inside:

decomposeRightMatrix: 0.06 GB (line 85)
Cpzgemm: 0.1 GB (line 117)
eigenvector: 0.05 GB (line 188)
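As a rough consistency check, these per-call figures can be extrapolated over a run. The sketch below assumes that the three amounts are lost on every diagonalization call, that there is one call per k-point per SCF step (36 k-points in the logs in this thread), and that nothing is freed in between. These are assumptions for illustration, not measured facts, and projected_growth_gb is a hypothetical helper name.

```python
# Per-call memory growth reported in this thread, in GB.
PER_CALL_GB = {
    "decomposeRightMatrix": 0.06,
    "Cpzgemm": 0.1,
    "eigenvector": 0.05,
}


def projected_growth_gb(per_call_gb, n_kpoints, n_scf_steps):
    """Total growth if every call leaks the same amount and nothing is released."""
    return sum(per_call_gb.values()) * n_kpoints * n_scf_steps


# 36 k-points, 16 SCF steps (the step at which the machine limit was reached).
estimate = projected_growth_gb(PER_CALL_GB, n_kpoints=36, n_scf_steps=16)
print(f"projected growth after 16 SCF steps: {estimate:.1f} GB")
```

Under these assumptions the growth after 16 SCF steps comes to roughly 121 GB, which is in line with the run hitting the 124 G limit at the 16th step, although it does not by itself prove that these three calls account for all of the growth.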

hongriTianqi commented 1 year ago

Inside decomposeRightMatrix, elpa_cholesky is consuming memory.

[image: t5]

hongriTianqi commented 1 year ago

Are some indices consuming memory?

[image: t6]

hongriTianqi commented 1 year ago

elpa_invert_triangular is consuming memory.

[image: t7]

hongriTianqi commented 1 year ago

elpa_eigenvector is consuming memory.

[image: t8]

hongriTianqi commented 1 year ago

We tried the newest ELPA version, 2023.05.001, and the same problem exists:

ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)    
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::82        17.1027 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::90        17.1027 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::233       17.1027 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::286       17.1027 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::294       17.1027 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::297       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::303       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::306       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::339       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::346       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::361       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::369       17.0794 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::372       17.077 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::96        17.077 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::102       17.077 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::113       17.077 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::123       17.077 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::130       16.9772 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::139       16.9772 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::142       16.9759 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::193       16.9759 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::63        16.9759 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::66        16.625 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::197       16.625 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::208       16.625 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::217       16.625 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::82        16.6027 GB
 ==> /root/abacus-develop/source/module_hsolver/genelpa/elpa_new_complex.cpp::90        16.6027 GB
caic99 commented 1 year ago

I analyzed the memory usage in a short period of time with Intel VTune.

image

The result shows that:

For the ELPA part, I would suggest trying out supported kernels and algorithms, and (hopefully) find a working one. As for MKL, I recommend double-checking the interface provided in my_math.hpp and its calling pattern to be correct.

Edit: To reproduce, first run ABACUS with MPI, use htop to find the PID of any one rank, and collect data with: vtune -collect memory-consumption -target-pid 7890 (substituting that PID for 7890). After a few minutes of data collection, it is fine to interrupt the vtune process manually and stop ABACUS. Use vtune-backend to visualize the result.

hongriTianqi commented 1 year ago

@QG-phy The calculation succeeds with the latest Intel image (abacus-intel:latest). Please see the log file below:


                              ABACUS v3.4.3

               Atomic-orbital Based Ab-initio Computation at UStc                    

                     Website: http://abacus.ustc.edu.cn/                             
               Documentation: https://abacus.deepmodeling.com/                       
                  Repository: https://github.com/abacusmodeling/abacus-develop       
                              https://github.com/deepmodeling/abacus-develop         
                      Commit: baccbe3 (Fri Nov 17 22:22:05 2023 +0800)

 Tue Nov 21 17:20:30 2023
 MAKE THE DIR         : OUT.ABACUS/
 RUNNING WITH DEVICE  : CPU / Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Warning: the number of valence electrons in pseudopotential > 3 for In: [Kr] 4d10 5s2 5p1
 Warning: the number of valence electrons in pseudopotential > 5 for Sb: [Kr] 4d10 5s2 5p3
 Pseudopotentials with additional electrons can yield (more) accurate outcomes, but may be less efficient.
 If you're confident that your chosen pseudopotential is appropriate, you can safely ignore this warning.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 UNIFORM GRID DIM        : 180 * 200 * 180
 UNIFORM GRID DIM(BIG)   : 45 * 40 * 45
 DONE(1.09052    SEC) : SETUP UNITCELL
 DONE(1.17741    SEC) : SYMMETRY
 DONE(1.27431    SEC) : INIT K-POINTS
 ---------------------------------------------------------
 Self-consistent calculations for electrons
 ---------------------------------------------------------
 SPIN    KPOINTS         PROCESSORS  NBASE       
 1       36              16          2400        
 ---------------------------------------------------------
 Use Systematically Improvable Atomic bases
 ---------------------------------------------------------
 ELEMENT ORBITALS        NBASE       NATOM       XC          
 In      2s2p2d1f-7au    25          48          
 Sb      2s2p2d1f-7au    25          48          
 ---------------------------------------------------------
 Initial plane wave basis and FFT box
 ---------------------------------------------------------
 DONE(1.38615    SEC) : INIT PLANEWAVE
 -------------------------------------------
 SELF-CONSISTENT : 
 -------------------------------------------
 START CHARGE      : atomic
 DONE(9.4308     SEC) : INIT SCF
 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)    
 GE1    -1.638907e+05  0.000000e+00   3.854e-02  5.030e+01  
 GE2    -1.638911e+05  -3.479698e-01  1.518e-02  4.783e+01  
 GE3    -1.638408e+05  5.024635e+01   2.521e-02  4.777e+01  
 GE4    -1.638514e+05  -1.054895e+01  3.420e-02  4.782e+01  
 GE5    -1.638946e+05  -4.322259e+01  1.177e-02  4.780e+01  
 GE6    -1.639009e+05  -6.278126e+00  7.226e-03  4.817e+01  
 GE7    -1.639018e+05  -8.756690e-01  3.926e-03  4.862e+01  
 GE8    -1.639015e+05  2.604772e-01   3.232e-03  4.866e+01  
 GE9    -1.639015e+05  1.168321e-02   2.787e-03  4.859e+01  
 GE10   -1.639017e+05  -1.624482e-01  1.714e-03  4.862e+01  
 GE11   -1.639017e+05  -6.268882e-03  9.545e-04  4.864e+01  
 GE12   -1.639017e+05  1.979424e-03   6.277e-04  4.863e+01  
 GE13   -1.639017e+05  3.486882e-03   7.897e-04  4.865e+01  
 GE14   -1.639017e+05  -1.810689e-02  3.698e-04  4.851e+01  
 GE15   -1.639017e+05  6.875260e-03   6.230e-04  4.865e+01  
 GE16   -1.639017e+05  -1.330003e-02  2.934e-04  4.864e+01  
 GE17   -1.639017e+05  -2.972647e-03  9.218e-05  4.875e+01  
 GE18   -1.639017e+05  -2.896182e-04  5.271e-05  4.864e+01  
 GE19   -1.639017e+05  -8.463964e-05  1.884e-05  4.864e+01  
 GE20   -1.639017e+05  -1.345102e-05  1.312e-05  4.861e+01  
 GE21   -1.639017e+05  -6.146171e-06  9.098e-06  4.862e+01  
 GE22   -1.639017e+05  -2.287936e-06  2.651e-06  4.862e+01  
 GE23   -1.639017e+05  -1.524020e-07  2.780e-06  4.870e+01  
 GE24   -1.639017e+05  -2.510500e-07  1.213e-06  4.871e+01  
 GE25   -1.639017e+05  -2.378342e-08  9.860e-07  4.862e+01  
 GE26   -1.639017e+05  -1.908119e-08  9.168e-07  4.864e+01  
 GE27   -1.639017e+05  -3.506880e-08  3.263e-07  4.868e+01  
 GE28   -1.639017e+05  -3.093578e-09  1.505e-07  4.853e+01  
 GE29   -1.639017e+05  -1.979890e-10  1.772e-07  4.861e+01  
 GE30   -1.639017e+05  1.163185e-09   7.458e-08  5.606e+01  
TIME STATISTICS
------------------------------------------------------------------------------------
     CLASS_NAME                 NAME            TIME(Sec)  CALLS   AVG(Sec) PER(%)
------------------------------------------------------------------------------------
                     total                      1476.62         11 134.24   100.00
Driver               reading                      0.95           1   0.95     0.06
Input                Init                         0.94           1   0.94     0.06
Input_Conv           Convert                      0.00           1   0.00     0.00
Driver               driver_line                1475.67          1 1475.67   99.94
UnitCell             check_tau                    0.00           1   0.00     0.00
PW_Basis_Sup         setuptransform               0.03           1   0.03     0.00
PW_Basis_Sup         distributeg                  0.03           1   0.03     0.00
mymath               heapsort                     0.03         207   0.00     0.00
Symmetry             analy_sys                    0.00           1   0.00     0.00
PW_Basis_K           setuptransform               0.06           1   0.06     0.00
PW_Basis_K           distributeg                  0.03           1   0.03     0.00
PW_Basis             setup_struc_factor           0.53           1   0.53     0.04
ORB_control          read_orb_first               0.06           1   0.06     0.00
LCAO_Orbitals        Read_Orbitals                0.06           1   0.06     0.00
NOrbital_Lm          extra_uniform                0.01          14   0.00     0.00
Mathzone_Add1        SplineD2                     0.00          14   0.00     0.00
Mathzone_Add1        Cubic_Spline_Interpolation   0.00          14   0.00     0.00
Sphbes               Spherical_Bessel             0.07       12060   0.00     0.00
ppcell_vl            init_vloc                    2.32           1   2.32     0.16
Ions                 opt_ions                   1471.51          1 1471.51   99.65
ESolver_KS_LCAO      Run                        1471.51          1 1471.51   99.65
ESolver_KS_LCAO      beforescf                    4.78           1   4.78     0.32
ESolver_KS_LCAO      beforesolver                 0.61           1   0.61     0.04
ESolver_KS_LCAO      set_matrix_grid              0.10           1   0.10     0.01
atom_arrange         search                       0.01           1   0.01     0.00
Grid_Technique       init                         0.07           1   0.07     0.00
Grid_BigCell         grid_expansion_index         0.01           2   0.00     0.00
Record_adj           for_2d                       0.02           1   0.02     0.00
Grid_Driver          Find_atom                    0.01        2760   0.00     0.00
LCAO_Hamilt          grid_prepare                 0.00           1   0.00     0.00
Veff                 initialize_HR                0.00           1   0.00     0.00
OverlapNew           initialize_SR                0.00           1   0.00     0.00
EkineticNew          initialize_HR                0.00           1   0.00     0.00
NonlocalNew          initialize_HR                0.00           1   0.00     0.00
Charge               set_rho_core                 0.00           1   0.00     0.00
Charge               atomic_rho                   2.50           1   2.50     0.17
PW_Basis_Sup         recip2real                   4.34         188   0.02     0.29
PW_Basis_Sup         gathers_scatterp             2.09         188   0.01     0.14
Potential            init_pot                     0.45           1   0.45     0.03
Potential            update_from_charge          12.84          31   0.41     0.87
Potential            cal_fixed_v                  0.03           1   0.03     0.00
PotLocal             cal_fixed_v                  0.03           1   0.03     0.00
Potential            cal_v_eff                   12.78          31   0.41     0.87
H_Hartree_pw         v_hartree                    1.56          31   0.05     0.11
PW_Basis_Sup         real2recip                   4.12         217   0.02     0.28
PW_Basis_Sup         gatherp_scatters             1.71         217   0.01     0.12
PotXC                cal_v_eff                   11.13          31   0.36     0.75
XC_Functional        v_xc                        11.08          31   0.36     0.75
Potential            interpolate_vrs              0.03          31   0.00     0.00
Symmetry             rhog_symmetry               31.49          31   1.02     2.13
Symmetry             group fft grids             18.19          31   0.59     1.23
H_Ewald_pw           compute_ewald                0.02           1   0.02     0.00
HSolverLCAO          solve                      1408.77         30  46.96    95.40
HamiltLCAO           updateHk                    43.34        1080   0.04     2.94
OperatorLCAO         init                        40.24        3240   0.01     2.73
Veff                 contributeHR                35.21          30   1.17     2.38
Gint_interface       cal_gint                    59.21          60   0.99     4.01
Gint_interface       cal_gint_vlocal             30.30          30   1.01     2.05
Gint_Tools           cal_psir_ylm                18.28      324000   0.00     1.24
Gint_k               transfer_pvpR                4.91          30   0.16     0.33
OverlapNew           calculate_SR                 0.50           1   0.50     0.03
OverlapNew           contributeHk                 2.60        1080   0.00     0.18
EkineticNew          contributeHR                 0.50          30   0.02     0.03
EkineticNew          calculate_HR                 0.50           1   0.50     0.03
NonlocalNew          contributeHR                 0.64          30   0.02     0.04
NonlocalNew          calculate_HR                 0.58           1   0.58     0.04
OperatorLCAO         contributeHk                 2.74        1080   0.00     0.19
HSolverLCAO          hamiltSolvePsiK            1204.84       1080   1.12    81.59
DiagoElpa            elpa_solve                 1202.76       1080   1.11    81.45
ElecStateLCAO        psiToRho                   160.58          30   5.35    10.88
elecstate            cal_dm                      92.64          30   3.09     6.27
psiMulPsiMpi         pdgemm                      91.50        1080   0.08     6.20
DensityMatrix        cal_DMR                      2.94          30   0.10     0.20
 Local_Orbital_wfc   wfc_2d_to_grid              38.34        1116   0.03     2.60
Gint                 transfer_DMR                 0.76          30   0.03     0.05
Gint_interface       cal_gint_rho                28.90          30   0.96     1.96
Charge_Mixing        get_drho                     0.03          30   0.00     0.00
Charge               mix_rho                      0.70          29   0.02     0.05
Charge               Pulay_mixing                 0.21          29   0.01     0.01
ModuleIO             write_wfc_nao_complex        6.04          36   0.17     0.41
ModuleIO             write_istate_info            0.06           1   0.06     0.00
ModuleIO             nscf_band                    0.20           1   0.20     0.01
------------------------------------------------------------------------------------

 ----------------------------------------------------------

 START  Time  : Tue Nov 21 17:20:30 2023
 FINISH Time  : Tue Nov 21 17:45:06 2023
 TOTAL  Time  : 1476
 SEE INFORMATION IN : OUT.ABACUS/