deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

the relax calculation cannot manage the memory very well #5441

Open gradencapaldi opened 2 weeks ago

gradencapaldi commented 2 weeks ago

Describe the bug

During geometry optimization (a relax calculation), the HPC job is always killed with the message "KILLED BY SIGNAL 9". The cause is that the memory used by the calculation exceeds the amount of memory allocated by the system. A mature DFT package, as ABACUS aims to be, should free data that is no longer needed and reclaim that memory at regular points during the run. The ABACUS version used was v3.8.1; the server configuration was attached as three screenshots.

Expected behavior

  1. Confirm the cause of KILLED BY SIGNAL 9;
  2. Provide a solution to this problem.
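
One way to approach the first point is to record the resident memory of the running process at intervals and check whether it keeps growing from one ionic step to the next. Signal 9 is SIGKILL, which on Linux is what the kernel out-of-memory killer and many batch schedulers send when a job exceeds its memory limit. The sketch below is a minimal, Linux-only illustration; the helper names (`parse_vmrss_kb`, `sample_rss_kb`, `is_steadily_growing`) and the sample readings are hypothetical and are not part of ABACUS.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss_kb(pid, n_samples, interval_s):
    """Read the resident set size of process `pid` n_samples times (Linux only)."""
    samples = []
    for _ in range(n_samples):
        with open(f"/proc/{pid}/status") as handle:
            samples.append(parse_vmrss_kb(handle.read()))
        time.sleep(interval_s)
    return samples


def is_steadily_growing(samples_kb, min_total_growth_kb):
    """True if memory never shrinks between samples and grows by at least the threshold."""
    never_shrinks = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_shrinks and samples_kb[-1] - samples_kb[0] >= min_total_growth_kb


# Hypothetical readings taken once per ionic step (values in kB, made up for illustration).
example = [2_000_000, 2_600_000, 3_200_000, 3_900_000]
print(is_steadily_growing(example, min_total_growth_kb=1_000_000))
```

If the readings rise steadily across ionic steps and the job dies near the node or scheduler limit, that supports the out-of-memory explanation; a flat profile would point elsewhere.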

To Reproduce

The INPUT, STRU, and KPT files are given below.

INPUT:
INPUT_PARAMETERS

General

suffix Li8C96_relax
ntype 2
nelec 0.0
nspin 1
nbands 245 # in terms of nelec and nspin
basis_type lcao

gamma_only 1

pseudo_dir /home/mks/yzt/abacus/SG15_ONCV_v1_upf
orbital_dir /home/mks/yzt/abacus/SG15-Version1p0__StandardOrbitals-Version2p0

SCF

ecutwfc 100 # Rydberg
scf_thr 1e-4 # Rydberg
scf_nmax 20

Smearing

smearing_method gauss
smearing_sigma 0.01

Relaxation

calculation relax # relax or cell-relax
relax_nmax 50
force_thr_ev 1.0e-2 # eV
stress_thr 2 # kBar

Output

out_stru 1
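
The comment on `nbands` ("in terms of nelec and nspin") can be checked with simple arithmetic. The sketch below assumes 4 valence electrons per C and 3 per Li for the SG15 ONCV pseudopotentials; these counts are assumptions that the issue does not state, and the authoritative values are in the UPF headers. The 20% margin of empty bands is likewise an assumption chosen for illustration. Under those assumptions the result is 245, the value used in the INPUT file.

```python
import math

# Atom counts from the STRU file in this issue.
atom_counts = {"C": 96, "Li": 8}

# ASSUMPTION: valence electrons per atom for the SG15 ONCV pseudopotentials.
# Check the valence charge in each UPF header before relying on these.
valence_electrons = {"C": 4, "Li": 3}


def count_electrons(counts, valence):
    """Total number of valence electrons in the cell."""
    return sum(counts[element] * valence[element] for element in counts)


def bands_with_margin(n_electrons, margin_percent=20):
    """Occupied bands for a spin-unpolarized run plus a percentage of empty bands."""
    occupied = math.ceil(n_electrons / 2)  # two electrons per band when nspin = 1
    return math.ceil(occupied * (100 + margin_percent) / 100)


n_electrons = count_electrons(atom_counts, valence_electrons)
print(n_electrons, bands_with_margin(n_electrons))
```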

STRU:
ATOMIC_SPECIES
C 12.011 C_ONCV_PBE-1.0.upf
Li 6.94 Li_ONCV_PBE-1.0.upf

NUMERICAL_ORBITAL
C_gga_7au_100Ry_2s2p1d.orb
Li_gga_7au_100Ry_4s1p.orb

LATTICE_CONSTANT
1.8897261258369282

LATTICE_VECTORS
9.8691650000 0.0000000000 0.0000000000
-3.700937000 6.4102110000 0.0000000000
0.0000000000 0.0000000000 15.606146000

ATOMIC_POSITIONS
Direct

C
0.0000000000
96
0.0000000000 0.0000000000 0.1250000000 1 1 1 mag 0.0
0.0000000000 0.0000000000 0.6250000000 1 1 1 mag 0.0
0.0000000000 0.3333330000 0.1250000000 1 1 1 mag 0.0
0.0000000000 0.3333330000 0.6250000000 1 1 1 mag 0.0
0.0000000000 0.6666670000 0.1250000000 1 1 1 mag 0.0
0.0000000000 0.6666670000 0.6250000000 1 1 1 mag 0.0
0.2500000000 0.0000000000 0.1250000000 1 1 1 mag 0.0
0.2500000000 0.0000000000 0.6250000000 1 1 1 mag 0.0
0.2500000000 0.3333330000 0.1250000000 1 1 1 mag 0.0
0.2500000000 0.3333330000 0.6250000000 1 1 1 mag 0.0
0.2500000000 0.6666670000 0.1250000000 1 1 1 mag 0.0
0.2500000000 0.6666670000 0.6250000000 1 1 1 mag 0.0
0.5000000000 0.0000000000 0.1250000000 1 1 1 mag 0.0
0.5000000000 0.0000000000 0.6250000000 1 1 1 mag 0.0
0.5000000000 0.3333330000 0.1250000000 1 1 1 mag 0.0
0.5000000000 0.3333330000 0.6250000000 1 1 1 mag 0.0
0.5000000000 0.6666670000 0.1250000000 1 1 1 mag 0.0
0.5000000000 0.6666670000 0.6250000000 1 1 1 mag 0.0
0.7500000000 0.0000000000 0.1250000000 1 1 1 mag 0.0
0.7500000000 0.0000000000 0.6250000000 1 1 1 mag 0.0
0.7500000000 0.3333330000 0.1250000000 1 1 1 mag 0.0
0.7500000000 0.3333330000 0.6250000000 1 1 1 mag 0.0
0.7500000000 0.6666670000 0.1250000000 1 1 1 mag 0.0
0.7500000000 0.6666670000 0.6250000000 1 1 1 mag 0.0
0.0000000000 0.0000000000 0.3750000000 1 1 1 mag 0.0
0.0000000000 0.0000000000 0.8750000000 1 1 1 mag 0.0
0.0000000000 0.3333330000 0.3750000000 1 1 1 mag 0.0
0.0000000000 0.3333330000 0.8750000000 1 1 1 mag 0.0
0.0000000000 0.6666670000 0.3750000000 1 1 1 mag 0.0
0.0000000000 0.6666670000 0.8750000000 1 1 1 mag 0.0
0.2500000000 0.0000000000 0.3750000000 1 1 1 mag 0.0
0.2500000000 0.0000000000 0.8750000000 1 1 1 mag 0.0
0.2500000000 0.3333330000 0.3750000000 1 1 1 mag 0.0
0.2500000000 0.3333330000 0.8750000000 1 1 1 mag 0.0
0.2500000000 0.6666670000 0.3750000000 1 1 1 mag 0.0
0.2500000000 0.6666670000 0.8750000000 1 1 1 mag 0.0
0.5000000000 0.0000000000 0.3750000000 1 1 1 mag 0.0
0.5000000000 0.0000000000 0.8750000000 1 1 1 mag 0.0
0.5000000000 0.3333330000 0.3750000000 1 1 1 mag 0.0
0.5000000000 0.3333330000 0.8750000000 1 1 1 mag 0.0
0.5000000000 0.6666670000 0.3750000000 1 1 1 mag 0.0
0.5000000000 0.6666670000 0.8750000000 1 1 1 mag 0.0
0.7500000000 0.0000000000 0.3750000000 1 1 1 mag 0.0
0.7500000000 0.0000000000 0.8750000000 1 1 1 mag 0.0
0.7500000000 0.3333330000 0.3750000000 1 1 1 mag 0.0
0.7500000000 0.3333330000 0.8750000000 1 1 1 mag 0.0
0.7500000000 0.6666670000 0.3750000000 1 1 1 mag 0.0
0.7500000000 0.6666670000 0.8750000000 1 1 1 mag 0.0
0.0833330000 0.2222220000 0.1250000000 1 1 1 mag 0.0
0.0833330000 0.2222220000 0.6250000000 1 1 1 mag 0.0
0.0833330000 0.5555560000 0.1250000000 1 1 1 mag 0.0
0.0833330000 0.5555560000 0.6250000000 1 1 1 mag 0.0
0.0833330000 0.8888890000 0.1250000000 1 1 1 mag 0.0
0.0833330000 0.8888890000 0.6250000000 1 1 1 mag 0.0
0.3333330000 0.2222220000 0.1250000000 1 1 1 mag 0.0
0.3333330000 0.2222220000 0.6250000000 1 1 1 mag 0.0
0.3333330000 0.5555560000 0.1250000000 1 1 1 mag 0.0
0.3333330000 0.5555560000 0.6250000000 1 1 1 mag 0.0
0.3333330000 0.8888890000 0.1250000000 1 1 1 mag 0.0
0.3333330000 0.8888890000 0.6250000000 1 1 1 mag 0.0
0.5833330000 0.2222220000 0.1250000000 1 1 1 mag 0.0
0.5833330000 0.2222220000 0.6250000000 1 1 1 mag 0.0
0.5833330000 0.5555560000 0.1250000000 1 1 1 mag 0.0
0.5833330000 0.5555560000 0.6250000000 1 1 1 mag 0.0
0.5833330000 0.8888890000 0.1250000000 1 1 1 mag 0.0
0.5833330000 0.8888890000 0.6250000000 1 1 1 mag 0.0
0.8333330000 0.2222220000 0.1250000000 1 1 1 mag 0.0
0.8333330000 0.2222220000 0.6250000000 1 1 1 mag 0.0
0.8333330000 0.5555560000 0.1250000000 1 1 1 mag 0.0
0.8333330000 0.5555560000 0.6250000000 1 1 1 mag 0.0
0.8333330000 0.8888890000 0.1250000000 1 1 1 mag 0.0
0.8333330000 0.8888890000 0.6250000000 1 1 1 mag 0.0
0.1666670000 0.1111110000 0.3750000000 1 1 1 mag 0.0
0.1666670000 0.1111110000 0.8750000000 1 1 1 mag 0.0
0.1666670000 0.4444440000 0.3750000000 1 1 1 mag 0.0
0.1666670000 0.4444440000 0.8750000000 1 1 1 mag 0.0
0.1666670000 0.7777780000 0.3750000000 1 1 1 mag 0.0
0.1666670000 0.7777780000 0.8750000000 1 1 1 mag 0.0
0.4166670000 0.1111110000 0.3750000000 1 1 1 mag 0.0
0.4166670000 0.1111110000 0.8750000000 1 1 1 mag 0.0
0.4166670000 0.4444440000 0.3750000000 1 1 1 mag 0.0
0.4166670000 0.4444440000 0.8750000000 1 1 1 mag 0.0
0.4166670000 0.7777780000 0.3750000000 1 1 1 mag 0.0
0.4166670000 0.7777780000 0.8750000000 1 1 1 mag 0.0
0.6666670000 0.1111110000 0.3750000000 1 1 1 mag 0.0
0.6666670000 0.1111110000 0.8750000000 1 1 1 mag 0.0
0.6666670000 0.4444440000 0.3750000000 1 1 1 mag 0.0
0.6666670000 0.4444440000 0.8750000000 1 1 1 mag 0.0
0.6666670000 0.7777780000 0.3750000000 1 1 1 mag 0.0
0.6666670000 0.7777780000 0.8750000000 1 1 1 mag 0.0
0.9166670000 0.1111110000 0.3750000000 1 1 1 mag 0.0
0.9166670000 0.1111110000 0.8750000000 1 1 1 mag 0.0
0.9166670000 0.4444440000 0.3750000000 1 1 1 mag 0.0
0.9166670000 0.4444440000 0.8750000000 1 1 1 mag 0.0
0.9166670000 0.7777780000 0.3750000000 1 1 1 mag 0.0
0.9166670000 0.7777780000 0.8750000000 1 1 1 mag 0.0

Li
0.0000000000
8
0.1000000000 0.3300000000 0.2500000000 1 1 1 mag 0.0
0.1000000000 0.6600000000 0.2500000000 1 1 1 mag 0.0
0.1000000000 0.3300000000 0.5000000000 1 1 1 mag 0.0
0.1000000000 0.6600000000 0.5000000000 1 1 1 mag 0.0
0.1000000000 0.6600000000 0.7500000000 1 1 1 mag 0.0
0.1000000000 0.3300000000 0.7500000000 1 1 1 mag 0.0
0.4000000000 0.6600000000 0.7500000000 1 1 1 mag 0.0
0.4000000000 0.6600000000 0.5000000000 1 1 1 mag 0.0
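
For a rough sense of scale, the orbital file names give the basis size per atom (2s2p1d for C, 4s1p for Li), from which the number of basis functions and the size of one dense real matrix over that basis can be estimated. This is a back-of-the-envelope sketch: it assumes the usual 2l+1 functions per radial orbital and 8-byte real matrix elements, and it says nothing about how ABACUS actually stores its matrices or real-space grids.

```python
import re

# Number of angular functions (2l + 1) for each angular-momentum label.
FUNCTIONS_PER_SHELL = {"s": 1, "p": 3, "d": 5, "f": 7}


def basis_functions(label):
    """Count basis functions for a label such as '2s2p1d' (2 s, 2 p, 1 d radial orbitals)."""
    pairs = re.findall(r"(\d+)([spdf])", label)
    return sum(int(count) * FUNCTIONS_PER_SHELL[shell] for count, shell in pairs)


# Atom counts and orbital labels taken from the STRU file in this issue.
atoms = {"C": (96, "2s2p1d"), "Li": (8, "4s1p")}

n_basis = sum(count * basis_functions(label) for count, label in atoms.values())

# ASSUMPTION: one real, double-precision N x N matrix stored densely.
dense_matrix_bytes = n_basis * n_basis * 8

print(n_basis, round(dense_matrix_bytes / 2**20, 1), "MiB")
```

Under these assumptions a single dense matrix is on the order of 13 MiB; grid data and any per-process copies are not included in this estimate.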

KPT:
K_POINTS
0
Gamma
4 4 4 0 0 0

Environment

No response

Additional Context

No response
