abacusmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
GNU Lesser General Public License v3.0

unknown error in the calculation of large system #315

Open JTaozhang opened 7 months ago

JTaozhang commented 7 months ago

Describe the bug

Hi there, I am using ABACUS to calculate the band structure of a large system (504 atoms). I assigned 12 tasks to the job (12 nodes, each with 192 GB of memory), with 56 threads per task (each node has 56 cores). However, I am stuck on an error, shown below.

Error file:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Log file:

ELEMENT  ORBITALS        NBASE  NATOM  XC
W        4s2p2d2f1g-8au  43     168
Te       2s2p2d1f-7au    25     336

Initial plane wave basis and FFT box

DONE(2.25912 SEC) : INIT PLANEWAVE

NONSELF-CONSISTENT :

START CHARGE : file

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 28525 RUNNING AT node136
= KILLED BY SIGNAL: 6 (Aborted)

The input files are listed below.

INPUT file:

INPUT_PARAMETERS
suffix WTe2
ntype 2
nelec 0.0
lspinorb 1
pseudo_dir /share/home/zhangtao/work/WTe2/abacus/pseudo
orbital_dir /share/home/zhangtao/work/WTe2/abacus/orbital
ecutwfc 100 #unit Ryberg 13.606 eV
scf_thr 1e-6 #unit Ryberg 13.606 eV
basis_type lcao
calculation nscf

parameters(vdw)

vdw_method d2

Parameters (File)

init_chg file
out_band 1
out_dos 1

KPT:

K_POINTS
3
Line
0.0000000000 0.0000000000 0.0000000000 5 # G
0.5000000000 0.0000000000 0.0000000000 5 # X
0.5000000000 0.5000000000 0.0000000000 1 # S

Part of the STRU file:

ATOMIC_SPECIES
W 183.841 W_ONCV_PBE_FR-1.0.upf
Te 127.603 Te_ONCV_PBE_FR-1.1.upf

NUMERICAL_ORBITAL
W_gga_8au_100Ry_4s2p2d2f1g.orb
Te_gga_7au_100Ry_2s2p2d1f.orb

LATTICE_CONSTANT
1.889726

LATTICE_VECTORS
32.21631806204 0.000000000000 0.000000000000
0.000000000000 28.28261899040 0.000000000000
0.000000000000 0.000000000000 29.99607617032

The charge density file obtained from the SCF calculation was also provided to this calculation. After running the software, running_nscf.log shows:

SETUP SEARCHING RADIUS FOR PROGRAM TO SEARCH ADJACENT ATOMS
longest orb rcut (Bohr) = 8
longest nonlocal projector rcut (Bohr) = 3.2
==> atom_arrange::search 224 GB 7.14 s
searching radius is (Bohr)) = 22.4
searching radius unit is (Bohr)) = 1.89
==> Atom_input::Atom_input 224 GB 7.14 s
==> Atom_input::Expand_Grid 224 GB 7.14 s
==> Atom_input::calculate_cells 224 GB 7.14 s
==> SLTK_Grid::init 224 GB 7.14 s
==> SLTK_Grid::setMemberVariables 224 GB 7.14 s
==> SLTK_Grid::Build_Cell 224 GB 7.14 s
==> SLTK_Grid::Build_Hash_Table 224 GB 7.14 s
==> SLTK_Grid::Fold_Hash_Table 224 GB 7.14 s
==> Grid_Technique::init 224 GB 7.32 s

SETUP EXTENDED REAL SPACE GRID FOR GRID INTEGRATION
real space grid = [ 400, 360, 375 ]
big cell numbers in grid = [ 80, 72, 125 ]
meshcell numbers in big cell = [ 5, 5, 3 ]
==> Grid_MeshCell::init_latvec 224 GB 7.32 s
==> Grid_BigCell::init_big_latvec 224 GB 7.32 s
==> Grid_BigCell::init_grid_expansion 224 GB 7.32 s
extended fft grid = [ 11, 11, 18 ]
dimension of extened grid = [ 103, 95, 162 ]
==> Grid_MeshK::cal_extended_cell 224 GB 7.32 s
UnitCellTotal = 27
==> Grid_BigCell::init_tau_in_bigcell 224 GB 7.32 s
==> Grid_MeshBall::delete_meshball_positions 224 GB 7.32 s
==> Grid_MeshBall::init_meshball 224 GB 7.32 s
==> Grid_Technique::init_atoms_on_grid 224 GB 7.41 s
==> Grid_Technique::get_startind 224 GB 7.41 s

Warning_Memory_Consuming allocated: GT::index2normal 6.05 MB
==> Grid_BigCell::grid_expansion_index 224 GB 7.41 s
No atoms on this sub-FFT-mesh.
==> Grid_Techinique::init_atoms_on_grid2 224 GB 7.55 s
==> Grid_Technique::cal_trace_lo 224 GB 7.55 s
Atom number in sub-FFT-grid = 0
Local orbitals number in sub-FFT-grid = 0
==> Record_adj::for_2d 224 GB 7.55 s
ParaV.nnr = 18004308
==> LCAO_nnr::cal_nnrg 224 GB 7.75 s
==> LCAO_nnr::cal_max_box_index 224 GB 7.75 s
nnrg = 0
==> LCAO_domain::grid_prepare 224 GB 7.75 s
==> Gint_k::prep_grid 224 GB 7.75 s
==> Potential::pot_register 224 GB 7.75 s
==> Potential::get_pot_type 224 GB 7.75 s
==> Potential::get_pot_type 224 GB 7.75 s
==> Potential::get_pot_type 224 GB 7.75 s
==> Veff::initialize_HR 224 GB 7.75 s
==> Gint::initialize_pvpR 224 GB 7.78 s
==> Gint_k::destroy_pvpR 224 GB 7.79 s
==> Gint_k::allocate_pvpR 224 GB 7.79 s
==> OverlapNew::initialize_SR 224 GB 7.79 s
==> EkineticNew::initialize_HR 223 GB 7.82 s
==> NonlocalNew::initialize_HR 223 GB 7.86 s

Warning_Memory_Consuming allocated: HamiltLCAO::hR 279 MB

Warning_Memory_Consuming allocated: HamiltLCAO::sR 161 MB
==> Local_Orbital_Charge::allocate_dm_wfc 223 GB 7.93 s
==> Local_Orbital_wfc::allocate_k 223 GB 7.93 s
==> Local_Orbital_Charge::allocate_k 223 GB 7.93 s
nnrg_last = 0
nnrg_now = 0
==> Charge::set_rho_core 223 GB 7.93 s
init_chg = file
try to read charge from file :
==> ModuleIO::read_rhog 223 GB 7.93 s
==> ModuleIO::read_cube 223 GB 7.93 s
Find the file, try to read charge from file.
read in fermi energy = 0.41
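To see where the run stopped, it can help to pull the last traced routine out of the log. The following is a minimal sketch, assuming every trace entry has the shape `==> Class::method <number> GB <number> s` seen in the excerpt above; the helper name `last_traced_routine` is hypothetical and not part of ABACUS, and the meaning of the GB column is taken from its label only.

```python
import re

# Assumed shape of a trace entry, copied from the log excerpt:
#   ==> Class::method <number> GB <number> s
TRACE = re.compile(r"==>\s+(\S+)\s+(\d+(?:\.\d+)?)\s+GB\s+(\d+(?:\.\d+)?)\s+s")


def last_traced_routine(log_text):
    """Return (routine, gb_column, seconds) for the last trace entry, or None."""
    matches = TRACE.findall(log_text)
    if not matches:
        return None
    name, gb, seconds = matches[-1]
    return name, float(gb), float(seconds)


# Tail of the log quoted in this report.
sample = (
    "==> Charge::set_rho_core 223 GB 7.93 s init_chg = file "
    "try to read charge from file : "
    "==> ModuleIO::read_rhog 223 GB 7.93 s "
    "==> ModuleIO::read_cube 223 GB 7.93 s "
    "Find the file, try to read charge from file."
)
print(last_traced_routine(sample))
```

For the log above, the last entry is `ModuleIO::read_cube`, so the abort happened at or after the point where the charge density cube file is read.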

Based on this output, a friend told me that the error is not caused by running out of memory and instead appears to occur while reading the charge density. I would greatly appreciate any advice.
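As a rough cross-check on the memory question, the sizes implied by the log can be estimated by hand. This is only a back-of-the-envelope sketch: it assumes that `lspinorb 1` doubles the matrix dimension, that matrices are stored dense as complex double precision, and that the density has four double-precision components on the real space grid. None of these assumptions is confirmed to match what ABACUS actually allocates.

```python
# Numbers copied from the log excerpt in this report.
nbase_per_atom = {"W": 43, "Te": 25}
natom = {"W": 168, "Te": 336}
grid = (400, 360, 375)  # "real space grid" line

# Total number of numerical atomic orbitals.
n_orb = sum(nbase_per_atom[el] * natom[el] for el in natom)

# ASSUMPTION: spin-orbit coupling doubles the matrix dimension, and one
# matrix is stored dense with 16-byte complex entries.
dim = 2 * n_orb
bytes_per_matrix = dim * dim * 16

# ASSUMPTION: four density components, 8 bytes per grid point each.
n_grid = grid[0] * grid[1] * grid[2]
rho_bytes = 4 * n_grid * 8

print(f"orbitals: {n_orb}, matrix dimension: {dim}")
print(f"one dense matrix: {bytes_per_matrix / 1e9:.1f} GB (summed over all ranks)")
print(f"grid points: {n_grid}, density: {rho_bytes / 1e9:.1f} GB")
```

Under these assumptions, one dense matrix is about 15.6 GB in total and the density about 1.7 GB, both small compared with 12 nodes of 192 GB each. The estimate does not cover per-rank peaks, for example if a single process holds a whole array while reading a file.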

Best, Tao

Expected behavior

No response

To Reproduce

STRU.txt

If you want to reproduce it, the input file for the SCF part is as follows:


INPUT_PARAMETERS
suffix WTe2
ntype 2
nelec 0.0
lspinorb 1
pseudo_dir /share/home/zhangtao/work/WTe2/abacus/pseudo
orbital_dir /share/home/zhangtao/work/WTe2/abacus/orbital

Prameters(general)

ecutwfc 100 #unit Ryberg 13.606 eV
scf_thr 1e-6 #unit Ryberg 13.606 eV
basis_type lcao
symmetry 1

gamma_only 1

Parameters (Accuracy)

calculation scf

force_thr_ev 0.01

parameters(vdw)

vdw_method d2

Parameters (smearing)

smearing_method gauss

smearing_sigma 0.01

Parameters (File)

out_chg 1

K-point file:

K_POINTS
0 //total number of k-point, `0' means generate automatically
Gamma //Gamma or MP
1 1 1 0 0 0 //first three number: subdivisions along reciprocal vectors
//last three number: shift of the mesh

The STRU file has been attached here.

Environment

I used ABACUS 3.6.1 for the NSCF part and 3.4.1 for the SCF part. Both versions were compiled with Intel oneAPI 2021, and they work well for other small systems.

Additional Context

No response

Task list for Issue attackers (only for developers)