deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0
174 stars 136 forks source link

HF energy soared for new SIAB basis set when SCF #4778

Open AroundPeking opened 4 months ago

AroundPeking commented 4 months ago

Describe the bug

For bulk NaCl and C, it was found that severe distortion of Hartree-Fock bandstructure due to unreasonable increase of energy while SCF. Here, system NaCl was investigated ranging between various rcut and SIAB version, shown in NaCl/README.md. It seems new basis sets generated by abacus_orbital_generation break down compared with v2.0 basis.

Expected behavior

No response

To Reproduce

Tests are attached here. basis_issue.tar.gz

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

kirk0830 commented 4 months ago

I find there is a quite abnormal energy rise in Hartree-Fock run: image While the PBE run seems still normal. This may indicate there are numerical problems in implementation of functions calculating Fock matrix or do diagonalization, I would suggest check the condition number of matriices to be diagonalized. cc @jinzx10

kirk0830 commented 4 months ago

image Seems the SCF behavior strongly impacted by condition number of H and S matrix, when the H's condnum is large, the SCF explodes. A precondition seems urgently required.

kirk0830 commented 4 months ago

Hi, @jinzx10 and I have found where the problem come from. First we examined the condition number of both H and S matrices based on orbital files you provided, for each SCF step:

(32, 26, 26) (32, 26, 26)
The smallest 3 eigenvalues of the first S matrix: [0.00018576 0.00018576 0.00018576]
SCF   1 H condnum:    5153.17, S condnum:  193590.17
SCF   2 H condnum:    4536.78, S condnum:  193590.17
SCF   3 H condnum:    4250.41, S condnum:  193590.17
SCF   4 H condnum:    4241.91, S condnum:  193590.17
SCF   5 H condnum:    4241.70, S condnum:  193590.17
SCF   6 H condnum:    4241.58, S condnum:  193590.17
SCF   7 H condnum:    4241.58, S condnum:  193590.17
SCF   8 H condnum:   24350.70, S condnum:  193590.17
SCF   9 H condnum:   26711.16, S condnum:  193590.17
SCF  10 H condnum:   11198.01, S condnum:  193590.17
SCF  11 H condnum:   14712.32, S condnum:  193590.17
SCF  12 H condnum:   32564.12, S condnum:  193590.17
SCF  13 H condnum:     711.29, S condnum:  193590.17
SCF  14 H condnum:    3843.25, S condnum:  193590.17
SCF  15 H condnum:    3843.00, S condnum:  193590.17
SCF  16 H condnum:    3825.50, S condnum:  193590.17
SCF  17 H condnum:    3772.00, S condnum:  193590.17
SCF  18 H condnum:    3696.36, S condnum:  193590.17
SCF  19 H condnum:    3761.33, S condnum:  193590.17
SCF  20 H condnum:    3766.69, S condnum:  193590.17
SCF  21 H condnum:    3809.10, S condnum:  193590.17
SCF  22 H condnum:   17012.41, S condnum:  193590.17
SCF  23 H condnum:   25208.92, S condnum:  193590.17
SCF  24 H condnum:   24958.63, S condnum:  193590.17
SCF  25 H condnum:   23039.97, S condnum:  193590.17
SCF  26 H condnum:   24255.10, S condnum:  193590.17
SCF  27 H condnum:   23822.54, S condnum:  193590.17
SCF  28 H condnum:   23727.83, S condnum:  193590.17
SCF  29 H condnum:   23721.05, S condnum:  193590.17
SCF  30 H condnum:   23719.24, S condnum:  193590.17
SCF  31 H condnum:   23719.24, S condnum:  193590.17
SCF  32 H condnum:   23719.24, S condnum:  193590.17

We found there was a significant increase in H condnum and after that the SCF deviated. Then we tried with the v2.0 version orbital, with was generated fully based on occupied states:

(72, 26, 26) (72, 26, 26)
The smallest 3 eigenvalues of the first S matrix: [0.00315766 0.01506759 0.01506759]
SCF   1 H condnum:     170.13, S condnum:    5900.63
SCF   2 H condnum:     139.79, S condnum:    5900.63
SCF   3 H condnum:     128.68, S condnum:    5900.63
SCF   4 H condnum:     127.47, S condnum:    5900.63
SCF   5 H condnum:     127.38, S condnum:    5900.63
SCF   6 H condnum:     127.38, S condnum:    5900.63
SCF   7 H condnum:     127.38, S condnum:    5900.63
SCF   8 H condnum:     922.63, S condnum:    5900.63
SCF   9 H condnum:     338.39, S condnum:    5900.63
SCF  10 H condnum:     355.98, S condnum:    5900.63
SCF  11 H condnum:     360.56, S condnum:    5900.63
SCF  12 H condnum:     357.58, S condnum:    5900.63
SCF  13 H condnum:     358.02, S condnum:    5900.63
SCF  14 H condnum:     358.47, S condnum:    5900.63
SCF  15 H condnum:     358.53, S condnum:    5900.63
SCF  16 H condnum:     358.51, S condnum:    5900.63
SCF  17 H condnum:     358.51, S condnum:    5900.63
SCF  18 H condnum:     358.51, S condnum:    5900.63
SCF  19 H condnum:     358.51, S condnum:    5900.63
SCF  20 H condnum:     358.51, S condnum:    5900.63
SCF  21 H condnum:     358.51, S condnum:    5900.63
SCF  22 H condnum:     358.51, S condnum:    5900.63
SCF  23 H condnum:     358.51, S condnum:    5900.63
SCF  24 H condnum:     358.51, S condnum:    5900.63
SCF  25 H condnum:     358.51, S condnum:    5900.63
SCF  26 H condnum:     358.51, S condnum:    5900.63
SCF  27 H condnum:     358.51, S condnum:    5900.63
SCF  28 H condnum:     358.51, S condnum:    5900.63
SCF  29 H condnum:     358.51, S condnum:    5900.63
SCF  30 H condnum:     358.51, S condnum:    5900.63
SCF  31 H condnum:     358.51, S condnum:    5900.63
SCF  32 H condnum:     358.51, S condnum:    5900.63
SCF  33 H condnum:     358.51, S condnum:    5900.63
SCF  34 H condnum:     358.51, S condnum:    5900.63
SCF  35 H condnum:     358.41, S condnum:    5900.63
SCF  36 H condnum:     376.69, S condnum:    5900.63
SCF  37 H condnum:     334.88, S condnum:    5900.63
SCF  38 H condnum:     312.52, S condnum:    5900.63
SCF  39 H condnum:     302.41, S condnum:    5900.63
SCF  40 H condnum:     290.29, S condnum:    5900.63
SCF  41 H condnum:     292.75, S condnum:    5900.63
SCF  42 H condnum:     293.28, S condnum:    5900.63
SCF  43 H condnum:     293.37, S condnum:    5900.63
SCF  44 H condnum:     290.93, S condnum:    5900.63
SCF  45 H condnum:     286.91, S condnum:    5900.63
SCF  46 H condnum:     282.39, S condnum:    5900.63
SCF  47 H condnum:     276.23, S condnum:    5900.63
SCF  48 H condnum:     269.90, S condnum:    5900.63
SCF  49 H condnum:     266.23, S condnum:    5900.63
SCF  50 H condnum:     260.63, S condnum:    5900.63
SCF  51 H condnum:     255.43, S condnum:    5900.63
SCF  52 H condnum:     252.71, S condnum:    5900.63
SCF  53 H condnum:     252.11, S condnum:    5900.63
SCF  54 H condnum:     252.01, S condnum:    5900.63
SCF  55 H condnum:     251.97, S condnum:    5900.63
SCF  56 H condnum:     251.96, S condnum:    5900.63
SCF  57 H condnum:     251.96, S condnum:    5900.63
SCF  58 H condnum:     251.96, S condnum:    5900.63
SCF  59 H condnum:     251.96, S condnum:    5900.63
SCF  60 H condnum:     251.97, S condnum:    5900.63
SCF  61 H condnum:     251.97, S condnum:    5900.63
SCF  62 H condnum:     251.97, S condnum:    5900.63
SCF  63 H condnum:     251.97, S condnum:    5900.63
SCF  64 H condnum:     251.97, S condnum:    5900.63
SCF  65 H condnum:     251.97, S condnum:    5900.63
SCF  66 H condnum:     251.97, S condnum:    5900.63
SCF  67 H condnum:     251.97, S condnum:    5900.63
SCF  68 H condnum:     251.97, S condnum:    5900.63
SCF  69 H condnum:     251.97, S condnum:    5900.63
SCF  70 H condnum:     251.97, S condnum:    5900.63
SCF  71 H condnum:     251.97, S condnum:    5900.63
SCF  72 H condnum:     251.97, S condnum:    5900.63

The SCF iteration was hard to get converged, but the condnum seemed much more stable than the former case. Next we tried the orbital generated by occupied state:

(19, 26, 26) (19, 26, 26)
The smallest 3 eigenvalues of the first S matrix: [0.014848   0.01894208 0.01894208]
SCF   1 H condnum:      37.50, S condnum:     273.91
SCF   2 H condnum:      36.73, S condnum:     273.91
SCF   3 H condnum:      36.59, S condnum:     273.91
SCF   4 H condnum:      36.59, S condnum:     273.91
SCF   5 H condnum:      36.59, S condnum:     273.91
SCF   6 H condnum:      36.59, S condnum:     273.91
SCF   7 H condnum:      36.59, S condnum:     273.91
SCF   8 H condnum:      82.95, S condnum:     273.91
SCF   9 H condnum:      37.08, S condnum:     273.91
SCF  10 H condnum:      37.14, S condnum:     273.91
SCF  11 H condnum:      37.00, S condnum:     273.91
SCF  12 H condnum:      36.93, S condnum:     273.91
SCF  13 H condnum:      36.95, S condnum:     273.91
SCF  14 H condnum:      36.94, S condnum:     273.91
SCF  15 H condnum:      36.93, S condnum:     273.91
SCF  16 H condnum:      36.93, S condnum:     273.91
SCF  17 H condnum:      36.93, S condnum:     273.91
SCF  18 H condnum:      36.93, S condnum:     273.91
SCF  19 H condnum:      36.93, S condnum:     273.91

The condnum of both H and S were much better than former two, and this run converged in 19 RI runs. We also compared the total energies given by, your orbital, v2.0 and v2.1-occupied state based orbitals, they were: -2.80704950e+02 -2.80866100e+02 -2.80806553e+02 , in eV, respectively.

Additionally we plotted shapes of orbitals: image You can find the shape of orbital provided in this issue showed limited similarity with v2.0 and v2.1, this is the reason why get unexpected performance.

jinzx10 commented 4 months ago

Just to add to what @kirk0830 mentioned above: in normal quantum chemistry softwares, basis often undergos a process called "canonical orthogonalization" before solving the eigenvalue problem, which involves an eigen-decomposition of the overlap matrix, followed by a basis transformation to the subspace free from any (near) linear dependency. Small eigenvalues of the overlap matrix suggest (near) linear dependency of basis set. They usually cause numerical instability and should be filtered out during canonical orthogonalization.

Here's an excerpt from qchem's manual on this topic: image

kirk0830 commented 4 months ago

while the threshold determining what basis should be moved out is always to be at1e-6 level, but if not implement calculation carefully enough, this level of orbital linear dependency can be capable to cause numerical instability. You can find PBE always stable but EXX is not. I would suggest an examine of numerical operations in EXX module, such as two-center integral of local-RI, and if any, other linalg ops.

AroundPeking commented 4 months ago

I have tested three orbitals generated by various reference strategies. Only the last one shows numerical stabilty. all occupied + 4 unoccupied bands C_gga_8au_100Ry_2s2p1d

SZ occupied bands, DZP occupied + 4 unoccupied bands C_gga_8au_100Ry_2s2p1d

all occupied bands C_gga_8au_100Ry_2s2p1d

kirk0830 commented 4 months ago

@AroundPeking To make sure, whether all PBE runs were successfully converged? If so, I think there is additional numerical instability in EXX implementation, could you contact with @PeizeLin and substitute some operations, such as two-center integration with new version implemented by @jinzx10 ?

AroundPeking commented 4 months ago

@kirk0830 All PBE have converged. I will relay your suggestion next week.

AroundPeking commented 3 months ago

I have attempted to find out two possible causes. It was likely numerical instability in local RI.

Tests are attached here. basisfix.tar.gz

issue.300714.out is the original issue, in which ETOT increased to 1e7. The basis 2s2p1d was generated by all occ+4 strategy.

First, I have tested improvement of smooth function, i.e. turn on bessel_nao_smooth and set smearing_sigma=0.01/0.1 when generating basis. Shown in sigma0.01.out and sigma0.1.out, ETOT increased to 1e5, mitigating this issue.

Second, I added auxiliary basis when calculating exx. 2s2p1d1f was generated by old version SIAB based on original 2s2p1d in this issue. And no numerical instability appeared any more, shown in abfs.301794.out. But I do not know what makes the calculation terminated, error in slurm-301794.out.