Closed hongriTianqi closed 10 months ago
One can use the following lines to print out the H and S matrices for use in the test diago_lcao_test.cpp:
template <typename T>
void HSolverLCAO<T>::hamiltSolvePsiK(hamilt::Hamilt<T>* hm, psi::Psi<T>& psi, double* eigenvalue)
{
    ModuleBase::TITLE("HSolverLCAO", "hamiltSolvePsiK");
    ModuleBase::timer::tick("HSolverLCAO", "hamiltSolvePsiK");
    hamilt::MatrixBlock<T> h_mat, s_mat;
    hm->matrix(h_mat, s_mat);
    std::string file_name = "check_hs";
    ModuleIO::saving_HS(0, h_mat.p, s_mat.p, false, 1, file_name, *(this->ParaV), true);
    this->pdiagh->diag(hm, psi, eigenvalue);
    ModuleBase::timer::tick("HSolverLCAO", "hamiltSolvePsiK");
}
In the end, this is not actually an issue with the elpa solver. The problem is that H_lambda is not Hermitian, which causes errors in the elpa diagonalization. The scalapack solver uses only the upper triangle of the matrix, and so it avoided the problem encountered in the Fe dimer AFM state calculation.
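A quick way to catch this kind of problem before diagonalization is to measure how far a matrix is from being Hermitian. The sketch below is only an illustration and is not ABACUS code: it assumes a dense, row-major, n x n complex matrix held in a std::vector, and the names max_hermiticity_error and hermitian_from_upper are hypothetical helpers introduced here. The second function mimics what a solver that reads only the upper triangle effectively sees.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: largest |A(i,j) - conj(A(j,i))| over a dense,
// row-major n x n matrix. It is 0 for an exactly Hermitian matrix.
double max_hermiticity_error(const std::vector<cplx>& a, std::size_t n)
{
    double err = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i; j < n; ++j)
        {
            err = std::max(err, std::abs(a[i * n + j] - std::conj(a[j * n + i])));
        }
    }
    return err;
}

// Hypothetical helper: rebuild a Hermitian matrix from the upper triangle
// only, discarding whatever is stored in the lower triangle and any
// imaginary part on the diagonal.
std::vector<cplx> hermitian_from_upper(const std::vector<cplx>& a, std::size_t n)
{
    std::vector<cplx> h(a);
    for (std::size_t i = 0; i < n; ++i)
    {
        h[i * n + i] = cplx(a[i * n + i].real(), 0.0);
        for (std::size_t j = i + 1; j < n; ++j)
        {
            h[j * n + i] = std::conj(a[i * n + j]);
        }
    }
    return h;
}
```

If max_hermiticity_error is well above rounding noise for the H matrix, solvers that read the full matrix and solvers that read one triangle are being handed different problems, so differing eigenvalues are to be expected.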
Describe the bug
From 2023-11-29 to 2023-12-12, the variation of the total energy with magnetic moments was observed to be inconsistent between collinear and noncollinear calculations for the AFM state of BCC Fe, where the magnetic moments were constrained by the newly implemented DeltaSpin method (PR #3050, PR #3220).
Debugging this took considerable effort. The phenomenon was first found in BCC Fe, as reported in ISSUE #3272, but that system is not simple enough for debugging, so I designed the Fe dimer system to confirm the bug. During debugging, the total energies from QE and ABACUS were compared. The initialization of rho in the noncollinear calculation was corrected to make the scf initialization consistent between noncollinear and collinear calculations, and the reading format of magnetic moments in the noncollinear case was corrected (PR #3308).
After correcting the above two bugs, the AFM energy of the Fe dimer was checked again and was still problematic, that is, still inconsistent between noncollinear and collinear calculations. At the suggestion of @dyzheng, all mixing parameters were set to zero and the mixing method to plain, and the change in energy was compared again. The RMS values during the lambda loop of the DeltaSpin calculation were already inconsistent from the second step. The H and S matrices were then compared element by element before being fed into the elpa solver in the file module_hsolver/diago_elpa.cpp. Between the collinear and non-collinear calculations, H and S are exactly the same, but the eigenvalues differ by 1e-4 eV, whereas the FM state results were consistent, with eigenvalue differences below the order of 1e-8 eV. Finally, we set ks_solver to scalapack_gvx and got consistent energies between the collinear and noncollinear calculations for the Fe dimer. It took this long debugging process for us to realize that the genelpa solver may give unphysical results.
Therefore, we need help improving the genelpa solver for modeling the AFM states of materials in non-collinear calculations. Apart from this one, some other issues (#3292, #3259, #3351) are known to be related to the performance of the genelpa solver.
Expected behavior
genelpa should give consistent results between collinear and non-collinear calculations.
To Reproduce
bug_20231212.tar.gz
After unzipping the attached file, the H matrix (size 108x108) can be found at bug_20231212/v3.10-spin2-nomixing/compare_matrix/matrix4 and the S matrix (size 108x108) at bug_20231212/v3.10-spin2-nomixing/compare_matrix/smat4.
The correct eigenvalues from the collinear calculation are in bug_20231212/v3.10-spin2-nomixing/compare_ekb/eigen_afm from line 508.
The wrong eigenvalues from the non-collinear calculation are in /data/work/debug/bug_20231212/v3.10-spin4-nomixing/compare_ekb/eigen_afm from line 500:
Environment
No response
Additional Context
The matrices and eigenvalues were printed in the following way in the file module_hsolver/diago_elpa.cpp:
Task list for Issue attackers (only for developers)